“It’s my instance and I’ll Block if I want to, Block if I want to…”

Right now, updating the Iceshrimp Blocklist is a manual task; these scripts automate that process.

Note:
I am well aware of the controversy and debate around Blocklist(s) but the fact remains that they exist and people use them. This is a post about how to automate the currently manual process.

Like many Instance owners and Administrators, I employ Blocklists.

These Blocklists need regular updating, mainly because a Domain may be on a Blocklist one week and removed from that Blocklist later on.

When using Iceshrimp, this process is a manual one:

  • Find a suitable blocklist
  • Download the list (usually a .csv file)
  • Extract the column of Domains
  • Copy that column to the clipboard
  • Log onto Iceshrimp as an Administrator
  • Go to ‘Control Panel | Federation Management’
  • Delete any old Domain names in the text box
  • Paste in the clipboard contents

Clearly this is less than ideal.

Updating the Iceshrimp Blocklist Python Scripts

This first script replaces that manual task by collecting a single online list, extracting the Domain data, and then inserting it into the PostgreSQL database using Python.

update_blocklist_single.py

This requires the following:

  • The remote URL link to a .csv Blocklist file
  • A local path setting
  • The local filename for the .csv file (this can be the same as the remote name)
  • Finally, a local filename for the log file
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# The url library
import urllib.request

# The pandas library (pip install pandas)
import pandas as pd

# The PostgreSQL library (pip install psycopg2)
import psycopg2

# Local path and filenames
blocklist_path = "/home/foo/path/to/update_iceshrimp_blocklist/"
blocklist_filename = "blocklist_single.csv"
blocklist_log_path = "/home/foo/Logs/"
blocklist_log_filename = "iceshrimp_blocklist_single_updated"

# Link to the Blocklist to be used
url = ("https://codeberg.org/oliphant/blocklists/raw/branch/main/blocklists/_unified_tier0_blocklist.csv")

# Go grab the Blocklist
urllib.request.urlretrieve(url, blocklist_path+blocklist_filename)

# Read the CSV file into a pandas DataFrame
df = pd.read_csv(blocklist_path+blocklist_filename)

# Select the first column (column 0) and convert it to a text Series
blocklist_series = df.iloc[:, 0].astype(str)

# Join the text Series into a single string with a comma
blocklist_string = ','.join(blocklist_series)

# Establish a Postgres DB connection
conn = psycopg2.connect(host="127.0.0.1", database="iceshrimp_db", user="iceshrimp_user", password="iceshrimp_db_user_password", port='5432')

# Create a cursor object using the cursor() method
cursor = conn.cursor()

# Prepare the SQL query to UPDATE the 'blockedHosts' field in the 'meta' table in the database
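# Note: "blockedHosts" is an array column, hence the '{a,b,c}' Postgres array-literal
# syntax, and 'x' is the id of the single settings row in the 'meta' table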
table_modification = """UPDATE meta SET "blockedHosts" = '{""" + blocklist_string + """}' WHERE id = 'x'"""

# Execute and error-trap
try:
    # Execute the SQL command
    cursor.execute(table_modification)
    print("Executed and UPDATED")

    # Commit the changes to the database
    conn.commit()
    print("UPDATE Committed")

    # Create a log file to show that the UPDATE has successfully completed
    open(blocklist_log_path+blocklist_log_filename, 'w').close()
    print("UPDATE log file created")

except Exception as error:
    # Roll back in case of error
    conn.rollback()
    print("UPDATE Failed and rolled back:", error)

# Close the connection
conn.close()
print("Connection CLOSED")

This second script replaces that manual task by collecting multiple online lists, extracting the Domain data, collating the lists, removing any duplicates, and then inserting the result into the PostgreSQL database using Python.

update_blocklist_multiple.py

Note:
This process will work just as well with one entry in the list as it will with multiple entries.

This requires the following:

  • A ‘blocklist_multiple.txt’ file that contains URLs to the remote .csv files
  • An optional .csv file containing Domains to be blocked, e.g. a ‘personal_blocklist.csv’
  • A local path setting
  • Finally, a local filename for the log file
#!/usr/bin/env python
# -*- coding: utf-8 -*-

# The url library
import urllib.request

# The pandas library (pip install pandas)
import pandas as pd

# The PostgreSQL library (pip install psycopg2)
import psycopg2

# Local path and filenames
blocklist_path = "/home/foo/path/to/update_iceshrimp_blocklist/"
blocklist_filename = "blocklist_multiple.txt"
blocklist_log_path = "/home/foo/Logs/"
blocklist_log_filename = "iceshrimp_blocklist_multiple_updated"

# Python pathname pattern matching
import glob

# Download all of the *.csv files listed in 'blocklist_multiple.txt' into the local path
with open(blocklist_path+blocklist_filename) as f:
    for line in f:
        # Strip the trailing newline (a URL containing control characters would be rejected)
        url = line.strip()
        # Skip any blank lines in the .txt file
        if not url:
            continue
        csv_filename = url.split('/')[-1]
        urllib.request.urlretrieve(url, blocklist_path+csv_filename)

# Put all of the *.csv files in the local path into a single DataFrame
files = glob.glob(blocklist_path+"*.csv")
content = []

# Loop through all the *.csv files
for filename in files:
    df = pd.read_csv(filename, index_col=None)
    print(len(df.index), "rows in", filename)
    content.append(df)

# Convert 'content' to a DataFrame
df = pd.concat(content)

# Print the # of DataFrame rows
print(len(df.index), "rows in the DataFrame")

# Select the first column (column 0) and convert it to a text Series
blocklist_series = df.iloc[:, 0].astype(str)

# Make the Series unique
blocklist_series = pd.unique(blocklist_series)

# Print the # of rows in the Series
print(pd.Series(blocklist_series).count(), "unique rows in the Series")

# Join the text Series into a single string with a comma
blocklist_string = ','.join(blocklist_series)

# Establish a Postgres DB connection
conn = psycopg2.connect(host="127.0.0.1", database="iceshrimp_db", user="iceshrimp_user", password="iceshrimp_db_user_password", port='5432')

# Create a cursor object using the cursor() method
cursor = conn.cursor()

# Prepare the SQL query to UPDATE the 'blockedHosts' field in the 'meta' table in the database
table_modification = """UPDATE meta SET "blockedHosts" = '{""" + blocklist_string + """}' WHERE id = 'x'"""

# Execute and error-trap
try:
    # Execute the SQL command
    cursor.execute(table_modification)
    print("Executed and UPDATED")

    # Commit the changes to the database
    conn.commit()
    print("UPDATE Committed")

    # Create a log file to show that the UPDATE has successfully completed
    open(blocklist_log_path+blocklist_log_filename, 'w').close()
    print("UPDATE log file created")

except Exception as error:
    # Roll back in case of error
    conn.rollback()
    print("UPDATE Failed and rolled back:", error)

# Close the connection
conn.close()
print("Connection CLOSED")

blocklist_multiple.txt

This is the list of URLs that link to the .csv Blocklist(s).

https://codeberg.org/oliphant/blocklists/raw/branch/main/blocklists/_unified_tier0_blocklist.csv
https://seirdy.one/pb/pleroma.envs.net.csv

personal_blocklist.csv

Additionally, a ‘personal_blocklist’ can be created (this file can be named whatever you like, as long as it is a .csv file).

naughty_fedi.com
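
One thing to watch: pandas treats the first row of a .csv file as a header by default, so the first line of a personal list that contains only Domains would be read as a column name rather than a Domain to block. Giving the file a header row of its own (the header text itself does not matter) keeps every entry, for example:

domain
naughty_fedi.com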

Updating the Iceshrimp Blocklist crontab

In addition to running the script manually, either script can be executed nightly, at midnight, from a normal crontab (crontab -e from a Terminal).

# Update the iceshrimp blocklist daily at midnight
#0 0 * * * python3 /home/foo/path/to/script/update_blocklist_single.py
0 0 * * * python3 /home/foo/path/to/script/update_blocklist_multiple.py
#

Finally

(I did mention this in a prior post but it bears repeating.)

For anyone interested in getting into the SQL side, pgAdmin is a useful tool.

I installed it using:

curl -fsS https://www.pgadmin.org/static/packages_pgadmin_org.pub | sudo gpg --dearmor -o /usr/share/keyrings/packages-pgadmin-org.gpg

sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/packages-pgadmin-org.gpg] https://ftp.postgresql.org/pub/pgadmin/pgadmin4/apt/jammy pgadmin4 main" > /etc/apt/sources.list.d/pgadmin4.list && apt update'

sudo apt update

sudo apt install pgadmin4

Screenshot: pgAdmin, the Linux application used to administer PostgreSQL databases. Iceshrimp uses PostgreSQL as its back-end database engine.

However you Fediverse, Enjoy!

UPDATE

Iceshrimp is moving to a .NET framework.

Additional scripts for updating a blocklist when running Iceshrimp.NET can be found here.