“Its my instance and I’ll Block if I want to, Block if I want to…”
Right now, updating the Iceshrimp Blocklist is a manual task; these scripts automate that process.
Note:
I am well aware of the controversy and debate around Blocklist(s) but the fact remains that they exist and people use them. This is a post about how to automate the currently manual process.
Like many Instance owners and Administrators, I employ Blocklists.
These Blocklists need regular updating, mainly because a Domain may be on a Blocklist one week and removed from that Blocklist later on.
When using Iceshrimp, this process is a manual one:
- Find a suitable blocklist
- Download the list (usually a .csv file)
- Extract the column of Domains
- Copy that column to the clipboard
- Log onto Iceshrimp as an Administrator
- Go to ‘Control Panel | Federation Management’
- Delete any old Domain names in the text box
- Paste in the clipboard contents
Clearly this is less than ideal.
Updating the Iceshrimp Blocklist Python Scripts
This first script replaces that manual task by collecting a single online list, extracting the data, and then inserting the Domain information into the PostgreSQL database using Python.
update_blocklist_single.py
This requires the following:
- The remote URL link to a .csv Blocklist file
- A local path setting
- The local filename for the .csv file (this can be the same as the remote name)
- Finally, a local filename for the log file
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# The url library
import urllib.request
# The pandas library (pip install pandas)
import pandas as pd
# The PostgreSQL library (pip install psycopg2)
import psycopg2
# Local path and filenames
blocklist_path = "/home/foo/path/to/update_iceshrimp_blocklist/"
blocklist_filename = "blocklist_single.csv"
blocklist_log_path = "/home/foo/Logs/"
blocklist_log_filename = "iceshrimp_blocklist_single_updated"
# Link to the Blocklist to be used
url = ("https://codeberg.org/oliphant/blocklists/raw/branch/main/blocklists/_unified_tier0_blocklist.csv")
# Go grab the Blocklist
urllib.request.urlretrieve(url, blocklist_path+blocklist_filename)
# Read the CSV file into a pandas DataFrame
df = pd.read_csv(blocklist_path+blocklist_filename)
# Select the first column (column 0) and convert it to a text Series
blocklist_series = df.iloc[:, 0].astype(str)
# Join the text Series into a single string with a comma
blocklist_string = ','.join(blocklist_series)
# Establish a Postgres DB connection
conn = psycopg2.connect(host="127.0.0.1", database="iceshrimp_db", user="iceshrimp_user", password="iceshrimp_db_user_password", port='5432')
# Create a cursor object using the cursor() method
cursor = conn.cursor()
# Prepare the SQL query to UPDATE the 'blockedHosts' field in the 'meta' table in the database
# (the Blocklist content is interpolated directly into the SQL, so only use lists from sources you trust)
table_modification = """UPDATE meta SET "blockedHosts" = '{""" + blocklist_string + """}' WHERE id = 'x'"""
# Execute and error-trap
try:
    # Execute the SQL command
    cursor.execute(table_modification)
    print("Executed and UPDATED")
    # Commit the changes to the database
    conn.commit()
    print("UPDATE Committed")
    # Create a log file to show that the UPDATE completed successfully
    with open(blocklist_log_path+blocklist_log_filename, 'w'):
        pass
    print("UPDATE log file created")
except Exception:
    # Roll back in case of error
    conn.rollback()
    print("UPDATE Failed and rolled back")
# Close the cursor and the connection
cursor.close()
conn.close()
print("Connection CLOSED")
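The extraction steps above (read the CSV, take the first column, join it with commas) can be sketched in isolation, assuming pandas is installed; the column names and domains here are made up for illustration:

```python
import io

import pandas as pd

# A tiny stand-in for a downloaded Blocklist .csv (hypothetical domains)
csv_text = "domain,severity\nbad.example,suspend\nworse.example,suspend\n"

# Read the CSV into a pandas DataFrame, as the script does with the local file
df = pd.read_csv(io.StringIO(csv_text))

# Select the first column (column 0) and convert it to a text Series
blocklist_series = df.iloc[:, 0].astype(str)

# Join the text Series into a single comma-separated string
blocklist_string = ','.join(blocklist_series)
print(blocklist_string)
```

The resulting string is what gets wrapped in '{ }' to form the PostgreSQL array literal for the 'blockedHosts' field.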
This second script replaces that manual task by collecting multiple online lists, extracting the data, collating the lists, removing any duplicates, and then inserting the Domain information into the PostgreSQL database using Python.
update_blocklist_multiple.py
Note:
This process will work just as well with one entry in the list as it will with multiple entries.
This requires the following:
- A ‘blocklist_multiple.txt’ file that contains URLs to the remote .csv files
- An optional .csv file containing Domains to be blocked; a ‘personal_blocklist.csv’
- A local path setting
- Finally, a local filename for the log file
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# The url library
import urllib.request
# The pandas library (pip install pandas)
import pandas as pd
# The PostgreSQL library (pip install psycopg2)
import psycopg2
# Local path and filenames
blocklist_path = "/home/foo/path/to/update_iceshrimp_blocklist/"
blocklist_filename = "blocklist_multiple.txt"
blocklist_log_path = "/home/foo/Logs/"
blocklist_log_filename = "iceshrimp_blocklist_multiple_updated"
# Python pathname pattern matching
import glob
# Download all of the *.csv files listed in 'blocklist_multiple.txt' into the current directory
with open(blocklist_path+blocklist_filename) as f:
    for line in f:
        url = line.strip()
        # Skip any blank lines
        if not url:
            continue
        # Use the last path segment of the URL as the local filename
        csv_filename = url.split('/')[-1]
        urllib.request.urlretrieve(url, blocklist_path+csv_filename)
# Put all of the *.csv files in the current directory into a single DataFrame
files = glob.glob(blocklist_path+"*.csv")
content = []
# Loop through all the *.csv files
for filename in files:
    df = pd.read_csv(filename, index_col=None)
    print(len(df.index), "rows in", filename)
    content.append(df)
# Convert 'content' to a DataFrame
df = pd.concat(content)
# Print the # of DataFrame rows
print(len(df.index), "rows in the DataFrame")
# Select the first column (column 0) and convert it to a text Series
blocklist_series = df.iloc[:, 0].astype(str)
# Make the Series unique
blocklist_series = pd.unique(blocklist_series)
# Print the # of rows in the Series
print(pd.Series(blocklist_series).count(), "unique rows in the Series")
# Join the text Series into a single string with a comma
blocklist_string = ','.join(blocklist_series)
# Establish a Postgres DB connection
conn = psycopg2.connect(host="127.0.0.1", database="iceshrimp_db", user="iceshrimp_user", password="iceshrimp_db_user_password", port='5432')
# Create a cursor object using the cursor() method
cursor = conn.cursor()
# Prepare the SQL query to UPDATE the 'blockedHosts' field in the 'meta' table in the database
# (the Blocklist content is interpolated directly into the SQL, so only use lists from sources you trust)
table_modification = """UPDATE meta SET "blockedHosts" = '{""" + blocklist_string + """}' WHERE id = 'x'"""
# Execute and error-trap
try:
    # Execute the SQL command
    cursor.execute(table_modification)
    print("Executed and UPDATED")
    # Commit the changes to the database
    conn.commit()
    print("UPDATE Committed")
    # Create a log file to show that the UPDATE completed successfully
    with open(blocklist_log_path+blocklist_log_filename, 'w'):
        pass
    print("UPDATE log file created")
except Exception:
    # Roll back in case of error
    conn.rollback()
    print("UPDATE Failed and rolled back")
# Close the cursor and the connection
cursor.close()
conn.close()
print("Connection CLOSED")
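The de-duplication step relies on pd.unique(), which drops repeats while keeping first-seen order. A small sketch with made-up domains, as if two overlapping Blocklists had been collated:

```python
import pandas as pd

# Rows collated from two overlapping Blocklists (hypothetical domains)
combined = pd.Series(["bad.example", "worse.example", "bad.example", "awful.example"])

# pd.unique() removes duplicates while preserving first-seen order
unique_hosts = pd.unique(combined.astype(str))

# Join the unique Domains into the comma-separated string used for the UPDATE
blocklist_string = ','.join(unique_hosts)
print(blocklist_string)
```

Because duplicates are dropped, a Domain listed by several Blocklists ends up in 'blockedHosts' only once.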
blocklist_multiple.txt
This is the list of URLs that link to the .csv Blocklist(s).
https://codeberg.org/oliphant/blocklists/raw/branch/main/blocklists/_unified_tier0_blocklist.csv
https://seirdy.one/pb/pleroma.envs.net.csv
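Each URL in this file becomes a local filename via its last path segment; a minimal sketch of that step, using the first URL above:

```python
# A sample line as read from blocklist_multiple.txt (trailing newline included)
line = "https://codeberg.org/oliphant/blocklists/raw/branch/main/blocklists/_unified_tier0_blocklist.csv\n"

# Strip the newline, then take the last '/'-separated segment as the filename
csv_filename = line.strip().split('/')[-1]
print(csv_filename)
```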
personal_blocklist.csv
Additionally, a ‘personal_blocklist’ can be created (this file can be named whatever you like, as long as it is a .csv file and lives in the blocklist directory, since the script collates every *.csv file it finds there).
naughty_fedi.com
Updating the Iceshrimp Blocklist crontab
In addition to running the script manually, either script can be executed nightly, at midnight, from a normal crontab (crontab -e from a Terminal).
# Update the iceshrimp blocklist daily at midnight
#0 0 * * * python3 /home/foo/path/to/script/update_blocklist_single.py
0 0 * * * python3 /home/foo/path/to/script/update_blocklist_multiple.py
#
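As a variant, the cron job's output can be appended to a log file so any failures are easy to spot afterwards; the log path here is illustrative:

```shell
# Update the iceshrimp blocklist daily at midnight, appending all output to a log
0 0 * * * python3 /home/foo/path/to/script/update_blocklist_multiple.py >> /home/foo/Logs/blocklist_cron.log 2>&1
```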
Finally
(I did mention this in a prior post but it bears repeating.)
For anyone interested in getting into the SQL side, pgAdmin is a useful tool.
I installed it using:
curl -fsS https://www.pgadmin.org/static/packages_pgadmin_org.pub | sudo gpg --dearmor -o /usr/share/keyrings/packages-pgadmin-org.gpg
sudo sh -c 'echo "deb [signed-by=/usr/share/keyrings/packages-pgadmin-org.gpg] https://ftp.postgresql.org/pub/pgadmin/pgadmin4/apt/jammy pgadmin4 main" > /etc/apt/sources.list.d/pgadmin4.list && apt update'
sudo apt install pgadmin4
[Screenshot: pgAdmin]
However you Fediverse, Enjoy!
UPDATE
Iceshrimp is moving to a .NET framework.
Additional scripts for updating a blocklist when running Iceshrimp.NET can be found here.