Why?
I’ve seen a few people lately who are eager to start self-hosting something but are unsure how the jigsaw pieces fit together.
This is not intended to be a ‘how do I configure blah blah?’ piece; think of it more like the picture on the front of the jigsaw box.
Hopefully this will go some way to deciphering some of the technical terms and make starting the self-hosting puzzle a bit easier.
Terminology
I’ll try to keep this bit brief but, unfortunately, there are acronyms and a bit of technical language ahead.
What are ‘Packets’?
Essentially someone using a web browser ‘requests’ a web page.
This web page is then ‘served’ by a web server back to the web browser that requested it.
The web page is not sent in one piece; it is broken into chunks, or ‘packets’, which are put back together at the other end.
It is these packets that flow through the pipes that connect the internet together.
Packets know a couple of things, most importantly:
- where they came from (the web browser that requested the content)
- where they are going to (the web server where the content is hosted)
Aside: Some packets know about the packets that are ahead of and behind them, but that’s not really relevant to cover here.
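If it helps to see the idea as code, here is a toy sketch of a web page being chopped into packets and put back together. It is only a model of the idea, not how real network packets are built; the names `Packet`, `split_into_packets` and `reassemble`, and the chunk size, are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    source: str       # where the packet came from
    destination: str  # where the packet is going to
    sequence: int     # position of this chunk within the whole message
    payload: bytes    # the chunk of content being carried


def split_into_packets(content: bytes, source: str, destination: str,
                       chunk_size: int = 16) -> list[Packet]:
    """Break content into fixed-size chunks, one Packet per chunk."""
    return [
        Packet(source, destination, seq, content[start:start + chunk_size])
        for seq, start in enumerate(range(0, len(content), chunk_size))
    ]


def reassemble(packets: list[Packet]) -> bytes:
    """Put the chunks back in order, even if they arrived shuffled."""
    ordered = sorted(packets, key=lambda p: p.sequence)
    return b"".join(p.payload for p in ordered)
```

The sequence number is the ‘ahead and behind’ bit from the aside: it is what lets the receiving end put the chunks back in the right order.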
The following will hopefully describe the hoops that the packets have to jump through.
What is ‘DNS’?
So, you want to look at ‘https://google.com/’?
The first thing to do is turn the human-readable bit, ‘google.com’, into an IP address.
An IP address is a unique identifying number assigned to every device connected to the internet (including your router).
In this case, one of the IP addresses for ‘google.com’ is ‘142.250.187.238’ (big sites have many, and the one you are given can vary by location and over time).
(You can check this by putting ‘142.250.187.238’ in a browser’s address bar.)
This is called Domain Name resolution, and it is the job of the Domain Name System, or DNS. Normally a DNS server run by the ISP of whoever is requesting the web page does this.
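You can ask your own machine to do the same lookup. The following is a minimal sketch using Python’s standard `socket` module; the function name `resolve_ipv4` is just something I made up, and it only looks up the older IPv4 style of address.

```python
import socket


def resolve_ipv4(hostname: str) -> list[str]:
    """Return the IPv4 addresses that the system resolver finds for hostname."""
    results = socket.getaddrinfo(hostname, None, family=socket.AF_INET,
                                 type=socket.SOCK_STREAM)
    # Each result is (family, type, proto, canonname, sockaddr);
    # for IPv4, sockaddr is an (address, port) pair.
    return sorted({sockaddr[0] for *_, sockaddr in results})
```

Calling `resolve_ipv4("google.com")` should give you a short list of addresses, which may well be different from the one quoted above.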
Certbot
Back in the olden days, having an ‘S’ at the end of the HTTP bit of your web address was fancy and reserved for sites doing financial stuff.
It was also expensive.
You had to buy a Certificate from a Certificate Authority, and then renew it annually.
Certbot is a free tool from the Electronic Frontier Foundation (EFF) that fetches Certs from Let’s Encrypt, a Certificate Authority that gives them away for free.
(It also handles the renewal process, so once you have grabbed a Cert for your site, that’s usually everything you need to do.)
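If you ever want to double-check that renewal is happening, the thing to look at is the Cert’s expiry date. Here is a small sketch, assuming you already have the expiry date as text in the format that Python’s `ssl` module reports it (the ‘notAfter’ field, e.g. ‘Jan  5 09:34:43 2018 GMT’); the function name `days_until_expiry` is hypothetical.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a Cert's 'notAfter' date (negative if already expired)."""
    if now is None:
        now = time.time()
    # Convert the Cert's date text into seconds since the epoch.
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now) / 86400
```

If the number keeps dropping towards zero and never jumps back up, renewal probably isn’t working.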
The Request
All of the above is lovely to know, but what about when someone using a browser wants to look at your website?
The first thing you have to do is buy a Domain and, with the Domain registrar, set the IP address of that Domain to your router’s public IP address.
This is called setting the ‘A record’.
‘A records’, also known as Address records, are used to store IP address information for a Domain name.
Now their browser will resolve your Domain name to your router’s IP address.
Their browser will send a packet requesting the content from your website.
This packet’s first stop will be your router.
The Router
So, the packet has now reached your router.
Web traffic happens on Port 80 (HTTP) and Port 443 (HTTPS).
Those Ports need to be opened on the router, and the traffic arriving on them forwarded to the IP address of your web server (routers usually call this ‘Port forwarding’).
This will be the internal network IP address of the machine where your web server lives, probably 192.168.0.something.
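If you are not sure whether an address is an internal one or a public one, Python’s standard `ipaddress` module can tell you. A minimal sketch; the function name `is_internal` is mine, and the example addresses in the note below are only for illustration.

```python
import ipaddress


def is_internal(address: str) -> bool:
    """True if address belongs to a private (internal network) range."""
    return ipaddress.ip_address(address).is_private
```

Something like `is_internal("192.168.0.10")` comes back true, while the Google address from earlier comes back false. The A record needs the public kind; the router’s forwarding rule needs the internal kind.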
The Firewall
So, now the packet has traversed the router and been directed towards the web server.
First it has to get through the firewall (you are running a firewall, aren’t you?).
If it’s a Debian or Ubuntu server, it’s probably using Uncomplicated Firewall, or UFW.
Again, Ports 80 and 443 need to be opened on the firewall to allow traffic to get to and from the web server.
(No-one really uses HTTP on Port 80 anymore, but requests may still come in via that Port, which your web server will redirect to HTTPS.)
If you want to connect to your server using the Secure Shell (SSH) Protocol, Port 22 will need to be opened on the firewall too.
If SSH traffic has been allowed through both the router and the firewall, you will need to look into installing Fail2Ban; it bans IP addresses that repeatedly fail to log in, which limits the damage that exposing Port 22 to the wider Internet can do.
More complex applications (e.g. Fediverse software) may require different Ports to be opened.
This is beyond the scope of this piece.
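A quick way to see whether a Port is actually letting traffic through is to try connecting to it. This is a rough sketch using only Python’s standard library; the function name `port_is_open` and the three-second timeout are my own choices.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Try to open a TCP connection; True if something accepted it."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all of these as closed.
        return False
```

Bear in mind that a check run from inside your own network may not give the same answer as one run from the outside world, since it might not pass through the router at all.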
The Web Server
So, the packet has now traversed the router, been allowed through the firewall and has finally gotten to the web server.
Where does it go now?
What if there are multiple websites on the web server?
Well, each request names the site it wants, and the web server uses that name to find the folder where the content lives.
Say you have two sites ‘FantasticWebsite.com’ and ‘EvenMoreFantasticWebsite.com’.
There will be two web server folders with content:
/var/www/FantasticWebsite.com
And
/var/www/EvenMoreFantasticWebsite.com
The web server matches each request to the corresponding folder.
Remember how I said right at the beginning that packets know two things: ‘where they came from’ and ‘where they are going to’?
Well now that they have reached your web server the ‘where they are going to’ bit has been completed.
Your web server now does the ‘where they came from’ bit and sends the content back to the requesting web browser.
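The ‘which folder?’ decision can be sketched in a few lines. This is a toy model, not how nginx or any other real web server is written or configured; the `SITES` table and the function name `folder_for` are made up, and the only real-world detail it leans on is that a request carries the name of the site it wants (the ‘Host’ header).

```python
from typing import Optional

# Hypothetical table mapping site names to content folders.
SITES = {
    "fantasticwebsite.com": "/var/www/FantasticWebsite.com",
    "evenmorefantasticwebsite.com": "/var/www/EvenMoreFantasticWebsite.com",
}


def folder_for(host_header: str) -> Optional[str]:
    """Pick the content folder for the site named in a request's Host header."""
    # Host names are case-insensitive and may carry a port, e.g. 'example.com:443'.
    hostname = host_header.strip().lower().split(":")[0]
    return SITES.get(hostname)
```

A request for a site that isn’t in the table gets `None` back; a real web server would fall back to a default site or an error page at that point.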
Don’t Run Before You Can Walk
If I were thinking of starting to self-host stuff, I would start small, something like:
Buy a Domain, say, ‘MyNewWebsite.com’.
Point the Domain’s Address record to my router’s IP address.
Figure out how to log into the router to open Ports 80 and 443.
Install a web server, say, nginx.
Create a directory at /var/www/MyNewWebsite.com
Create a simple ‘index.html’ file to go in the above folder, something like:
<!DOCTYPE html>
<html>
<head>
<title>My First Webpage</title>
</head>
<body>
<h1>My First Webpage</h1>
</body>
</html>
Run certbot to grab an HTTPS certificate.
Use a web browser to look at ‘MyNewWebsite.com’ to make sure that it all works.
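If you want to check that your ‘index.html’ looks right before any of the router or Domain steps are done, Python’s built-in web server can show you the folder on your own machine. To be clear, this is a stand-in for previewing only, not a replacement for nginx; the function name `start_preview_server` and Port 8000 are my own assumptions.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_preview_server(directory: str, port: int = 8000) -> ThreadingHTTPServer:
    """Serve the files in directory on localhost, in a background thread."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    # Bind to 127.0.0.1 so only this machine can reach the preview.
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After calling `start_preview_server("/var/www/MyNewWebsite.com")` from an interactive Python session, the page should be visible at ‘http://127.0.0.1:8000/’ for as long as that session stays open; call `shutdown()` on the returned server to stop it.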
OK, Now You Can Run!
Now that you understand how a packet travels from a distant web browser, across the Internet, through your router and firewall, to your web server, which then sends a response, you’ll probably want to install some more interesting services.
This is a great place to spend some time browsing general self-hosting applications:
https://awesome-selfhosted.net/
For an ActivityPub (Fediverse) oriented list see here:
https://github.com/sguzman/delightful-fediverse-apps
Shameless Self Promotion
For a music streaming server, Navidrome is an easy one to recommend.
For Fediverse Socials, GoToSocial has excellent documentation and is fairly uncomplicated to set up (there is a bit of tinkering with config files, nothing scarier than that).
Both of the above are written in the Go language and, unlike a traditional Linux installation (which requires the installation of other files called Dependencies), they are self-contained units.
You download the relevant Go binary, put it in a folder and run it.
This approach has both positives and negatives.
Positive: If the binary runs and works, well, that’s it, forget about it, that’s all you have to do.
Negative: You won’t be told when an update is available, so if you care about updates you will have to keep an eye on their GitHub page or Socials.
Obviously there are a whole host of other services you can self-host; spend some time browsing and reading about what other people are doing.
Tech Stacks
Finally, some advice you can choose to disregard.
All-in-One, point-and-click packages have been around for ages.
Back in the ’90s, LAMP (Linux, Apache, MySQL and PHP) was all the rage.
I avoided them, preferring to stack the pieces myself.
I feel you gain a better understanding of how everything sits together and interacts if you have installed it yourself.
(You are also far more likely to find online help if you are running a vanilla installation, rather than one that has been fucked about with in unknown ways.)
Anyway, YMMV, it’s your journey; you do you!
More Reading
https://awesome-selfhosted.net/
https://github.com/sguzman/delightful-fediverse-apps
https://www.digitalocean.com/community/conceptual-articles/introduction-to-web-servers