I love graphs.
I wanted to add some traffic analysis to my Pi-based Apache web server and had a couple of prerequisites: it had to be real-time, and it had to look good. Enter GoAccess.
I already have the old-school AWStats installed and configured, but it’s not real-time and I wanted something funky looking.
However, none of the WordPress plugins I was looking into were doing it for me.
Most of them either send a cookie (which I am against doing, as I want visitors to have confidence that the site has zero ads) or were really basic.
What is it?
GoAccess is an open source real-time web log analyzer and interactive viewer that runs in a terminal on *nix systems or through your browser.
Sounds straightforward?
Well, it kinda was.
I used my dynamic DNS service to forward a new domain name to my router’s IP.
(I could have attached the traffic reports to the existing sites but I’m thinking ahead and planning a central site that is a portal to all of the analysis.)
Back on my server I needed somewhere to direct this traffic to.
Apache server setup
Set up the new vHost
sudo nano /etc/apache2/sites-available/newsite.conf
This is a copy / paste from an existing vHost file that I recycled, so it may have more options enabled than it needs.
<VirtualHost *:80>
ServerAdmin admin@newsite.net
DocumentRoot /var/www/html/newsite.net/
ServerName newsite.net
ServerAlias newsite.net *.newsite.net
CustomLog /var/log/apache2/newsite_access.log combined
ErrorLog /var/log/apache2/newsite_error.log
<Directory /var/www/html/newsite.net>
Options Indexes Includes FollowSymLinks MultiViews
AllowOverride All
Require all granted
</Directory>
</VirtualHost>
There are various arguments about having discrete logs per site.
Some people are in the ‘dump it all in one log and slice it up later’ camp.
I’m in the ‘slice it up now, I’ll consolidate it all later if needed’ camp.
Update the hosts file
sudo nano /etc/hosts
192.168.0.90 newsite.net newsite
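A quick way to confirm the entry took is to check that the name really maps to the expected IP. The sketch below is only an illustration: host_maps_to is a made-up helper name, and it reads whatever hosts-format file it is given rather than asking the resolver.

```shell
#!/bin/sh
# host_maps_to is a hypothetical helper, not part of any package.
# It succeeds when FILE has a line starting with IP that also lists NAME.
host_maps_to() {
    hosts_file=$1; name=$2; ip=$3
    awk -v ip="$ip" -v name="$name" '
        { sub(/#.*/, "") }                      # ignore trailing comments
        $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit !found }                     # exit 0 only when a match was seen
    ' "$hosts_file"
}

# Example call against the real file:
#   host_maps_to /etc/hosts newsite.net 192.168.0.90 && echo "hosts entry OK"
```

On a live system, getent hosts newsite.net gives a similar answer by going through the resolver itself.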
Make the Apache site live
sudo a2ensite newsite.conf
Then, time to create some temporary content, just to check that the name is resolving, and Apache is responding with something.
Create the www directory and a holding page
sudo mkdir /var/www/html/newsite.net/
sudo nano /var/www/html/newsite.net/index.html
<html>
<head>
<title>newsite</title>
</head>
<body>
<br>
blah, blah, blah<br>
<br>
</body>
</html>
So far, so good, http://newsite.net seems to be a working webpage.
(GoAccess can deal with https:// sites by passing the certificate and key on the command line, via the --ssl-cert and --ssl-key options, but I’m keeping it simple right now.)
I’m not particularly concerned about random people viewing the logs but I’m going to require authentication to view the page anyway.
Create a password
In order to password protect the folder, I had to create a password file.
sudo htpasswd -c /etc/apache2/.htpasswd foo
Go through the usual dance of entering the password twice.
Add the .htaccess file
sudo nano /var/www/html/newsite.net/.htaccess
AuthType Basic
AuthName "Password Required"
Require valid-user
AuthUserFile /etc/apache2/.htpasswd
Finally, restart Apache
sudo /etc/init.d/apache2 restart
So, now I have a working webpage that requires a password to view.
The ‘goaccess’ package is available via the package manager of my Linux distribution but it’s an older version.
However, there is a Debian/Ubuntu repository containing the latest stable version.
I fancied living on the edge, so I installed it using the tried and tested ‘build’ method.
Building GoAccess from source
sudo apt install libncursesw5-dev libgeoip-dev libtokyocabinet-dev build-essential
wget https://tar.goaccess.io/goaccess-1.7.2.tar.gz
tar -xzvf goaccess-1.7.2.tar.gz
cd goaccess-1.7.2/
./configure --enable-utf8 --enable-geoip=legacy
make
sudo make install
goaccess --version
Configure GoAccess
To find the config file I used:
goaccess --dcf
Which told me where mine is located (it could have been in a number of places).
sudo nano /usr/local/etc/goaccess/goaccess.conf
However, I did the very minimum and uncommented only the ‘Time’, ‘Date’ and ‘Log Format’ options.
# The following time format works with any of the
# Apache/NGINX's log formats below.
#
time-format %H:%M:%S
# The following date format works with any of the
# Apache/NGINX's log formats below.
#
date-format %d/%b/%Y
# NCSA Combined Log Format
log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
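To see how those specifiers line up with a real log entry, here is a minimal sketch that pulls the same fields out of one combined-format line with awk. The log line is made up for illustration, and parse_combined is a hypothetical helper; GoAccess does its own parsing and does not work this way internally.

```shell
#!/bin/sh
# A made-up access log line in NCSA combined format.
line='192.168.0.10 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "http://newsite.net/" "Mozilla/5.0"'

# parse_combined (hypothetical): print the parts matched by %h, %d, %t, %s and %b.
parse_combined() {
    printf '%s\n' "$1" | awk '{
        split($4, ts, ":")            # $4 looks like "[10/Oct/2023:13:55:36"
        date = substr(ts[1], 2)       # %d -> drop the leading "["
        time = ts[2] ":" ts[3] ":" ts[4]   # %t -> HH:MM:SS
        printf "host=%s date=%s time=%s status=%s bytes=%s\n", $1, date, time, $9, $10
    }'
}

parse_combined "$line"
# prints: host=192.168.0.10 date=10/Oct/2023 time=13:55:36 status=200 bytes=2326
```

The date and time that come out match the date-format (%d/%b/%Y) and time-format (%H:%M:%S) settings above, which is why those three options go together.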
Traffic data can be output to a terminal, a real-time webpage, a static HTML file or a number of other formats (including JSON and CSV).
Apache traffic analysis
Terminal reporting
goaccess -f /var/log/apache2/newsite_access.log --log-format=COMBINED
Real-time reporting
The real-time reporting was where I came unstuck.
I have to stress this was 100% my fault.
I’d forgotten that I have a tightly locked down Debian / Apache system.
Only really basic ports are open with a very limited number of services allowed to connect and allow traffic through.
So, after several hours of banging my head on the desk, I remembered that I needed to open a port on the router to let GoAccess traffic in, and a port on the firewall to let that traffic through.
Update port forwarding on the router
Port 7890 is the default GoAccess WebSocket port, so I forwarded traffic on 7890 to 192.168.0.90 (my server IP).
Allow access through the firewall
Debian nftables was my firewall of choice at the time, so I needed to allow that traffic through it. (I have since changed over to UFW, where the configuration is a lot simpler.)
I added the GoAccess listening port to the list of ports that can pass traffic through the firewall.
sudo nano /etc/nftables.conf
# activate the following line to accept common local services
tcp dport { 22, 80, 443, 7890 } ct state new accept
Real-time reporting commands
This takes only the latest ‘newsite_access.log’ and sends it to ‘newsite.net/index.html’.
goaccess /var/log/apache2/newsite_access.log -o /var/www/html/newsite.net/index.html --log-format=COMBINED --real-time-html
To look at all traffic, this command uses zcat to decompress every archived ‘newsite_access.log’ file and pipes ( | ) the result to goaccess, which adds in the latest uncompressed ‘newsite_access.log’ and sends everything to ‘newsite.net/index.html’.
zcat -f /var/log/apache2/newsite_access.log* | goaccess /var/log/apache2/newsite_access.log -o /var/www/html/newsite.net/index.html --log-format=COMBINED --real-time-html
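It helps to see exactly what zcat -f puts into the pipe. The sketch below builds a throwaway directory with a live log, a rotated plain log and a rotated gzipped log (file contents are made up, and the layout is only an assumption about how logrotate leaves things), then counts what two different globs produce. It assumes GNU gzip, where -f makes zcat pass uncompressed files through unchanged.

```shell
#!/bin/sh
demo_dir=$(mktemp -d)
printf 'current-1\ncurrent-2\n' > "$demo_dir/newsite_access.log"     # live log
printf 'yesterday-1\n'          > "$demo_dir/newsite_access.log.1"   # rotated, plain
printf 'older-1\nolder-2\n'     > "$demo_dir/newsite_access.log.2"
gzip "$demo_dir/newsite_access.log.2"                                # rotated, now .2.gz

# "log*" matches the live log as well as every rotated file.
all_lines=$(zcat -f "$demo_dir"/newsite_access.log* | wc -l | tr -d ' ')
# "log.*" needs a dot after "log", so it skips the live log.
rotated_lines=$(zcat -f "$demo_dir"/newsite_access.log.* | wc -l | tr -d ' ')

echo "everything: $all_lines, rotated only: $rotated_lines"
# prints: everything: 5, rotated only: 3
rm -rf "$demo_dir"
```

Because the wider glob already includes the live log, naming that file again on the goaccess side could mean its lines are read twice when stdin is also being read. That has not been verified here, so it is worth comparing the totals from both globs on your own logs.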
Static reporting commands
A static report can be generated by removing ‘--real-time-html’ from the above command.
cron syntax
When a cron job is needed the command string is slightly different.
The full paths to zcat and goaccess are required. Also notice the ‘- -o’ in the string rather than ‘-o’.
The extra ‘-’ tells goaccess to read the piped data from standard input.
/usr/bin/zcat -f /var/log/apache2/newsite_access.log* | /usr/local/bin/goaccess /var/log/apache2/newsite_access.log - -o /var/www/html/newsite.net/index.html
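A lone ‘-’ meaning "read standard input here" is a common Unix convention, and cat follows it too, which makes for a goaccess-free illustration. This is only an analogy: it shows the convention, not how goaccess orders or merges its inputs.

```shell
#!/bin/sh
tmp_file=$(mktemp)
printf 'from-file\n' > "$tmp_file"

# cat reads the named file first, then "-" tells it to carry on with stdin.
combined=$(printf 'from-pipe\n' | cat "$tmp_file" -)
printf '%s\n' "$combined"
# prints:
#   from-file
#   from-pipe

rm -f "$tmp_file"
```

Leave the ‘-’ off and cat never looks at the pipe at all, which is the behaviour the extra ‘-’ in the cron string is there to avoid.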
@reboot
I wanted my logging to restart when the server reboots.
So I added a quick script
sudo nano /etc/init.d/reboot_goaccess.sh
#!/bin/bash
# Refresh the GoAccess REAL-TIME webpage from the Apache access logs
# Update newsite.net webpage - (Notice the - -o that cron needs)
/usr/bin/zcat -f /var/log/apache2/newsite_access.log* | /usr/local/bin/goaccess /var/log/apache2/newsite_access.log - -o /var/www/html/newsite.net/index.html --log-format=COMBINED --real-time-html
Made it executable
sudo chmod +x /etc/init.d/reboot_goaccess.sh
Updated root cron
sudo crontab -e
#
# Start the GoAccess real-time reporting webpage on boot
@reboot sh /etc/init.d/reboot_goaccess.sh
#
Tinkering with traffic analysis using GoAccess and Apache
There is a wealth of tinkering possibilities.
- Enabling / disabling the default info panels
- Custom colouring for the panels
- Slicing up the logs to include / exclude date ranges, crawlers, bots and more
- Loads more, really, lots to tinker with
The list seems endless.
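As one example of slicing, logs can be trimmed with ordinary text tools before GoAccess ever sees them. The sketch below keeps a single day and drops anything that mentions "bot". The sample lines are made up, filter_day_no_bots is a hypothetical helper, and the bot match is deliberately crude (it would also drop a human fetching /robots.txt).

```shell
#!/bin/sh
# Made-up sample lines in combined format.
sample_log() {
cat <<'EOF'
192.168.0.10 - - [09/Oct/2023:08:00:01 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
203.0.113.7 - - [10/Oct/2023:09:15:42 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "ExampleBot/1.0 (+http://example.com/bot)"
192.168.0.11 - - [10/Oct/2023:10:30:00 +0000] "GET /about HTTP/1.1" 200 1024 "-" "Mozilla/5.0"
EOF
}

# filter_day_no_bots DAY: keep lines stamped with DAY (dd/Mon/yyyy),
# then drop any line containing "bot" in any letter case.
filter_day_no_bots() {
    grep -F "[$1:" | grep -vi 'bot'
}

sample_log | filter_day_no_bots '10/Oct/2023'
# prints only the "GET /about" line
```

With a real log, the filtered stream could be piped on to goaccess using the same trailing ‘-’ as the cron command. GoAccess also has its own --ignore-crawlers option, which is likely a better fit than a grep for bots.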
Finally
I really like it, but then I’m the target audience for this kind of thing.
If I hadn’t bungled the port forwarding / firewall bit, the whole process would have taken about half an hour.
Apache Traffic Analysis Using GoAccess
Update: I’ve been using GoAccess for Apache traffic analysis for a few months now and it has been flawless.
Also, I’ve recently moved all my sites over to nginx and the only change I had to make was to the path to the log files to keep everything running perfectly.