Recently, I made a post about configuring and running a self-hosted APM platform, Apache Skywalking. However, APM is just one piece of the puzzle when you want to run applications: some care should also be given to analytics. After all, you'll want to know how your application is used, how many people use it and, quite possibly, what sorts of devices they run it on, so that you know better what to pay attention to.
Now, within the industry, Google Analytics has gained a foothold as the standard. However, that means relying on a 3rd party to provide the service and also to store all of your data. It should be obvious why in many cases that's not really an option, especially when concerns such as GDPR simply cannot be ignored. Because of this, it's probably a good idea to look at open source and self-hosted solutions.
While there certainly are many options out there, personally, I find that Matomo Analytics is one of the best: simple to run, reliable, yet advanced enough to give you access to almost all of the functionality that you could wish for in an analytics solution! Because of this, I don't think that exploring other analytics solutions is necessary within the confines of this post, since the landscape here is far less complicated than it was in the case of APM. Of course, if you'd still like to have a look at some of the alternatives, here's a pretty useful link.
So, for the time being, let's get into configuring Matomo! It is actually pretty simple to do, as Matomo consists of one web application and a backing database, either MySQL or MariaDB. We can easily achieve this by deploying a few Docker containers. In this case I'll provide an example for Docker Swarm, though with some slight edits it may as well be a Docker Compose file:
version: '3.7'
services:
  matomo_mysql:
    image: mariadb:10.5.11
    command: --max-allowed-packet=64MB
    volumes:
      - /app/matomo/mysql:/var/lib/mysql
    environment:
      - MYSQL_ROOT_PASSWORD=long_root_password_goes_here
      - MYSQL_PASSWORD=long_user_password_goes_here
      - MYSQL_DATABASE=matomo
      - MYSQL_USER=matomo
    networks:
      - httpd_ingress_network
    deploy:
      placement:
        constraints:
          - node.hostname == analytics-server
      resources:
        limits:
          memory: 512M
          cpus: '0.50'
  matomo:
    image: matomo:4.3.1-apache
    restart: always
    volumes:
      - /app/matomo/html:/var/www/html
    environment:
      - MATOMO_DATABASE_HOST=matomo_mysql
      - MATOMO_DATABASE_ADAPTER=mysql
      - MATOMO_DATABASE_TABLES_PREFIX=matomo_
      - MATOMO_DATABASE_USERNAME=matomo
      - MATOMO_DATABASE_PASSWORD=long_user_password_goes_here
      - MATOMO_DATABASE_DBNAME=matomo
    networks:
      - httpd_ingress_network
    deploy:
      placement:
        constraints:
          - node.hostname == analytics-server
      resources:
        limits:
          memory: 512M
          cpus: '0.50'
networks:
  httpd_ingress_network:
    driver: overlay
    attachable: true
    external: true
Of course, in most circumstances you'll also want to set up a reverse proxy to ensure that your analytics are available behind SSL/TLS. In my case, the aforementioned httpd ingress network comes in useful, because I connect an Apache2/httpd container to it and can therefore provide said proxy functionality with configuration like this:
<VirtualHost *:80>
    ServerName analytics.mycompany.com
    Redirect permanent / https://analytics.mycompany.com/
</VirtualHost>

<VirtualHost *:443>
    ServerName analytics.mycompany.com

    # ssl config
    SSLEngine on
    SSLProxyEngine on
    SSLCertificateFile /app/certs/mycert.crt
    SSLCertificateKeyFile /app/certs/mycert.key
    SSLCertificateChainFile /app/certs/mycert.intermediate.crt
    RequestHeader unset X-Client-Cert

    # reverse proxy
    RequestHeader set X-Forwarded-Proto https
    ProxyPreserveHost On
    ProxyPass / http://matomo:80/
    ProxyPassReverse / http://matomo:80/
</VirtualHost>
And that's basically it! After opening the URL in your browser, you'll be prompted to configure the instance and then will be able to set up sites to track:
Then, all that you need to do is connect your applications to the monitoring platform.
In most cases, this will work fine. However, it means that you have to expose at least part of your Matomo instance to the Internet (notably, /matomo.php) and that your app will make requests to additional domains. Personally, I think that there's nothing wrong with this approach, but in certain situations you might also want to set up a reverse proxy within your actual app, a bit like we did with the APM platform.
Of course, that can also be achieved with some pretty simple Apache2/httpd configuration, a bit like so:
# Matomo reverse proxy
RequestHeader set X-Forwarded-Proto https
ProxyPreserveHost On
ProxyPass /analytics/matomo.js http://matomo:80/matomo.js
ProxyPassReverse /analytics/matomo.js http://matomo:80/matomo.js
ProxyPass /analytics/matomo.php http://matomo:80/matomo.php
ProxyPassReverse /analytics/matomo.php http://matomo:80/matomo.php

# Your app reverse proxy
RequestHeader set X-Forwarded-Proto https
ProxyPreserveHost On
ProxyPass / http://your_app:8080/
ProxyPassReverse / http://your_app:8080/
That should be sufficient to allow the application to proxy the requests over to Matomo and then you'd simply change the path to a relative one in the tracking code that you got:
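As a rough sketch of what that change could amount to, here is the general shape of the tracking snippet with a relative tracker path. The base path /analytics/ matches the proxy configuration above, while the site ID of 1 and the helper names (buildTrackerCommands, installTracker, BrowserLike) are just placeholders for illustration. Use the snippet that your own Matomo instance generates as the source of truth.

```typescript
// Sketch of Matomo's tracking snippet with a relative tracker URL.
// BASE_PATH and SITE_ID are assumptions: adjust them to your own setup.
type PaqCommand = [string, ...unknown[]];

// Minimal view of the browser globals this sketch touches (hypothetical type).
interface BrowserLike {
  _paq?: PaqCommand[];
  document?: {
    createElement(tag: string): any;
    head: { appendChild(node: any): void };
  };
}

const BASE_PATH = "/analytics/"; // relative path handled by the reverse proxy
const SITE_ID = "1";             // placeholder: the ID Matomo assigned to your site

// Builds the commands that the snippet pushes onto the _paq queue.
function buildTrackerCommands(basePath: string, siteId: string): PaqCommand[] {
  return [
    ["trackPageView"],
    ["enableLinkTracking"],
    ["setTrackerUrl", basePath + "matomo.php"],
    ["setSiteId", siteId],
  ];
}

// Queues the commands and loads matomo.js through the proxy.
// Returns false when there is no document (i.e. not running in a browser).
function installTracker(env: BrowserLike, basePath: string, siteId: string): boolean {
  if (!env.document) return false;
  env._paq = env._paq || [];
  env._paq.push(...buildTrackerCommands(basePath, siteId));

  const script = env.document.createElement("script");
  script.async = true;
  script.src = basePath + "matomo.js";
  env.document.head.appendChild(script);
  return true;
}

installTracker(globalThis as unknown as BrowserLike, BASE_PATH, SITE_ID);
```

The only part that differs from a directly exposed Matomo instance is the base path: both matomo.js and matomo.php are requested from your app's own domain, and the proxy forwards them.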
Then, everything should start working with no problems. Of course, you can also just have a look at the actual application in your browser's network tab, to see whether the data is being successfully sent:
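For reference, the requests to look for go to matomo.php and carry query parameters from Matomo's Tracking HTTP API, such as idsite, rec, url and action_name. The sketch below builds such a request URL by hand, which can be handy for comparing against what shows up in the network tab. The function name, the PageView type and the example values are made up for illustration, and the real tracker sends more parameters than these.

```typescript
// Sketch: build a Matomo Tracking HTTP API request URL by hand.
// PageView and buildTrackingRequest are hypothetical names for this example.
interface PageView {
  siteId: number;  // the site ID configured in Matomo
  pageUrl: string; // full URL of the page being tracked
  title: string;   // page title, sent as action_name
}

function buildTrackingRequest(trackerUrl: string, view: PageView): string {
  const params: Record<string, string> = {
    idsite: String(view.siteId), // required by the Tracking HTTP API
    rec: "1",                    // required, must be 1 for the hit to be recorded
    apiv: "1",                   // API version
    url: view.pageUrl,
    action_name: view.title,
    rand: String(Math.floor(Math.random() * 1e9)), // cache buster
  };
  const query = Object.keys(params)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(params[key])}`)
    .join("&");
  return `${trackerUrl}?${query}`;
}

// Example: a page view sent through the /analytics/ proxy path from above.
const exampleRequest = buildTrackingRequest("/analytics/matomo.php", {
  siteId: 1,
  pageUrl: "https://app.mycompany.com/",
  title: "Home",
});
```

If the requests in the network tab go to the proxied path and come back with a successful status code, the proxy and the tracking code are most likely wired up correctly.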
Once your site picks up more traffic, you'll actually start seeing more benefits to having analytics on it. For example, here's the dashboard of one of my study material forums, which I set up to help other students at my university, free of charge:
Knowing that it's actually being used by someone lets me know that I should probably keep it running for a while!
You can also drill down to specific visits. For example, if someone told you that they ran into a problem on your site, you could trace the actions that they took on it in a more fine-grained way, to help you figure out the usage patterns:
Not only that, but you can also get information about things like the device types that were used:
As well as the screen sizes:
And the operating systems and browsers:
Also, you can see when your site is most actively used to anticipate load spikes:
Last, but not least, there are also illustrations available for how people navigate the site, showing both where the incoming users originate from and where they head to:
Realistically, you probably don't have the time to implement all of the functionality above in a custom manner for your solution. Not only that, but using SaaS solutions and cloud-based services like Google Analytics could create serious risks for you, especially with regard to the privacy of your data. Because of this, it's amazing that there are solutions such as Matomo available, which you can put on your own servers and use at absolutely no charge.
And if you not only want your applications to run, but also to run well, then you'll be much better off having analytics in place!