How to self-host an analytics platform

Recently, I made a post about configuring and running a self-hosted APM platform, Apache SkyWalking. However, APM is just one piece of the puzzle when running applications - some care should also be given to analytics. After all, you'll want to know how your application is used, how many people use it and, quite possibly, what sorts of devices they run it on, so that you know better what to pay attention to.

Within the industry, Google Analytics has gained a foothold as the standard. However, that means relying on a 3rd party to provide the service and also to store all of your data. It should be obvious why that's not really an option in many cases, especially when concerns such as GDPR simply cannot be ignored. Because of this, it's probably a good idea to look at open source and self-hosted solutions.

While there certainly are many options out there, personally, I find Matomo Analytics to be one of the best - simple to run, reliable, yet advanced enough to give you access to almost all of the functionality that you could ever wish for in an analytics solution! Because of this, I don't think that exploring other analytics solutions is necessary within the confines of this post; the APM landscape was far more complicated than what we have here. Of course, if you'd still like to have a look at some of the alternatives, here's a pretty useful link.

Setting up Matomo Analytics

So, for the time being, let's get into configuring Matomo! It is actually pretty simple to do, as Matomo consists of one web application and a backing database, either MySQL or MariaDB. We can easily run both by deploying a few Docker containers. In this case I'll provide an example for Docker Swarm, though it could just as well be a Docker Compose file with some slight edits:

version: '3.7'
services:
  matomo_mysql:
    image: mariadb:10.5.11
    command: --max-allowed-packet=64MB
    volumes:
      - /app/matomo/mysql:/var/lib/mysql
    environment:
      - MYSQL_ROOT_PASSWORD=long_root_password_goes_here
      - MYSQL_PASSWORD=long_user_password_goes_here
      - MYSQL_DATABASE=matomo
      - MYSQL_USER=matomo
    networks:
      - httpd_ingress_network
    deploy:
      placement:
        constraints:
          - node.hostname == analytics-server
      resources:
        limits:
          memory: 512M
          cpus: '0.50'
  matomo:
    image: matomo:4.3.1-apache
    restart: always
    volumes:
      - /app/matomo/html:/var/www/html
    environment:
      - MATOMO_DATABASE_HOST=matomo_mysql
      - MATOMO_DATABASE_ADAPTER=mysql
      - MATOMO_DATABASE_TABLES_PREFIX=matomo_
      - MATOMO_DATABASE_USERNAME=matomo
      - MATOMO_DATABASE_PASSWORD=long_user_password_goes_here
      - MATOMO_DATABASE_DBNAME=matomo
    networks:
      - httpd_ingress_network
    deploy:
      placement:
        constraints:
          - node.hostname == analytics-server
      resources:
        limits:
          memory: 512M
          cpus: '0.50'
networks:
  httpd_ingress_network:
    driver: overlay
    attachable: true
    external: true

Of course, in most circumstances you'll also want to set up a reverse proxy to ensure that your analytics are served over SSL/TLS. In my case, the aforementioned httpd ingress network comes in useful, because I connect an Apache2/httpd container to it and can therefore provide said proxy functionality with configuration like this:

<VirtualHost *:80>
    ServerName analytics.mycompany.com
    Redirect permanent / https://analytics.mycompany.com/
</VirtualHost>

<VirtualHost *:443>
    ServerName analytics.mycompany.com

    # ssl config
    SSLEngine on
    SSLProxyEngine on
    SSLCertificateFile /app/certs/mycert.crt
    SSLCertificateKeyFile /app/certs/mycert.key
    SSLCertificateChainFile /app/certs/mycert.intermediate.crt
    RequestHeader unset X-Client-Cert

    # reverse proxy
    RequestHeader set X-Forwarded-Proto https
    ProxyPreserveHost On
    ProxyPass        / http://matomo:80/
    ProxyPassReverse / http://matomo:80/

</VirtualHost>

And that's basically it! After opening the URL in your browser, you'll be prompted to configure the instance and then will be able to set up sites to track:

matomo-configuration-example

Then, all that you need to do is connect your applications to the analytics platform.

Integrating a JavaScript front end with it

Here, you probably have two options. By default, Matomo gives you some JavaScript that you can embed within the page to enable the analytics functionality:

matomo-tracking-code
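
For reference, the generated tracking code generally has the shape sketched below, here wrapped in a function so that the individual steps are easier to see. The installMatomo name, the analytics.mycompany.com host and the site ID of 1 are assumptions for illustration; you should copy the real snippet from your own Matomo instance rather than this one.

```javascript
// Sketch of the usual Matomo tracking snippet, wrapped in a function.
// installMatomo, the host name and the site ID are illustrative assumptions.
function installMatomo(win, doc, baseUrl, siteId) {
  // _paq is the command queue that matomo.js reads once it has loaded.
  var _paq = win._paq = win._paq || [];
  _paq.push(['trackPageView']);
  _paq.push(['enableLinkTracking']);
  // Tell the tracker where to send data and which site it belongs to.
  _paq.push(['setTrackerUrl', baseUrl + 'matomo.php']);
  _paq.push(['setSiteId', String(siteId)]);
  // Load matomo.js asynchronously, ahead of the first script tag on the page.
  var g = doc.createElement('script');
  var s = doc.getElementsByTagName('script')[0];
  g.async = true;
  g.src = baseUrl + 'matomo.js';
  s.parentNode.insertBefore(g, s);
  return _paq;
}

// Only runs in a browser; elsewhere (e.g. Node) the function is just defined.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  installMatomo(window, document, 'https://analytics.mycompany.com/', 1);
}
```

The base URL is the only part that changes between the two integration options discussed here, which is why it is passed in as a parameter.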

In most cases, this will work okay. However, it means that you will have to expose at least parts of your Matomo instance to the Internet (notably, /matomo.js and /matomo.php) and that your app will make requests to more than one domain. Personally, I think that there's nothing wrong with this approach, but in certain situations you might also want to set up a reverse proxy within your actual app, a bit like we did with the APM platform.

Of course, that can also be achieved with some pretty simple Apache2/httpd configuration, a bit like so:

# Matomo reverse proxy
RequestHeader set X-Forwarded-Proto https
ProxyPreserveHost On
ProxyPass        /analytics/matomo.js http://matomo:80/matomo.js
ProxyPassReverse /analytics/matomo.js http://matomo:80/matomo.js
ProxyPass        /analytics/matomo.php http://matomo:80/matomo.php
ProxyPassReverse /analytics/matomo.php http://matomo:80/matomo.php

# Your app reverse proxy
RequestHeader set X-Forwarded-Proto https
ProxyPreserveHost On
ProxyPass        / http://your_app:8080/
ProxyPassReverse / http://your_app:8080/

That should be sufficient for the application's web server to proxy the requests over to Matomo. Then you'd simply change the base URL in the tracking code that you got to a relative path:

var u='/analytics/';
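
To see why the relative path lines up with the proxy rules, here is a small sketch of how a browser would resolve the two tracker URLs against the page's own origin. The app.mycompany.com host, the page path and the resolveTrackerUrls helper are hypothetical and used only for this illustration.

```javascript
// Hypothetical helper: resolves the two Matomo endpoints the way a browser
// would when the tracking code uses a relative base path such as '/analytics/'.
function resolveTrackerUrls(pageUrl, u) {
  return {
    script: new URL(u + 'matomo.js', pageUrl).href,
    tracker: new URL(u + 'matomo.php', pageUrl).href
  };
}

// Example with an assumed page URL. Both requests stay on the app's own
// domain and match the /analytics/ ProxyPass rules from the configuration.
var urls = resolveTrackerUrls('https://app.mycompany.com/forum/thread/42', '/analytics/');
console.log(urls.script);  // https://app.mycompany.com/analytics/matomo.js
console.log(urls.tracker); // https://app.mycompany.com/analytics/matomo.php
```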

Then, everything should start working with no problems. Of course, you can also just have a look at the actual application in your browser's network tab, to see whether the data is being successfully sent:

matomo-browser-example

The benefits of having analytics

Once your site picks up more traffic, you'll actually start seeing more benefits to having analytics on it. For example, here's the dashboard of one of my study material forums, which I set up to help other students at my university, free of charge:

matomo-study-forum-example

Knowing that it's actually being used by someone lets me know that I probably should keep it running for a while!

You can also drill down to specific visits. For example, if someone told you that they ran into a problem on your site, you could trace the actual actions that they took on it in a more fine-grained way, to help you figure out the usage patterns:

matomo-study-forum-example-2
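
If the default page views aren't detailed enough for this kind of investigation, Matomo's JavaScript tracker also accepts custom events through the same _paq queue that the tracking code uses. The following is a minimal sketch, assuming that the tracking code is already on the page; the trackAction helper and the category, action and name values are made up for illustration.

```javascript
// Hypothetical helper that queues a custom event for the Matomo tracker.
// 'trackEvent' takes a category, an action and an optional name.
function trackAction(queue, category, action, name) {
  queue.push(['trackEvent', category, action, name]);
  return queue;
}

// In the browser the queue is window._paq; a plain array stands in elsewhere.
var queue = (typeof window !== 'undefined')
  ? (window._paq = window._paq || [])
  : [];

// Example: record that a visitor downloaded a (made-up) study material file.
trackAction(queue, 'Materials', 'Download', 'linear-algebra-notes.pdf');
```

Events recorded this way should then show up next to the page views of the visit in question.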

Not only that, but you can also get information about things like the device types that were used:

matomo-devices

As well as the screen sizes:

matomo-screens

And the operating systems and browsers:

matomo-os-and-browsers

Also, you can see when your site is most actively used to anticipate load spikes:

visit-times

Last, but not least, there are also illustrations of how people navigate the site, showing both where incoming users originate from and where they head to:

matomo-visits

Summary

Realistically, you probably don't have the time to implement all of the functionality above in a custom manner for your solution. Not only that, but using SaaS solutions and cloud-based services like Google Analytics could create serious risks for you, especially with regard to the privacy of your data. Because of this, it's amazing that there are solutions such as Matomo available, which you can put on your own servers and use at absolutely no charge.

And if you not only want your applications to run, but also to run well, then you'll be much better off having analytics in place!