How to self-host an APM platform

Date: 2021-08-16

Suppose that you work as a software developer and have to manage a web application together with your teammates. This application could be used by hundreds, if not thousands, of people, so it stands to reason that sooner or later all sorts of interesting situations will arise. Should your system also have to meet compliance requirements, then anything that integrates with it will probably need to be self-hosted as well.

Given all of these constraints, you'll need a solution that lets you see how well the application is performing. That way, you can identify problems before they fully surface and leave users and customers angry or disappointed, all while keeping the data within the bounds of your own infrastructure.

It sounds like what you need, then, is a self-hosted APM solution. Application Performance Monitoring is an often overlooked aspect of software development, yet one that I believe deserves a lot more attention! That's why I've decided to write this guide on setting up one such solution: Apache Skywalking.

Why Apache Skywalking

The first question is why you'd choose this particular tool over the many others out there. For example, if we're talking about the ones that play nicely with Java, then these could also be considered:

JavaMelody - perhaps one of the simplest options; however, your metrics are stored on the same server as the app, and it is coupled to the specific environment it runs in
Pinpoint APM - a more capable alternative to JavaMelody; however, the UI and other components end up being a tad on the slow side, and it's somewhat resource-hungry
Stagemonitor - this feels like a step towards the more lightweight solutions, yet it seems a tad basic

At the end of the day, each tool has its own advantages and disadvantages. In our case, however, the main factor that motivated me to choose Skywalking was the large number of contributors the project has on GitHub:

[Image: contributor count]

While this doesn't always mean that a project is alive and well, it's a good proxy for how much interest exists in it. In turn, this helps figure out which of the reviewed projects is most likely to still be alive in 5 years, and which will have the best support for the frameworks you need to integrate with it.

Furthermore, Skywalking provides a variety of useful functionality out of the box. This includes integrations with many Java frameworks, as well as optional monitoring of what's going on in the browser, which makes it easier for developers to get the full picture.

Setting up Apache Skywalking

The actual setup requires a server to run the software. It must be reachable by the other app servers (and by clients, in the case of browser monitoring). On it, you'll need to set up the Skywalking app itself, as well as an ElasticSearch database where it will store its data.

Here I pulled a little trick: instead of installing and configuring everything manually, I chose to use Docker images for all of the components, with Docker Swarm orchestrating them. An additional benefit of this approach is that I can simply share the YAML file for Swarm, and you can have all of it running in just a few minutes:

version: "3.7"
services:
  skywalking_elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:6.8.8
    networks:
      - httpd_ingress_network
    healthcheck:
      # --fail only reacts to HTTP errors, so this checks that ES responds,
      # not that the cluster status is green
      test: ["CMD-SHELL", "curl --silent --fail localhost:9200/_cluster/health || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 9
      start_period: 30s
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx1792m"
    deploy:
      placement:
        constraints:
          - node.hostname == apm-server
      resources:
        limits:
          memory: 2048M
          cpus: '0.50'
    ulimits:
      memlock:
        soft: -1
        hard: -1
    volumes:
      - /app/skywalking/elasticsearch-data:/usr/share/elasticsearch/data
  # See the license at https://github.com/apache/skywalking-docker/blob/master/compose/docker-compose.yml
  skywalking_app:
    image: apache/skywalking-oap-server:8.6.0-es6
    networks:
      - httpd_ingress_network
    # docker stack deploy ignores depends_on and links (here and in the UI service);
    # in Swarm, services find each other by name on the shared overlay network
    depends_on:
      - skywalking_elasticsearch
    links:
      - skywalking_elasticsearch
    healthcheck:
      test: ["CMD-SHELL", "/skywalking/bin/swctl ch"]
      interval: 30s
      timeout: 10s
      retries: 9
      start_period: 30s
    ports:
      # This one is for Skywalking clients (back-end APM, like Java)
      - target: 11800
        published: 11800
        protocol: tcp
        mode: host
      # This one is for the REST API (front-end monitoring)
      - target: 12800
        published: 12800
        protocol: tcp
        mode: host
    environment:
      SW_STORAGE: elasticsearch
      SW_STORAGE_ES_CLUSTER_NODES: skywalking_elasticsearch:9200
      SW_HEALTH_CHECKER: default
      SW_TELEMETRY: prometheus
      JAVA_OPTS: "-Xms512m -Xmx896m"
    deploy:
      placement:
        constraints:
          - node.hostname == apm-server
      resources:
        limits:
          memory: 1024M
          cpus: '0.50'
  skywalking_webapp:
    image: apache/skywalking-ui:8.6.0
    networks:
      - httpd_ingress_network
    depends_on:
      - skywalking_app
    links:
      - skywalking_app
    environment:
      SW_OAP_ADDRESS: skywalking_app:12800
    deploy:
      placement:
        constraints:
          - node.hostname == apm-server
      resources:
        limits:
          memory: 1024M
          cpus: '0.50'
networks:
  httpd_ingress_network:
    driver: overlay
    attachable: true
    external: true

(Note: I've updated the above to reflect version 8.6.0, because earlier versions had problems with spring-boot-devtools, though the images below might still show 8.4.0 in places.)

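You'll also notice a connection to httpd_ingress_network. This is the network shared with an httpd container, which I expose to the outside world to handle SSL/TLS and anything else I might need, such as making sure that anyone who wants to connect to the app has to at least pass basic auth. Since the network is declared as external, it has to exist before the stack is deployed, for example by creating it with docker network create --driver overlay --attachable httpd_ingress_network.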

The configuration for the httpd web server is also pretty simple in its minimal form (basic auth is not included here):

<VirtualHost *:80>
    ServerName apm.mycompany.com
    Redirect permanent / https://apm.mycompany.com/
</VirtualHost>

<VirtualHost *:443>
    ServerName apm.mycompany.com

    # ssl config
    SSLEngine on
    SSLProxyEngine on
    SSLCertificateFile /app/certs/mycert.crt
    SSLCertificateKeyFile /app/certs/mycert.key
    SSLCertificateChainFile /app/certs/mycert.intermediate.crt
    RequestHeader unset X-Client-Cert

    # reverse proxy
    RequestHeader set X-Forwarded-Proto https
    ProxyPreserveHost On
    ProxyPass        / http://skywalking_webapp:8080/
    ProxyPassReverse / http://skywalking_webapp:8080/

</VirtualHost>

Thanks to these newer technologies, configuring new software is becoming less and less of an issue, and hopefully it will eventually be as easy as installing a new app on your phone.

So, after setting up the above, you should be greeted by an empty but running instance of the software:

[Image: the empty Skywalking UI]

Then, all that's left is to connect the applications to it!

Integrating a Java back end with it

When your instance is up and running, you can connect your applications to it so that they begin sending metrics. Of course, the method you'll need depends on the back end technologies your application uses, and the Skywalking homepage offers quite a few options.

Now, the interesting thing is that for Java, you actually need to download the full distribution to get access to the agent:

[Image: which archive to download]

This is because the Java agent files are included under the agent directory:

[Image: contents of the agent directory]

With these files in hand, all that's left is to follow the instructions for configuring the Java agent in the GitHub repository. For starters, we'll add these files to a new subfolder of our project's source directory and then edit the configuration according to the documentation:

[Image: agent configuration]
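Values in the stock agent/config/agent.config use placeholders. For example, agent.service_name=${SW_AGENT_NAME:Your_ApplicationName] uses the part before the first colon as a variable name and the rest as the default. The sketch below is a simplified model of that substitution, not the agent's actual code. The real agent also consults system properties, while this model only looks at environment variables. It shows why you can often leave the file as-is and set SW_AGENT_NAME or SW_AGENT_COLLECTOR_BACKEND_SERVICES per environment. The environment values used here are hypothetical.

```javascript
// Simplified model of agent.config placeholder resolution.
// Only the first ':' separates the variable name from its default, so a
// default such as "127.0.0.1:11800" may itself contain colons.
function resolvePlaceholders(value, env = process.env) {
  return value.replace(/\$\{([^}:]+)(?::([^}]*))?\}/g, (match, name, fallback) => {
    if (env[name] !== undefined) return env[name];
    return fallback !== undefined ? fallback : match; // leave unresolvable ones untouched
  });
}

// Two lines as they appear in the stock agent.config
const lines = [
  "agent.service_name=${SW_AGENT_NAME:Your_ApplicationName}",
  "collector.backend_service=${SW_AGENT_COLLECTOR_BACKEND_SERVICES:127.0.0.1:11800}",
];

// Hypothetical deployment values: your own service name and APM server
const env = {
  SW_AGENT_NAME: "my_application",
  SW_AGENT_COLLECTOR_BACKEND_SERVICES: "apm.mycompany.com:11800",
};

for (const line of lines) {
  console.log(resolvePlaceholders(line, env));
}
```

Note that the collector address points at port 11800, the gRPC port published for agents, not at the web UI.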

Next, to test whether it works with the local app, we add the -javaagent parameter to the launch configuration within the IDE:

[Image: VM options for the agent]

That's it! When your application starts, logs should begin appearing under skywalking/logs/skywalking-api.log, and if everything is configured correctly, you'll soon also see data appearing in the Skywalking UI.

Note that if you're using Spring Boot with spring-boot-devtools, you'll definitely need version 8.6.0, because with an older version, most likely no data will be sent to the instance. Also, you might need to change the date range in the bottom corner of the UI before the data actually shows up! To me this seems like a UX failure on the developers' part, because it confused me initially. -__-

As for deployment, you would probably keep the skywalking directory on the server or within the container, and add the javaagent parameter to the configuration, like so:

environment:
  TZ: "Europe/Riga"
  JAVA_TOOL_OPTIONS: "-javaagent:/app/skywalking/skywalking-agent.jar"
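If you'd rather not edit agent.config for every environment, the agent also accepts overrides of its config keys as system properties prefixed with skywalking., such as -Dskywalking.agent.service_name=.... Below is a minimal sketch that builds such a JAVA_TOOL_OPTIONS value, for example from a deployment script. The jar path matches the snippet above, while the service name and collector address are hypothetical. JAVA_TOOL_OPTIONS is whitespace-separated, so the sketch rejects values containing spaces rather than dealing with quoting.

```javascript
// Build a JAVA_TOOL_OPTIONS value that attaches the agent and overrides
// agent.config keys via "-Dskywalking.<key>=<value>" system properties.
function buildJavaToolOptions({ agentJar, overrides = {} }) {
  const parts = [`-javaagent:${agentJar}`];
  for (const [key, value] of Object.entries(overrides)) {
    if (/\s/.test(String(value))) {
      throw new Error(`value for ${key} must not contain whitespace`);
    }
    parts.push(`-Dskywalking.${key}=${value}`);
  }
  return parts.join(" ");
}

console.log(
  buildJavaToolOptions({
    agentJar: "/app/skywalking/skywalking-agent.jar",
    overrides: {
      "agent.service_name": "my_application", // hypothetical service name
      "collector.backend_service": "apm.mycompany.com:11800", // hypothetical APM host
    },
  })
);
```

This prints a single line that starts with -javaagent:/app/skywalking/skywalking-agent.jar, followed by the two -Dskywalking... overrides, ready to paste into the environment section.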

That should be enough to get everything working! Of course, if you'd like additional functionality, you'll probably want to have a look at the plugins and optional-plugins directories, as well as this repository, which contains some plugins that had to be offered separately.

Integrating a JavaScript front end with it

Things do get a bit more complicated on the front end, however. For that, you'll probably want to use the skywalking-client-js library.

Essentially, you'll need something like the following added to your project:

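import ClientMonitor from "skywalking-client-js";

// Register the browser monitoring once, early in the app's startup
ClientMonitor.register({
    collector: "/apm",        // relative path, proxied to the Skywalking OAP server
    service: "my_application",
    pagePath: location.href,
    serviceVersion: "1.0",
    useFmp: true              // also collect First Meaningful Paint data
});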

With this configuration, the monitoring data will be sent to /apm/browser/perfData. But your app doesn't have such an endpoint, now does it? With a bit of reverse proxy configuration, we can add it. This also avoids the CORS issues we'd run into if we put in a full URL pointing to another domain, such as https://apm.mycompany.com.
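To see why the relative collector sidesteps CORS, here is a minimal sketch of the URL resolution. It assumes, as the /apm/browser/perfData example above suggests, that the report path is appended to the collector value and resolved against the page's own origin. The reportUrl and isSameOrigin helpers and the origins used are hypothetical.

```javascript
// Resolve the collector plus report path the way a browser resolves a
// relative URL: against the origin of the page that sends the request.
function reportUrl(collector, reportPath, pageOrigin) {
  return new URL(collector + reportPath, pageOrigin).href;
}

// Requests to another origin are cross-origin and subject to CORS checks.
function isSameOrigin(url, pageOrigin) {
  return new URL(url).origin === new URL(pageOrigin).origin;
}

const page = "https://myapp.mycompany.com"; // hypothetical app origin
const relative = reportUrl("/apm", "/browser/perfData", page);
const absolute = reportUrl("https://apm.mycompany.com", "/browser/perfData", page);

console.log(relative, isSameOrigin(relative, page)); // https://myapp.mycompany.com/apm/browser/perfData true
console.log(absolute, isSameOrigin(absolute, page)); // https://apm.mycompany.com/browser/perfData false
```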

Now, this part may differ in your case, since you could be using something like Nginx for ingress, but since I'm using Apache, here's an example configuration:

# Skywalking reverse proxy (must come before the catch-all "/" below,
# because the first matching ProxyPass wins; keep trailing slashes on both sides)
RequestHeader set X-Forwarded-Proto https
ProxyPreserveHost On
ProxyPass        /apm/browser/ http://skywalking_app:12800/browser/
ProxyPassReverse /apm/browser/ http://skywalking_app:12800/browser/

# Your app reverse proxy
RequestHeader set X-Forwarded-Proto https
ProxyPreserveHost On
ProxyPass        / http://your_app:8080/
ProxyPassReverse / http://your_app:8080/

(Admittedly, this assumes that you have the necessary proxy modules, such as mod_proxy, mod_proxy_http and mod_headers, enabled in your configuration.)

Personally, I think this configuration is pretty great, because it lets you use your current domain and SSL/TLS certificate without having to set up something separate for Skywalking.
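Two details in such a configuration are easy to get wrong: the order of the ProxyPass lines and their trailing slashes. Apache's documentation says that rules are checked in configuration order, with the first match winning. It also recommends that if one side of a ProxyPass ends with a slash, the other side should too. Below is a deliberately simplified model of that prefix mapping (real mod_proxy handles more cases) that shows what goes wrong otherwise. The backend URLs mirror the example configuration.

```javascript
// Simplified model of ProxyPass prefix mapping: try rules in order, take the
// first matching prefix, append the rest of the path to the backend URL as-is.
function mapRequest(rules, requestPath) {
  for (const [prefix, backend] of rules) {
    if (requestPath.startsWith(prefix)) {
      return backend + requestPath.slice(prefix.length);
    }
  }
  return null; // not proxied
}

const rules = [
  ["/apm/browser/", "http://skywalking_app:12800/browser/"],
  ["/", "http://your_app:8080/"],
];

console.log(mapRequest(rules, "/apm/browser/perfData")); // http://skywalking_app:12800/browser/perfData
console.log(mapRequest(rules, "/index.html"));           // http://your_app:8080/index.html

// Without the trailing slash on the backend side, the segments get glued together
console.log(mapRequest([["/apm/browser/", "http://skywalking_app:12800/browser"]], "/apm/browser/perfData"));
// http://skywalking_app:12800/browserperfData
```

If the two rules were swapped, "/" would match first, and the monitoring data would end up at your own app instead of Skywalking.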

With any luck, you should now start getting information about the requests made to your application from users' browsers:

[Image: browser performance data in the Skywalking UI]

As a matter of fact, you'll even be able to see the logs of browser errors, which you would otherwise have absolutely no idea about:

[Image: browser error logs in the Skywalking UI]

Of course, you can also test this by checking the network tab of your browser's developer tools while using your app:

[Image: failing APM requests in the network tab]

If you see something like the above, you've probably misconfigured the proxy settings. In that case, try opening the URL in your browser and check whether you get a 405 error from Skywalking itself. If so, you'll at least know that the proxy is working and that the problem lies elsewhere.

Summary

In the end, self-hosting an APM platform with something like Skywalking is definitely possible. You don't have to spend bunches of money on cloud services, or configure dozens of different components for something like Sentry. Admittedly, the documentation for configuring and running Skywalking is lacking and its UI is a bit clunky, but hey, it works!