What it truly takes to get a web app running

Date: 2021-10-22

Recently, in a discussion over on Hacker News, someone expressed displeasure at how complicated running apps is in the modern day and age:

Why is this so hard & tedious. I've seen enough infrastructure as code projects over the last fifteen years to know that actually very little has changed in terms of the type of thing people build with these tools. There's always a vpc with a bunch of subnets for the different AZs. Inside those go vms that run whatever (typically dockerized these days) with a load balancer in front of it and some cruft on the side for managed services like databases, queues, etc. The LB needs a certificate.

So why is it so hard to get all of that orchestrated? Why do we have to boil the oceans to get some sane defaults for all of this. Why do we need to micromanage absolutely everything here? Monitoring, health checks, logging systems, backups, security & permissions. All of this needs to be micromanaged. All of it is disaster waiting to happen if you get it wrong. All of it is bespoke cruft that every project is reinventing from scratch.

That post resonated with me, so I decided to turn my response to it into a blog post that illustrates the facets of software development which aren't talked about much, or which can be difficult to get an overview of.

Concerns

I don't think that we appreciate just how many things are necessary to run a web application and do it well. There is an incredible amount of complexity that we attempt to abstract away and we're essentially standing on the shoulders of giants, reaping the benefits of millions of man hours that have been spent to ensure that we have a good base to build our solutions upon.

Sometimes I wish that there was a tool that could tell me just how many active lines of code are responsible for the processes that are currently running on any of my servers, and in which languages. Off the top of my head, this is what's necessary to ship an enterprise web app in 2021.

Runtimes

No one writes web applications in assembly or in a low level language like C with no dependencies. There is usually a complex runtime like the JVM (for Java), the CLR (for .NET), or whichever Python or Ruby implementation is used, and these are already absolutely huge. Every language has its own runtime, which adds up to a lot, given how many languages are out there. I've mentioned just a few above, but the likes of Node, PHP and others also shouldn't be forgotten.

On the bright side, they typically not only let you use a language with a higher level of abstraction, but also handle platform differences and memory management, and make it relatively easy to integrate external packages. Of course, I believe that languages like Go and Rust, which can compile to static executables, are also worthy of mention here!

Libraries and frameworks

Then there are libraries for common tasks in each language, be it serving web requests, serving files, processing JSON data, doing server side rendering, RPC or some sort of message queueing. There are so many of them in part because there isn't just one web development language, but many. Whether this is a good thing or a bad thing, I'm not sure.
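To give a sense of how much even a standard library does for you, here is a minimal sketch of a JSON endpoint that uses nothing but Python's built-in `http.server` and `json` modules. The `/status` route, the `build_response` helper, the `serve` function and port 8000 are all hypothetical names chosen for illustration, and this is not something you would expose to the internet as-is.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_response(path: str) -> tuple[int, bytes]:
    """Map a request path to a status code and a JSON body (hypothetical routes)."""
    if path == "/status":
        return 200, json.dumps({"status": "ok"}).encode("utf-8")
    return 404, json.dumps({"error": "not found"}).encode("utf-8")


class JsonHandler(BaseHTTPRequestHandler):
    """Answers GET requests with whatever build_response decides."""

    def do_GET(self) -> None:
        status, body = build_response(self.path)
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    """Start the server on localhost; the port is an arbitrary choice."""
    HTTPServer(("127.0.0.1", port), JsonHandler).serve_forever()
```

Calling `serve()` starts the server, after which a GET request to `/status` returns the JSON body. Socket handling, HTTP parsing and JSON encoding all happen in code you never had to write, and a full framework adds routing, templating, sessions and much more on top of that.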

For the back end, some of the more popular ones would be Spring Boot (for Java), ASP.NET Core (for .NET), Django (for Python) and Rails (for Ruby). Front end can also be really complex, since there are numerous libraries/frameworks out there for getting stuff rendering in a browser in an interactive way (Angular, Vue, React, jQuery), each with their own toolchains.

Packaging and environments

But then there are also all the ways to package software, be it Docker containers, other OCI compatible containers (ones that have nothing to do with the Docker toolchain, like buildah + podman), approaches like using Vagrant, or shipping full size VMs, or just copying over some files on a server and either using Ansible, Chef, Puppet, Salt or manually configuring the environment. Automating this can also be done in any number of ways, be it GitLab CI, GitHub Actions, Jenkins, Drone or something else.

Running

When you get to actually running your apps, what you have to manage is an entire operating system, from the network stack, to resource management, to everything else. And, of course, there are multiple OS distributions that have different tools and approaches to a variety of tasks (for example, OpenRC in Alpine vs systemd in Debian/Ubuntu).

Ingress

But these OSes also don't live in a vacuum, so you end up needing a point of ingress, possibly with load balancing or rate limiting. Eventually you introduce something like Apache, Nginx, Caddy or Traefik, and optionally something like certbot for the former two. Those are absolutely huge dependencies as well: just have a look at how many modules the typical Apache installation has, all to make sure that your site can be viewed securely, that requests can be rate limited, that paths can be rewritten and so on!
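Rate limiting is a good example of a concern that sounds trivial but still needs real logic behind it. Below is a minimal sketch of a token bucket, one common way to do it. This is an illustration of the general idea under simple assumptions (single process, no locking), not a description of how any of the servers above implement it, and the `TokenBucket` class and its parameters are made up for this example.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(
        self,
        rate: float,
        capacity: int,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self) -> bool:
        """Return True if a request may pass, consuming one token."""
        now = self.clock()
        # Refill based on the elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `TokenBucket(rate=1.0, capacity=2)`, two requests in a row pass, a third is rejected, and another one is allowed once a second has gone by. A production setup would additionally need one bucket per client, shared state across workers and thread safety, which is part of why this job is usually left to the ingress.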

Data

Of course, you'll also need to store your data somewhere. You might manage your databases with the aforementioned approaches to automate their configuration and even running them, but at the end of the day you are still running something that has decades of research and updates behind it, regardless of whether it's SQLite, MariaDB, MySQL, PostgreSQL, SQL Server, S3, MongoDB, Redis or anything else. Each of these has its own way of being interacted with and its own use cases. For example, you might use MariaDB for data storage, S3 for files and Redis for cache.
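To illustrate how a database and a cache tend to be combined, here is a minimal sketch of the cache-aside pattern. To keep it self-contained, an in-memory SQLite database stands in for something like MariaDB and a plain dictionary stands in for something like Redis; the `UserStore` class, the `users` table and the `db_reads` counter are hypothetical and exist only for this example.

```python
import sqlite3
from typing import Optional


class UserStore:
    """Cache-aside reads: check the cache first, fall back to the database."""

    def __init__(self) -> None:
        # In-memory SQLite as a stand-in for a real database server.
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
        self.db.execute("INSERT INTO users (id, name) VALUES (?, ?)", (1, "alice"))
        self.db.commit()
        # Plain dict as a stand-in for an external cache.
        self.cache: dict[int, str] = {}
        # Counts how often the database is actually queried.
        self.db_reads = 0

    def get_name(self, user_id: int) -> Optional[str]:
        if user_id in self.cache:
            return self.cache[user_id]
        self.db_reads += 1
        row = self.db.execute(
            "SELECT name FROM users WHERE id = ?", (user_id,)
        ).fetchone()
        if row is None:
            return None
        self.cache[user_id] = row[0]
        return row[0]
```

Calling `get_name(1)` twice only queries the database once, since the second call is answered from the cache. What the sketch leaves out (cache invalidation on writes, expiry, connection handling) is exactly the sort of thing that the real tools have spent years getting right.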

Support

And that's still not it! You also probably want some analytics, be it Google Analytics, Matomo, or something else. And monitoring, with something like Nagios, Zabbix, or a setup with Prometheus and Grafana. Oh, and you had better run something for log aggregation, like ELK or Graylog. And don't forget about APM either, like Apache SkyWalking or anything else, to see what's going on in your app in depth.

Others

There can be additional solutions in there as well, such as a service mesh to aid with discoverability of services, circuit breakers to route traffic appropriately, security solutions like Vault to make sure that your credentials aren't leaked, and sometimes an auto scaling solution too.

In summary, the complexity isn't just because there are a lot of tools for doing any single thing, but rather because there are far too many concerns to be addressed in the first place. To that end, it's really amazing that you can even run things on a Raspberry Pi, and that many of the tools can scale from a small VPS to huge servers that handle millions of requests.

What to do

That said, it doesn't always have to be this complex. If you want a maximally simple setup, just use something like PHP with an RDBMS like MariaDB/MySQL and server side rendering. Serve it out of a cheap VPS: I have been using Time4VPS (affiliate link in case you want to check them out), though DigitalOcean, Vultr, Hetzner, Linode and others are perfectly fine too. Maybe use some super minimal CI like GitLab CI, Drone, or whatever your platform of choice supports.

That should be enough for most side projects and personal pages. I also opted for Docker containers with Docker Swarm + Portainer, since that's the simplest setup that I can use for a large variety of software and my own projects in different technologies, though that's a personal preference. Of course, not every project needs to scale to serving millions of users, so it's not like I need something advanced like Kubernetes (well, Rancher + K3s can also be good, though many people also enjoy Nomad).

Edit: there are PaaS offerings out there that make things noticeably easier by doing some of the things above for you, but that can lead to vendor lock-in, so be careful with those. Regardless, maybe solutions like Heroku or Fly.io are worth checking out as well, though I'd suggest you read this article.