Using Ubuntu as the base for all of my containers

Date: 2022-04-11

Up until now, i had considered using Alpine as the base image for my OCI containers (be it with Docker or otherwise), because it felt like a pretty good and minimalistic option, which should be good for security. However, as i tried using more and more different types of software, Alpine turned into too much of a headache: certain packages wouldn't install, for others there was only documentation for "mainstream" distros like Debian/Ubuntu/CentOS, and certain others worked weirdly (e.g. some parameters in the busybox utils differ from what Git Bash has packaged, or from what my other servers that run Debian/Ubuntu have) or more slowly for no apparent reason, for example: "Using Alpine can make Python Docker builds 50× slower".

The final straw was when i looked into the EOL dates for Alpine 3.15, the latest stable release that i wanted to use:

Alpine EOL

Now, a 2 year support window isn't too bad, all things considered. It's actually pretty close to what Debian does, with the exception that Debian has an LTS offering and that other distros like Ubuntu also offer support for their LTS releases for 5 years. Some people out there might want to run more recent software and have no problem with updating their stuff every now and then.

But here's the thing - i don't care for the most recent releases and i also don't have the time to keep everything up to date due to breakages between the new major releases. I work full time and thus this maintenance would eat up all of my free time, which i'm not looking forward to. Thus, i need a middle ground between things just working and still getting security fixes, even at the expense of missing out on recent performance and functionality improvements (e.g. Ruby 2 vs Ruby 3).

So what can be done to achieve that?

Switching to Ubuntu

The answer to that question was to run Ubuntu LTS, obviously.

Initially i wanted to avoid doing that, because Alpine has excellent package install speeds and boasts about having smaller container sizes, but once my Ruby on Rails app container bloated to over 300 MB regardless of which base image i used, i realized that this hardly matters. The main benefit of Ubuntu LTS in my eyes is its relative stability - i can just set up the container images to rebuild periodically and automatically install the updates, knowing that those will primarily be security updates with far fewer breakages. Furthermore, i can keep this up for longer before eventually having to move to the latest major version.

Not only that, but i decided to actually try to go for a setup in which i would build ALL of my own application container images, with the exception of databases, which i'd simply download from Docker Hub and upload to my Nexus instead. This approach seemed really nice to me for a plethora of reasons:

i can have common base images which allows fewer layers to be redownloaded on the servers
i can have a common set of utilities across all of the images to aid me in debugging when necessary (e.g. traceroute) or to improve DX (e.g. nano)
i can install security patches for EVERYTHING by simply updating the base image, on which everything else is based (e.g. the base image ensures Ubuntu updates, which are installed once, then Ruby images would ensure that the latest Ruby security patches get installed into the Ruby container image)

Since a picture is worth more than a thousand words, here's an example of what this dependency graph/build order looks like:

DAG

Now images can be built in the order of their dependencies: everything that is based on Ubuntu is built after the Ubuntu image is pushed to Nexus, and so on. For example, i also install PHP on top of the Apache web server as a module, so i build it after the Apache setup is finished (you might advise me to install FPM, but it simply didn't work because there was no default config). Similarly, Ruby also has two images: one to actually run my applications and another that installs a bunch of development tools for multi-stage builder images, without bloating the runner images.
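As a rough sketch of that ordering (not the actual CI config, which isn't shown here), the images can be grouped into "waves" with a topological sort, where everything inside one wave can be built in parallel once the previous wave has been pushed. The graph below is only a hypothetical subset of the images mentioned in this post, and `BASE_OF` and `build_waves` are names made up for the illustration:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical subset of the image graph: each image maps to the image it is built FROM.
BASE_OF = {
    "ubuntu": None,      # the only image pulled from upstream (through the proxy)
    "apache": "ubuntu",
    "php": "apache",     # PHP is installed on top of Apache as a module
    "ruby": "ubuntu",
    "ruby_dev": "ruby",  # development tools for multi-stage builds
}


def build_waves(base_of):
    """Group images into waves; every image in a wave only needs earlier waves."""
    graph = {image: ({base} if base else set()) for image, base in base_of.items()}
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(build_waves(BASE_OF), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

With this subset, Ubuntu ends up alone in the first wave, Apache and Ruby in the second, and PHP and the Ruby dev image in the third.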

Actually, here's the example Dockerfile for my Ubuntu image:

# Switched from Alpine to Ubuntu, because the EOL is 2025.
FROM docker-proxy.registry.kronis.dev:443/ubuntu:focal-20220404

# Time zone
ENV TZ="Europe/Riga"

# Skip interactive prompts during apt installs (e.g. the tzdata time zone selection)
ARG DEBIAN_FRONTEND=noninteractive

# Use Bash as our shell
SHELL ["/bin/bash", "-c"]

# Run updates on base just to make sure everything is up to date
RUN apt-get update && apt-get upgrade -y && apt-get clean

# Some base software, more to be added
RUN apt-get update && apt-get install -y curl wget nano net-tools software-properties-common apt-transport-https git && apt-get clean

# Print versions
RUN cat /etc/*-release

As you can see, it installs system updates whenever the image is built (i do use a pinned version of the upstream image, because Nexus allows me to cache it and thus hit Docker Hub less frequently). It also installs common software that i might want everywhere, as described previously, as well as some packages that other tools might need, for example for adding new apt repositories to install something like .NET.

Now, you might rightfully ask:

"Shouldn't you forgo most utilities in your container images?"

"What about distroless images, shouldn't you use those?"

"Wouldn't Alpine be more secure?"

And my answer to those is rather simple: "Yep, probably... If i was running a bank."

You see, there is a world of difference between the best practices that should be followed by some imaginary org with boundless resources to spend on securing everything and what's feasible for a single person who just wants to build some boring software for their homelab without getting bogged down in constant maintenance. In essence, i'm securing my containers against automated low effort attacks from China and Russia, rather than against Mossad, which would probably already be in my system should they want to.

Plus, what's working in my defense is the fact that i'm pretty poor and there's not a lot to steal. Worst you could do is probably hijack my servers and mine crypto, or maybe steal my anime memes, which are backed up anyways, since backup automation for something so important is paramount!

What did it take

Honestly, all of this was a lot of effort. Things kept breaking and building all of your own container images also means that suddenly you need to care about the inner workings of all of the stacks that you're about to run. One might suggest that you should be aware of this anyways and that having clear instructions on how everything is built so you can do that from scratch later is nice, but then again, i'm just a mortal with a 9-5 job and a few free hours on the weekends, so when i can't say that things "just work", it's all very bothersome.

Curiously, the stack that was the hardest to set up was the Apache web server and PHP, because the former doesn't play nicely with the Ubuntu container image (bad environment config, envvars doesn't work properly, APACHE_RUN_DIR is not a subfolder of ServerRoot) and because the latter won't work properly with FPM (a2enconf php8.0-fpm doesn't work because such config doesn't exist, but maybe the instructions for getting PHP 8 on Ubuntu 20.04 LTS are bad).

Just look at how many commits i needed just to get such a seemingly simple piece of software running with the plugins that i needed for proxying, path rewriting, SSL/TLS (honestly mod_md is awesome and it might allow me to use something as stable as Apache2 without having to run certbot separately or use a web server like Caddy altogether) and some added security:

what did it take

Furthermore, i also found it challenging to find a balance between rebuilding things often and having shorter Nexus cleanup times for the actual registries: building things often does take up a lot of network bandwidth (the more software i need, the more will be downloaded for installs), whereas in the case of rebuilds due to fixing breakages, longer Nexus image storage times would create problems with the disk space filling up.

Currently:

the builds happen every morning (might switch to weekly later)
the actual old images and their layers are purged in Nexus every 3 days for dev images (e.g. PHP, Java, Ruby) and 30 days for prod images (e.g. Docker, Ubuntu, Apache)
the difference here is mostly in the fact that the former are not necessary for anything that's running on servers (for builds only), whereas the latter can be run on the servers (CI processes, or web servers)
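The two retention windows can be sketched as a simple age check. This is only an illustration of the policy, not how Nexus implements its cleanup tasks: the "dev"/"prod" labels, the `is_expired` helper and the choice to measure age from the push time are all assumptions.

```python
from datetime import datetime, timedelta

# Retention windows taken from the list above. The "dev"/"prod" labels and the
# idea of measuring age from the push time are assumptions of this sketch.
RETENTION = {
    "dev": timedelta(days=3),    # build-only images, e.g. PHP, Java, Ruby
    "prod": timedelta(days=30),  # images that run on servers, e.g. Docker, Ubuntu, Apache
}


def is_expired(kind, pushed_at, now):
    """Return True when an image is older than the retention window for its kind."""
    return now - pushed_at > RETENTION[kind]


if __name__ == "__main__":
    now = datetime(2022, 4, 11)
    week_old = now - timedelta(days=7)
    print(is_expired("dev", week_old, now))   # 7 days is past the 3 day window
    print(is_expired("prod", week_old, now))  # 7 days is still inside the 30 day window
```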

That said, in some capacity, right now i have the worst of both worlds. The builds use up about 40% of the resources on each of my homelab servers while they're running, which is passable due to the builds being distributed between multiple servers and there being container limits in place so the Drone CI tasks don't get too crazy, but they still take a while and the setup isn't exactly optimal:

CI example

Not only that, but the storage is also somewhat bloated in comparison. Every rebuild that i might have to do after changes will create additional layers, which will be kept in place until the eventual cleanup a few days later. But even with this in place, the minimal amount of space that's necessary to keep the container images for everything that i might want to run is still measured in multiple GB:

Nexus blob store
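To get a feel for how rebuild frequency and retention interact, here is a back-of-the-envelope upper bound for a single image. The sizes are made up for illustration and are not measurements from this setup. The model also assumes that every rebuild replaces the same amount of layer data, which real builds won't do exactly.

```python
def worst_case_storage_mb(full_image_mb, changed_layers_mb, rebuilds_per_day, retention_days):
    """Upper bound on registry usage for one image.

    Assumes one full copy of the image plus the changed layers of every
    rebuild that is still inside the retention window.
    """
    retained_rebuilds = rebuilds_per_day * retention_days
    return full_image_mb + retained_rebuilds * changed_layers_mb


if __name__ == "__main__":
    # Made-up sizes: a 300 MB image where each daily rebuild replaces 120 MB of layers.
    print(worst_case_storage_mb(300, 120, rebuilds_per_day=1, retention_days=3))   # 3 day window
    print(worst_case_storage_mb(300, 120, rebuilds_per_day=1, retention_days=30))  # 30 day window
```

Under those assumptions the 3 day window tops out at 660 MB and the 30 day window at 3900 MB for the same image, which is why shorter retention matters more the more often things get rebuilt.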

Luckily, neither of those tradeoffs is too bad and it's not like i'm building everything from source like certain other crazy folk out there might want to do. :)

As it currently stands, this setup actually allows me to hit Docker Hub only to fetch the initial image, as well as stuff that i want to initialize my proxy registry with - container images for databases like MariaDB, PostgreSQL/PostGIS, Redis, MongoDB, MinIO etc. Of course, i am yet to add those, but even if it'll take up another GB or two, i will be able to pull those images as often as i want, without having to open my wallet to pay for Docker Hub, since i already have my own servers.

Should you do this yourself?

Now, with that setup of mine working, the question remains: is it actually a good idea?

Personally, i'd say that it's suitable for a particular set of circumstances, like trying to go "sort of Stallman" - decoupling yourself from Service as a Software Substitute (SaaSS) solutions as much as possible, without living in the middle of a forest and installing everything from SD cards. If Docker Hub decided to charge for anything above 50 anonymous pulls a day, i could easily not pay, since instead of needing 20 different images on multiple servers, i now only need to download one or two and build the rest myself.

Roadblock #1: toolchain bloat

That said, there are certain things that are pretty horrible. For example, have a look at the ruby_dev image, which i need to build some Rails apps:

# We base our Ruby dev image on the Ruby runner image (which is itself based on the common Ubuntu image).
FROM docker-dev.registry.kronis.dev:443/ruby

# We use Python for some Gems
RUN apt-get update && apt-get install -y python2 && apt-get clean && cd /usr/bin && ln -s python2 python

# We need something newer than Ubuntu default Node 10
RUN curl -fsSL https://deb.nodesource.com/setup_16.x | bash - \
&& apt-get update && apt-get install -y nodejs && apt-get clean

# We use Yarn for front end stuff, but can't install package called yarn because it's something else (WTF)
RUN curl -sS https://dl.yarnpkg.com/debian/pubkey.gpg | apt-key add - \
&& echo "deb https://dl.yarnpkg.com/debian/ stable main" | tee /etc/apt/sources.list.d/yarn.list \
&& apt-get update \
&& apt-get install -y yarn \
&& apt-get clean

# Print versions
RUN ruby --version && bundler --version && python --version && node --version && yarn --version

Instead of having just a node image for building front-end stuff, i'm now stuck with installing front-end tools into a build image which is meant for the back-end. Why? Because Ruby on Rails uses webpacker and builds the entire application in one go. Not only does this needlessly bloat my images, it also means that i cannot deploy my front-end and back-end separately or even build them in parallel. It's about as cumbersome as frontend-maven-plugin, which in my eyes is a mistake about 90% of the time that i run into it.

Then again, in recent years i've leaned more and more towards having clearly separated front-end and back-end components with REST interfaces (or something like that) between them anyways. Sadly, this also means that i cannot use the server side rendering that's built into things like Ruby on Rails, Laravel (for PHP) or even Django for as long as they need front-end tooling like that, even in situations where a server side rendered application would be the best way to go and server side rendered Vue/React/Angular would be needlessly complex.

But as long as i don't run out of HDD space, i might as well pretend that the problem doesn't exist and just bite the bullet.

Roadblock #2: Docker registries universally suck

Despite me liking Docker and containers in general, i think that the absolute weakest piece of the puzzle is the state of Docker registries. In every single implementation that i've needed to use (bare registry, GitLab Registry, Harbor, Nexus) there have been problems with dangling garbage, bad cleanup approaches and just insane disk usage.

The images above only look decent because i did a full wipe of the blob stores and this time have separate blob stores for the dev/prod repositories. Before that, they were exceedingly bloated. I even created a task to specifically destroy everything within the underlying blob stores for the Docker registries:

cleanup policies in Nexus

Furthermore, i also actually manually triggered all of the overcomplicated tasks that need to be done so Nexus understands that it should actually clean things up:

Nexus tasks

Well, guess what - it didn't work. The actual blob stores remained as big as they initially were. So i decided to just go ahead and change the blob store directories that Nexus uses, e.g. ... /docker_dev to ... /docker_dev2, but then Nexus decided to break because it still had all of the old manifests and i could not push new ones, because it said that it already has them, when it did not. Curiously, moving the blob store location doesn't actually clean up the Nexus data. Nor does rebuilding the index. Nor does triggering all of the cleanup actions again, multiple times.

Instead, my only options were:

to manually delete every single manifest through the web UI (i don't know where they are stored on disk)
to delete the entire registry and then re-create it in Nexus, hoping that the config will be the same and i won't forget anything

In short, Nexus sucks. Docker registries suck. Absolutely disappointing experience, though still the only way to get what i needed working.

Roadblock #3: Time and effort

Lastly, as you can probably tell, all of that took a while and needed a certain familiarity with the technologies involved. It will admittedly also slow me down whenever i want to introduce a new piece of software or another stack, say Lua, which would necessitate more exploration.

For many people out there, it won't be worth it. Then again, i make about a quarter of the 100k USD that some of the developers in the US make, so using many of the cloud services out there simply isn't in the cards for me. Rich Americans (and other individuals from wealthy countries) may enjoy the privilege of using money to make these aspects of development someone else's problem, but the best i can realistically do is buy a $40 HDD, slap it in my old server that's made from refurbished consumer parts and use that for my personal needs.

I guess a lot of it is about everyone's individual circumstances. Personally, i'd draw the line at only proxying more complex pieces of software (the aforementioned databases and stuff like Nextcloud, Gitea, Drone, OpenProject, ...) because even though i might want to store them on my own servers, building and configuring everything on Ubuntu simply wouldn't be worth my time. That said, i might still prefer a common base, like what Bitnami sometimes offers, which would be a sane middle ground.

Summary

In summary, i am glad that i took the time to learn to get all of this working, whilst also ditching Alpine in favor of Ubuntu. I did enjoy feeling clever and crafty while using Alpine, but in practice it simply wasn't worth the time for whatever meager benefits it provided. I might still change this whole setup around sooner or later, but for now i should have a stable base to build upon, with most of the software development stacks that i'd like to use daily covered, in a way where i can rebuild the containers that i need as many times as i want.

I think i've personally found a pretty nice middle ground and giving up and giving in to using Ubuntu LTS is actually a very liberating experience, much like using SFTP to copy PHP code to servers did back in the day, versus overcomplicated Jenkins pipelines to put oddly named artifacts and Helm charts in repositories that will then kick off an Eldritch horror of a process to get some Kubernetes container to install all of it and somehow still mess up integrating with Istio because the clients' infrastructure is still different from what we could get at work because we don't really have the resources to run the full Kubernetes and using cloud solutions isn't in the cards because of security concerns and... well, you get the idea.

Finding the middle ground is nice, even though what it is exactly might depend on your circumstances. At my day job, i might still prefer using Nexus, but with the official images for each tool, to limit my personal responsibility.

Update

I actually added the DB images that i wanted to the CI pipeline. Now it's suddenly even more clear what i'm in control over and where i cut corners (using DB images prepared by others):

added DB images

Admittedly, i might still rethink whether it'd be worth it to do something similar for prepackaged software like Nextcloud, as opposed to the stuff that i develop myself, since the former is usually just pinned to a particular server (like this blog, for example) and runs there with a stable version until i'm forced to update.