Ansible and Docker: a useful match made in hell (with video)

Date: 2023-03-07

I am currently working on a blog post where I intend to compare the real-world performance of the Nginx and Apache2 web servers in similar circumstances. As a part of that, I also decided to set up Ansible for managing the nodes in question, because I'll need that very same kind of setup for my DevProject 2022/3 series of software development videos later as well. The idea is to embrace the principles of GitOps and use a Git repository as the single source of truth - even for what should be on the servers running my applications.

By now, I've actually done that, but as the title of this post might imply, it was quite the journey! For a change, I actually recorded the whole process and you can have a look at it on YouTube:

But in case you want to read things instead of watching the full video, let me recap...

The setup

Now, the idea itself was pretty simple, as the architecture of Ansible uses at least some of the technologies that you're already familiar with: it uses SSH for establishing the connection and expects a Python interpreter on the remote system, which is a common setup. Furthermore, since the actual Ansible control node software doesn't take up too much space, it should be possible to launch it rather easily both locally, for testing, as well as on any CI nodes:

ansible use case

Well, the truth is that it isn't quite as simple. For starters, the Ansible control node software does not quite run on Windows (unless you use WSL), so Docker containers were the logical next step. However, even then I needed a way to feed in the SSH keys, the known_hosts file for remote server fingerprint verification, as well as the hosts file, ansible.cfg and eventually the Ansible playbooks containing the actual tasks to be executed on the remote nodes.

You see, I use bind mounts for passing in these files, because the tasks I want to execute against the remote nodes (the playbooks) might change during development and I want to be able to test them without having to rebuild the entire container. The problem is that only some of these files can be usefully bind mounted into a Docker container from a Windows environment. The exceptions are the ones that need specific permissions, like SSH keys.

This is because the file system in use by Windows, NTFS, quite simply doesn't support the POSIX file permissions needed, and something like:

chmod -R 600 /root/.ssh

will quite literally do nothing inside of such a container when the /root/.ssh directory has been bind mounted. This has previously been a pain point for me when working with PHP as a development language, as well as in other cases. In practice, it means that we need to set up the whole container image around these sorts of technological limitations, rather than in ways that would make more sense to us:

ansible requirements

Notably, the SSH keys are baked into the container image that I had to build, because chmod only had an effect on files that were part of the image rather than bind mounted. An alternative would have been to bind mount them into some temporary directory, copy them to the actual one during container startup and then fix the permissions afterwards. Thankfully, the containers and images themselves are short-lived and are disposed of after use, so baking the keys in isn't too big of a problem.
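For the curious, the alternative of staging the keys and copying them at startup could be sketched roughly as follows. This is not what I ended up using: the install_ssh_keys function name and the /tmp/ssh-staging path are made up for illustration, and the sketch assumes that the keys were bind mounted into the staging directory.

```shell
#!/bin/bash
# Sketch: copy bind mounted SSH files from a staging directory into the real
# SSH directory, then tighten the permissions on the copies.
# install_ssh_keys and both default paths are illustrative assumptions.
install_ssh_keys() {
    local staging_dir="${1:-/tmp/ssh-staging}"
    local ssh_dir="${2:-/root/.ssh}"

    mkdir -p "$ssh_dir"
    # Copy the directory contents (including dotfiles), not the directory itself.
    cp -R "$staging_dir"/. "$ssh_dir"/

    # The copies live on the container's own file system, so chmod works here.
    # Directories need the execute bit to be traversable, files do not.
    find "$ssh_dir" -type d -exec chmod 700 {} +
    find "$ssh_dir" -type f -exec chmod 600 {} +
}
```

An entrypoint script could call install_ssh_keys once before doing anything else. The trade-off is a slightly slower startup in exchange for not having to rebuild the image whenever the keys change.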

But that's not where the problems end, unfortunately.

Death by a thousand cuts

While the setup itself is great once it's actually running, getting there feels like death by a thousand cuts. For example, does the following look like an okay docker-entrypoint.sh script to you?

#!/bin/bash

echo "Fixing SSH permissions..."
chmod -R 600 /root/.ssh

echo "Startup finished, waiting..."
sleep infinity

Well, it turns out that it wasn't, because the very first line was read wrong: files created in Windows have the wrong line endings (CRLF instead of LF) by default. The resulting error messages also weren't very clear. What's more, when you're working with infrastructure code you'll never get those lovely red highlights that you would in your IDE for most programming languages, but instead will have to iteratively debug and catch every small problem along the way.
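A quick way to check for the line ending problem, and to fix it, might look like the sketch below. The has_crlf and strip_crlf helper names are my own invention, and the in-place sed call assumes GNU sed, which is what most Linux container images ship with.

```shell
#!/bin/bash
# Sketch: detect and remove Windows (CRLF) line endings in a script file.
# has_crlf and strip_crlf are hypothetical helper names.

has_crlf() {
    # Succeeds (exit code 0) if any line in the file contains a carriage return.
    grep -q "$(printf '\r')" "$1"
}

strip_crlf() {
    # Removes a trailing carriage return from every line, in place (GNU sed).
    sed -i 's/\r$//' "$1"
}
```

Running something like strip_crlf docker-entrypoint.sh before building the image should turn the shebang line back into something the kernel can resolve. For files that live in Git, a .gitattributes entry such as "*.sh text eol=lf" is another option, although that only helps once the files are checked out through Git.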

Just look at how many issues I ran into while trying to get this setup working:

troubles figuring out whether Ansible supports SSH key passphrases
the aforementioned issues with bind mount file permissions
useradd and adduser both exist, which is a bit confusing
when SSH key permissions are wrong, the error messages aren't always explicit (e.g. when run through Ansible)
you can't really bind mount an entrypoint script for a container
CMD and ENTRYPOINT instructions both exist, which is a bit confusing
Windows and Linux have different line endings; Git can commit the correct ones to the repo, but bind mounts are used while testing, before anything is committed
known_hosts can contain duplicated entries, and hashes are now used instead of server hostnames for security, which makes it harder to figure out which entries you need
bind mounts can sometimes mask directory contents that would otherwise be there, e.g. mounting over /etc/ansible might mask /etc/ansible/ansible.cfg

I wouldn't go as far as to suggest that this is somehow unreasonable, but you definitely need to keep quite a few things in mind when working with these technologies, to have decent success. This is especially important, because the causes for some of those issues won't always be immediately obvious. Thankfully, I was able to fix the ones that prevented me from proceeding, but only because I had worked with these things previously. Actually, clear error messages do go pretty far, regardless of the context!

Summary

Now, I have nothing against Docker or Ansible - personally I think that both of those technologies are great. Both of them are tools that I have used in the past and will use in the future, as long as they make the overall development easier for me. After all, although neither of them is the newest or flashiest tool, both have lots of utility if you know how to wield them, like a hammer or most other tools do:

old hammer

However, the disconnect between the file systems is something that I've also written about before and will probably mention in the future, most likely when discussing how to avoid issues by thinking ahead. The elephant in the room, of course, is the fact that I should probably just switch over to Linux as my development OS as well, because it's just a slightly better suited platform for that sort of work.

However, Windows still has excellent software like MobaXterm, and switching over to Linux fully is probably still some ways away. Not all of my favorite OBS plugins (some mic filters in particular, which help me deal with sound quality issues) are available on it yet, and the state of such pastimes as gaming is not all that great yet, even though it is getting better. I guess overall this just goes to show that even great software like Docker and Ansible has its pain points, which can be amplified under certain conditions.

Maybe I'll actually need to look into using WSL 2 with Docker again and moving over from the Hyper-V back end, which I have been using historically for other reasons, such as the performance of WSL 2 actually being worse for me. But hey, thankfully it's not like there's a shortage of ways to look into improving things gradually, even before I move over to Linux. At that point, containers will probably be something I use primarily for consistency across different environments, as opposed to a way of running software that isn't natively supported on the host OS.