Debian and GRUB are broken

How would you feel if one day, suddenly, your server stopped responding and you could no longer reach it through SSH? Obviously, something like that wouldn't be too cool, especially if you'd installed it with the express purpose of serving as your backup machine. And yet, that's exactly what happened to me:

server went offline

It was a pretty regular install of Debian 10, with most of the software inside of it running isolated inside of Docker and only the regular unattended-upgrades for security patches and such:

unattended upgrades

And yet, it was offline. I knew that nothing was wrong with the hardware, because it's literally standing on the desk in my room, so i decided to lug my monitor over there, connect it up and have a look. This is what i saw:

boot error

So, basically GRUB had decided to randomly die, quite possibly after an unattended upgrade. So much for stable software, eh? After trying to boot into the OS by manually pointing at the partition where GRUB is installed, and having that fail as well, i realized that i'd just have to write a Debian installer image to a USB stick and boot into rescue mode.

One bout of corrupted graphics later, and after realizing that i needed to manually pick a VGA mode due to AMD graphics being poorly supported, i was able to reinstall GRUB:

reinstalling GRUB
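For anyone who can't make out the screenshot, here's a rough sketch of the usual chroot-based way of reinstalling GRUB from a rescue shell, written as a small Python script that only prints the commands instead of running them. The device names (`/dev/sda1` for the root partition, `/dev/sda` for the boot disk), the `/mnt` mount point and the helper name `grub_reinstall_commands` are all assumptions for illustration, not necessarily what was used here. It also assumes a BIOS install without a separate `/boot` partition; a UEFI system would need its EFI partition mounted as well.

```python
def grub_reinstall_commands(root_partition, boot_disk, mount_point="/mnt"):
    """Return the commands for a chroot-based GRUB reinstall, in order.

    Hypothetical helper: nothing is executed, the commands are only built
    as argument lists so that they can be reviewed first.
    """
    # Mount the installed system's root filesystem.
    commands = [["mount", root_partition, mount_point]]
    # Bind-mount the pseudo-filesystems that grub-install expects to see.
    for pseudo_fs in ("/dev", "/proc", "/sys"):
        commands.append(["mount", "--bind", pseudo_fs, mount_point + pseudo_fs])
    # Reinstall the boot loader to the disk, then regenerate its config.
    commands.append(["chroot", mount_point, "grub-install", boot_disk])
    commands.append(["chroot", mount_point, "update-grub"])
    return commands


if __name__ == "__main__":
    # Assumed device names; check yours with lsblk before running anything.
    for command in grub_reinstall_commands("/dev/sda1", "/dev/sda"):
        print(" ".join(command))
```

Printing instead of executing is deliberate: pointing `grub-install` at the wrong disk is not something you want to find out about after the fact.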

After that, the server started working once again. Also, the only posts that i could find about the issue at the time were from literally just a day earlier on Reddit, where the person was helped by a similar solution:

reddit post
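With the machine back up, one way to check the "quite possibly after an unattended upgrade" suspicion is to look for GRUB package upgrades in the dpkg log. The sketch below assumes the log lives at `/var/log/dpkg.log` and that upgrade entries have the form `DATE TIME upgrade PACKAGE:ARCH OLD_VERSION NEW_VERSION`; the function name `grub_upgrades` is made up for this example. Older entries may have been rotated into compressed files, which this doesn't read.

```python
def grub_upgrades(log_lines):
    """Return (timestamp, package, old_version, new_version) tuples
    for every upgrade entry whose package name starts with "grub"."""
    results = []
    for line in log_lines:
        fields = line.split()
        # Assumed entry layout: DATE TIME upgrade PACKAGE:ARCH OLD NEW
        if len(fields) == 6 and fields[2] == "upgrade" and fields[3].startswith("grub"):
            timestamp = fields[0] + " " + fields[1]
            results.append((timestamp, fields[3], fields[4], fields[5]))
    return results


if __name__ == "__main__":
    try:
        with open("/var/log/dpkg.log") as log:
            for entry in grub_upgrades(log):
                print(*entry)
    except OSError as error:
        # The log may be missing or unreadable on some systems.
        print("could not read the dpkg log:", error)
```

If a GRUB upgrade shows up shortly before the server dropped off the network, that's at least circumstantial evidence, though not proof that the upgrade was the cause.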

So, what did i learn here?

If you want your servers not to randomly break, NEVER update them. If you don't want to get hacked, ALWAYS update them. Oh, you want both? Nope, can't have that.

It's a bit satirical, but the more i use modern operating systems, the more i get the feeling that it's indeed true. I've almost never seen software updates that do anything good - nowadays it's more often either changed functionality, weird UI and UX choices just to make things feel fresh, new bugs, more memory and CPU usage (see Wirth's Law) or just fatal failures that affect the entire server, like this one. Essentially, as much as the industry likes to go on and on about how you should always keep your software up to date, doing that will sooner or later result in dumpster fires like this.

Ever seen the movie Snowpiercer? Without too many spoilers, people in the movie need to constantly spend time repairing their train because it keeps getting progressively worse as time goes on. Modern OSes and software both feel like that.