My thoughts while working on a Master's degree in 2020

Date: 2021-02-01

So, the other day i had just finished working on the first full draft of my Master's thesis when i got an e-mail from my supervisor, which included the following bit of text:

"The contents of your Master's degree do not fully conform to the requirements laid out in the document in ORTUS [which is the RTU's information system]."

Now, that's a fair enough statement to make and, to their credit, they did forward some examples of what would be considered good, based on which i could refactor my draft. However, why not point out exactly what's wrong with it? Saying "it's all bad" would offer about as much information as the current suggestion of "something's bad in there, somewhere. Here are a couple of examples so you can figure it out yourself", which increases the chances of me coming to the wrong conclusions.

Disclaimer: the professors in the university are probably a tad overworked and, given the ongoing pandemic, it's understandable that they don't have enough time to give detailed advice to everyone. But right now i'd like to talk about some of the things that i noticed about the overall process.

What's considered good?

I did skim the differences between my text and the reference ones provided and i do have to say that i found myself a bit unimpressed with what i was seeing. The topics discussed in the other Master's theses were actually pretty okay, as was the way they were developed, the technologies used and most of the stuff that you'd actually care about when working on a system in real world circumstances. None of that seemed to be what set them apart from my draft, though: it appeared that the only major difference was the formatting and the layout.

Now, i can't show you the actual contents because GDPR, IP laws and a bunch of other legalese make the idea of doing that kind of murky. However, i can say that most of them followed approximately the following structure:

Abstract
Introduction
Analysis of the situation
Requirements for the piece of software
-- Functional requirements
-- Non-functional requirements
Description of the software design
-- Decomposition of the modules and data
-- User interfaces
System implementation
Testing of the software
Deployment of the software
Conclusions

That was quite different from how mine looked, mostly because i developed 3 separate systems:

a CLI tool for managing infrastructure and preparing it for running containerized workloads
a web-based app to demonstrate some of the cloud-native development principles and to run on said infrastructure (it was a conceptual app to track COVID cases based on GPS data)
load tests to check how well each of the orchestrators that i chose actually deal with load

If i had chosen to develop just a single app, then sure, i could have easily organized my work to conform to the above structure, but now it was more difficult. It actually made me think back to how many people structure their software projects in a bunch of folders that attempt to classify everything within a project based on its type:

views
templates
controllers
services
repositories
configuration
migrations

Actually, here, have an example of a project just like that, of the Ruby app that i used for describing containerization and load testing:

This approach is oftentimes used by a large number of tools that generate project scaffoldings, but the thing is, i'd argue that it can actually make development harder! Suppose that i want to develop a new module for a system that i'm working on, say, for processing client orders in an e-commerce setting. Inevitably the structure of my solution would introduce new modules/packages in this folder structure and it would end up looking like this:

views
-- orders
-- (a list of other stuff goes here)
templates
-- orders
-- (a list of other stuff goes here)
controllers
-- orders
-- (a list of other stuff goes here)
services
-- orders
-- (a list of other stuff goes here)
repositories
-- orders
-- (a list of other stuff goes here)
configuration
-- orders
-- (a list of other stuff goes here)
migrations
-- orders
-- (a list of other stuff goes here)

And don't try to tell me that you can just forgo the step of making said packages in any reasonably sized project, because then you'd end up with a directory that contains hundreds if not thousands of services and you'd have a hard time finding things that pertain to a specific domain or functionality that you need! This particular layout, both in software projects and written pieces of text is good for mostly one thing: grouping things. If you want to see "all of X that belong to Y", then it's pretty good. Want to see all of the bits in your system that display HTML to users? Look under "views". Want to grade the student on how good they are at writing system requirements? See the chapter titled "Requirements".

Is it really good, though?

Now, is that easy to work with? I'm not quite sure. In the case of code, it is easy only if you have a helpful IDE with filename/class name completion that offers you possibilities based on what you've typed in and even then it only works if you have any idea of what you're looking for. For example, different modules could have a "UserController", but one could be meant for the sales domain, while another might be concerned with the orders domain. Putting them in the same "controller" folder would require subfolders, which would just move them even farther away from the code that they'd interact with. It feels like nowadays it's no longer possible to easily navigate codebases without ctrl+clicking all around them.

Instead, why not just have the whole codebase as a bunch of modules, each of which can contain all of what they need? So, why not the following alternative:

orders
-- views
-- templates
-- controllers
-- services
-- repositories
-- configuration
-- migrations
(a list of other modules goes here)

That way it'd immediately be clear when you're breaking some boundaries, for example, instead of orders.services.UserService you'd get billing.services.UserService. It'd also make it really easy to reason about what you're working with inside of the limited context boundaries of your module, instead of the multi-million SLoC monster that probably breaks the more fragile limits above regardless. Now, one could say that my suggestion makes code reuse a pain, since all of a sudden you see all of the stuff that doesn't belong in any particular spot, but i think that's a good thing - once you'd have to deploy your "orders" and "billing" microservices separately, all of a sudden you'd be forced to address this problem instead of just sweeping it under the rug with the first approach and pretending it doesn't exist. I like to think of it as a canary for the fact that we as humans suck at organizing things.
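To make the difference between the two layouts a bit more concrete, here is a minimal sketch in Python that regroups layer-first paths (layer/module/file) into module-first ones (module/layer/file). The file names and the "(unassigned)" bucket are made up for illustration and don't come from any real project; anything that sits directly under a layer folder ends up in that bucket, which is exactly the "stuff that doesn't belong in any particular spot" becoming visible.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def to_feature_first(paths):
    """Map layer-first paths (layer/module/file) to module-first ones (module/layer/file)."""
    regrouped = defaultdict(list)
    for raw in paths:
        parts = PurePosixPath(raw).parts
        if len(parts) < 3:
            # Files directly under a layer folder have no module to move to.
            regrouped["(unassigned)"].append(raw)
            continue
        layer, module, *rest = parts
        regrouped[module].append(str(PurePosixPath(module, layer, *rest)))
    return dict(regrouped)


# Hypothetical file names, used only to show the regrouping.
layer_first = [
    "controllers/orders/order_controller.rb",
    "services/orders/order_service.rb",
    "services/billing/user_service.rb",
    "views/orders/index.html.erb",
    "services/mailer.rb",
]

for module, files in to_feature_first(layer_first).items():
    print(module)
    for path in files:
        print("--", path)
```

The shared mailer has nowhere obvious to go, which is the point: the module-first layout forces a decision about shared code that the layer-first layout lets you postpone.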

Contexts and where to find them

And, to return to the problem at hand, almost all of the example works that i was given followed the first pattern. "All of the requirements go here. All of the interface descriptions go here. All of the testing bits go here." It didn't feel entirely comfortable, because it reminded me of waterfall! That's not to say that waterfall can't be a good thing, especially if the requirements are known ahead of time and are explained to you well (as opposed to someone giving you a very long PDF), yet it begets a particular way of thinking that i find problematic. It's the very best example of the fallacious assumption that you can describe the entirety of a system ahead of time and organize all of its pieces based on what they concern. And even if it's somewhat possible, it's probably not a good idea - the human brain has a limited working memory!

Suppose you have to implement such a system, you get a 100 page spec and are sent on your merry way. So what do you do? You probably try to read through it, page by page, of course! Well, in that case you'll quickly find that you just can't do it as well as you'd like. It's possible that the repetition of 20 pages of basically the same kind of table will get to you, the small differences in data between them eventually blending together and diluting the domain knowledge held within into formless mush. It's also possible that you'll discover that the "modules" of the system don't really fit together, because they're literally torn apart and decomposed into parts. One could argue that it's good for when you have a clearly documented front-end and back-end, both of which will be developed by different teams (surely not a recipe for integration problems down the line! /sarcasm), but it's definitely not okay when your workflow will become: "Read a chapter, flip through 20 pages to find the next bit where this module is described, read a chapter, flip through yet another 20 pages to find the continuation".

So how does that relate to the seemingly unreadable system specifications that i've seen in projects at work and have had the displeasure of slogging my way through? They fail to provide context for the knowledge contained within, because they lack any sense of spatial coherency within the boundaries of a particular bit of functionality. In short, in my Master's thesis, i wanted to tell a story, instead of laying out the sections with clinical precision. And yet, that's not what was expected of me.

The "waterfall" way of thinking

In real life, systems aren't implemented in the way described in any of those specifications above. No one goes to work and is like "okay, now i'll create the DB schema for all of the modules within the system and then, after this is done approximately 2 months later, i'll start writing all of the tests". Sure, you can go schema-first (which is a good idea because it forces you to think about the data structure) and you can also use TDD, but in 90% of the cases you won't handle a particular bit of functionality for the entire system in one go. You'd most likely integrate smaller parts of it, bit by bit. That is also how i oftentimes familiarize myself with new systems - instead of figuring out how the DB layer works from A to Z, why not see how a request is processed within the system, probably hitting multiple layers of abstraction?

My text was a bit different than the examples, in that it described each of the applications separately. Not only that, but it started each chapter with a bit of context about what's described in the chapter, and also why. Why did i choose to write a CLI app instead of a GUI one? What are the things that cause problems when creating CLI apps? How can they end up providing a bad developer experience (DX)? How did i alleviate these issues: what programming languages and frameworks did i use and why? It starts out with an overview, then goes into details about the specifics and at the end of the chapter it attempts to sum up what i've found and what the results were. It's like reading a series of small articles on one topic, that all come together, much like study courses on YouTube would, or the chapters within a maths textbook that introduce you to new topics, just condensed down to 133 pages. Of course, it doesn't always do that, but that's just because i suck at writing.

I'd say that there is enough variety in how people write that any attempt to standardize everything will either fail because of said differences, or result in extremely watered down, unreadable and unengaging pieces of text. Pieces of text should make you want to read more. They should help you and provide a straight learning curve. They should congratulate you on finishing every chapter and make you feel like you've learnt something that you'll be able to use, without having to read the remaining 110 pages and maybe even re-read them a few times, to even start to see how it all fits together.

Also, putting 50 pages of tables of inputs & outputs in your text (like someone actually did) makes it unsuitable for human consumption. There's a good chance that you're writing your texts for other human beings to read, not robots. And if you need to get down and dirty with the low level details, descriptions of the inputs & outputs to your functionality belong in automatically generated docs that your platform of choice will output (for example, you can use JavaDoc for Java and output that to HTML, phpDocumentor for PHP, even things like Swagger for documenting REST endpoints), instead of the stuff that you'd present to your end users.
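As a rough illustration of the "generate it from the code" idea, here is a small Python sketch that uses the standard library's inspect module instead of the tools named above. The place_order function and the output format are hypothetical; the point is only that the description of inputs and outputs is derived from the code and never typed into a table by hand.

```python
import inspect


def place_order(customer_id: int, item_ids: list, express: bool = False) -> str:
    """Create an order for a customer and return its identifier."""
    # Hypothetical function; it exists only so there is something to document.
    return f"order-{customer_id}-{len(item_ids)}"


def describe(func):
    """Render a plain-text description of a function's inputs and output."""
    signature = inspect.signature(func)
    lines = [func.__name__, inspect.getdoc(func) or "(no description)", "Inputs:"]
    for name, param in signature.parameters.items():
        annotation = getattr(param.annotation, "__name__", str(param.annotation))
        if param.default is inspect.Parameter.empty:
            lines.append(f"-- {name}: {annotation}")
        else:
            lines.append(f"-- {name}: {annotation} (default: {param.default!r})")
    returns = getattr(
        signature.return_annotation, "__name__", str(signature.return_annotation)
    )
    lines.append(f"Output: {returns}")
    return "\n".join(lines)


print(describe(place_order))
```

If the signature changes, the generated description changes with it, which is something 50 pages of hand-maintained tables can't promise.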

We could really use some more examples and templates

Of course, there were other things that begged improvement. When describing what you actually need to do, you're given a PDF with demands. But where are the examples of what the expected end results are? They're not published in the ORTUS information system, that's for sure. They're not published in the system that allows you to search for theses either. You can get an abstract and maybe some other bits of information that are overall unhelpful. So for whom are we even writing these things, if no one will ever even read them outside of the university? If not because of a lack of interest, then definitely because they're not available!

I didn't take the easy way out and describe some stuff that i'd develop at work, since then one could make the argument that it's IP and that the clients or the company wouldn't want to see it published. Instead, i took the approach of solving common issues and demonstrated them with completely bespoke tools. Do you mean to tell me that the world doesn't deserve to see the results of my sub-par toiling that still might be useful to someone? Are people too shy to expose how bad their stuff is to the world? I'm not, you can actually find the full texts below. I don't understand what the point of this "science" is if no one benefits from it.

In that system, there are supposedly 23'051 pieces of text that people have poured countless thousands of man-hours into, over a decade, no less! Do you really mean to tell me it was all for nothing? All of those should be available publicly, they should be indexed and all of them should be up for discussion, same as how scientific journals should be organized! All of the code should be up on GitHub or GitLab and all of it should be launchable with docker-compose up, lest all of that effort be wasted!

Office suites are problematic

Furthermore, the demands that they have for you describe how to make a text document. It's unfortunate that they expect you to use "Times New Roman" (why not just specify that the font must be metrically compatible, like Liberation Serif), but when i was working on my bachelor's, they also expected me to use Microsoft Office to hand it in. Of course, it's no secret that most of the developed world runs on that software... Which feels like it'll make digital archaeology much harder in 50 to 100 years! The very data format and the closed source nature of Office make it a bad choice for storing any sort of data that you care about. What about people who prefer to use open formats and instead choose to use LibreOffice? What about people who use AbiWord, OpenOffice, WPS Office, Calligra, WordPerfect or any other office suite? Maybe all of the sources should just be plain text and all of the formatting should be applied later, like HTML and CSS co-exist. Maybe even PDF would be a more apt choice (or DjVu, or epub, or even SVG).

But i could get over that, because most of the office suites have a passable enough support for what they want to do... Except they don't! And they're not even sane enough to accept using the defaults that the office suite provides for you! Ever manually gone through 60+ images and individually changed the captions of every single one because the default format wasn't good enough? I have, it was not fun. Ever spent an hour setting up the page margins to be just how they want them to be, just for them to still look weird because the actual text is laid out differently due to a variety of factors that are explained by the specifics of the word processors that no one really cares about? Even the features that should have made our lives easier oftentimes don't work as you'd expect and do the exact opposite thing! I feel like our lives could be easier if these word processors would be programmable already! Just give me an easy to use API so that i can select all of the images within the page that match certain conditions (like XPath) and apply structural changes to them, not just things that pertain to the styling.
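Here is a sketch of what that kind of scripting could feel like, using Python's xml.etree.ElementTree and its limited XPath support. The document format below is invented for this example and is far simpler than real ODF or OOXML files; the element names, the width attribute and the "placement" attribute are all assumptions.

```python
import xml.etree.ElementTree as ET

# A made-up, simplified document format, not a real word processor file.
SOURCE = """
<document>
  <chapter title="CLI tool">
    <figure width="1200"><caption>Figure: architecture</caption></figure>
    <figure width="400"><caption>Figure: help output</caption></figure>
  </chapter>
  <chapter title="Load tests">
    <figure width="1600"><caption>Figure: response times</caption></figure>
  </chapter>
</document>
"""


def restyle_figures(root, min_width=1000):
    """Renumber all captions and flag wide figures; return how many were flagged."""
    flagged = 0
    for number, figure in enumerate(root.iterfind(".//figure"), start=1):
        caption = figure.find("caption")
        title = caption.text.split(":", 1)[1].strip()
        caption.text = f"Fig. {number}. {title}"
        if int(figure.get("width")) >= min_width:
            # A change to the structure of the document, not just to its styling.
            figure.set("placement", "full-page")
            flagged += 1
    return flagged


root = ET.fromstring(SOURCE)
print(restyle_figures(root))
print(ET.tostring(root, encoding="unicode"))
```

Changing the caption format for 60+ images would then be a one-line edit and a re-run, instead of an afternoon of clicking.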

Yet most of those issues could be sidestepped entirely. All they need to do is provide a simple document template. Fill out your name and surname. Fill out the topic. Replace the "lorem ipsum" in the annotation with your stuff. Use the already created styles that are available to you in the palette to format bits of your text in the consistent and approved format. Sure, it won't be 100% accurate because most people don't exactly know how to use word processors. It won't be 100% accurate because this .doc would be interpreted in different ways by MS Office, LibreOffice and other pieces of software. But it would save thousands of man-hours even in this small university!
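The "fill in the blanks" idea is simple enough to sketch with Python's string.Template. The title page layout and the placeholder names below are invented for this example and are not the university's actual requirements; the useful property is that substitute() refuses to produce a document while a field is still missing.

```python
from string import Template

# Hypothetical title page; the placeholder names are invented for this sketch.
TITLE_PAGE = Template(
    "MASTER'S THESIS\n"
    "$topic\n"
    "\n"
    "Author: $name $surname\n"
    "\n"
    "Annotation\n"
    "$annotation\n"
)


def fill_title_page(**fields):
    """Fill in the template; substitute() raises KeyError if a field is missing."""
    return TITLE_PAGE.substitute(**fields)


print(fill_title_page(
    name="Jane",
    surname="Doe",
    topic="An example topic",
    annotation="Replace the lorem ipsum with your own summary.",
))
```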

Sometimes you just need uninteresting solutions

Imagine how much it would suck if you couldn't find already formatted templates of all the documents you might need in life - things you need to fill out concerning your healthcare and taxes, documents for registering your own company, or to form work contracts with others, to lease out a property or to rent it from someone, to sell expensive belongings to someone, or to authorize someone to take out your postal packages for you. Imagine how much life would suck if you couldn't go online to a system which allows you to look for forms and contract templates to fill out, even online, or just download and do it yourself. Imagine how much your life would suck if, instead of being able to electronically sign those documents and e-mail them, you'd have to print them out, sign them manually and bring them to the institution in person. Oh, wait... In many places around the world, that's still oftentimes the case!

Personally, my hopes for the 21st century are that eventually people will come up with a universal way to represent text documents. It probably won't be plain text because that's too boring and apparently too limiting for people who like pretty layouts, hopefully it'll be something more open than PDF and ideally we'll get something like a more user friendly version of LaTeX (think what Python is to Java) that'll let us describe all of our bills, leases, appointments, contracts and any number of other documents. Heck, i'd hope that a number of exceedingly boring yet usable solutions, an entire ecosystem, would develop around this format - things that would let you interact with these documents and modify them as necessary.

Yet, until then, we're stuck with our current word processors. They feel inconsistent. They feel bloated. They feel exceedingly vendor lock'y. They do more than they should (ActiveX or OLE, anyone?) and less than they should (decent APIs?) at the same time. I do like LibreOffice when it actually works and i feel like we're close, but clearly there is a difference between a freeform document that can be anything and the typical sorts of semi-structured forms that you'd expect to fill out.

Conclusion

In summary, i'm salty about how we don't seem to be writing many things for actual human beings to utilize and how much work we end up doing for nothing. If people cared more, the world would be better, but many just chalk these things up to the way things are and don't even give it a second thought. Can i change all of these things? Not a chance. But at least i can call them out and vent my frustrations into the void. To attempt to pinpoint some of these things, so 40 years down the line when things have been marginally improved, someone can look back at this and say: "Huh, this person was wrong about 90% of things in the article, but this one thing is really spot on!"

You can read the original draft of my Master's thesis here: RTU maģistra darbs v1.pdf

You can also read the edited version which better conforms to the requirements here: RTU maģistra darbs v5.pdf

Which of those is more readable? Which would you like to see in the wild, to get to know a system that you'll have to work with? The answer here is probably "neither", because 100+ pages don't always make for exceedingly interesting reading material, at least when we're talking about dry technical topics, instead of fiction.