COVID-19 contact tracing with Bluetooth vs GPS tracking and the unexplored potential for these apps

Date: 2021-02-08

Disclaimer: I was one of the approximately 100 specialists who participated in the Apturi Covid project (i created the homepage); however, the views expressed in this article are my own. The article itself isn't an attempt to critique the existing apps, but rather to provide some commentary on the bigger picture.

One late autumn evening i was sitting in my dorm room and pondering whether i should finally get around to creating my own blog and, perhaps more importantly, what i'd even write about in it. That's when i got an alert on my phone and the topic for what would become the first post essentially fell right into my lap:

Oh noes! It appeared that i was in close contact with someone who had contracted COVID-19.

The actual information presented to me by the app seemed pretty straightforward and reasonable:

stay indoors and limit how much you're around other people
look out for symptoms of the virus
if you notice any, contact your doctor
if you have serious symptoms, contact emergency services

And yet, it made me really upset. No, not just because i could have contracted the virus, but also because this wasn't the information that i wanted at that moment.

The problem

In books, you sometimes see the expression: "And in that moment, the character's entire life seemed to play out in front of them." In my case, i thought back to the past few weeks and how many times i had been outside during that time. In the case of a hermit programmer, this thankfully wasn't a high number:

i had gone to a work stand-up meeting in person (which was later moved to Skype due to a concerning increase in cases in my country)
i had gone to the post office to pick up a replacement charging adapter for my laptop (which they apparently couldn't just deliver to my place, despite it not being much bigger than a letter)
i had gone to the shop once or twice, to get additional food and various personal hygiene implements
i had gone home with my father in his car, to get out of the city for the weekend
i had gone back to the city taking a bus and a tram, to be able to focus on my Master's studies and work (no offense to my parents, but i need my privacy and peace)
i had also given in to peer pressure and visited a convention with some of my friends, wearing a mask most of the time and keeping my distance from people as best as i could

The question i wanted answered was: "When and where did things go wrong?" One might argue that the app isn't intended to give me that information and that the very alert that it had presented me with essentially meant that it had done its job successfully.

That, however, isn't enough in my opinion. You see, at the time of writing, the app has been downloaded a bit less than 200 thousand times. Given that Latvia has around 1.9 million inhabitants, that means only about 1 in 10 people has the app. While even this number is helpful to SPKC (The Centre for Disease Prevention and Control), it leaves out the other 9 out of 10 people that i could come across.

For example, i suspect that my risky encounter could have taken place at the convention, and it would be really nice to be able to warn the friends that i went to the event with - neither of whom had the app installed, due to both the age of their devices and the lack of free space on them. And yet, while i can suggest that they be more careful until either symptoms manifest or a few weeks pass (which is what i'm doing myself right now), i can't say where the exposure happened with any degree of certainty - for example, i might just as well have stood next to an infected person on the tram.

To be able to help out the people around me, i would need more data.

Digging deeper

The sad part is that even the Exposure Notification API that Google and Apple developed doesn't give you much more information. For example, if i go to Settings > Google > COVID-19 Exposure Notifications, i can see when my device last checked the published keys against those of the devices that it has been in contact with:

Looking around a bit more, i can see when the key was published:

And that's it! It doesn't tell me when my device was in contact with that particular device (which it should be able to do, given that the identifiers it has received are stored locally), so i cannot correlate that time with my location. This means that i can't even estimate how long i should wait before i could expect to develop symptoms, or who i should warn based on my own past activities.

Would my parents be at risk? Would my friends be at risk? Due to the way this is implemented, i have no idea.
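To illustrate why exposure times should in principle be recoverable, here is a heavily simplified, hypothetical model of the matching step in Python. It assumes the phone keeps every identifier it has heard together with the time it heard it, and it substitutes an HMAC for the real derivation (the actual Exposure Notification framework derives rolling identifiers from temporary exposure keys using HKDF and AES), so `derive_identifier`, `exposure_times` and the 10-minute interval are illustrative assumptions rather than the real API.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timezone

INTERVAL_SECONDS = 600  # assumption: identifiers rotate every 10 minutes


def interval_number(t: datetime) -> int:
    """Number of the 10-minute interval that contains t."""
    return int(t.timestamp()) // INTERVAL_SECONDS


def derive_identifier(key: bytes, interval: int) -> bytes:
    """Hypothetical stand-in for the real HKDF + AES identifier derivation."""
    return hmac.new(key, interval.to_bytes(4, "little"), hashlib.sha256).digest()[:16]


@dataclass(frozen=True)
class Sighting:
    identifier: bytes  # identifier broadcast by a nearby phone
    seen_at: datetime  # when this phone heard it


def exposure_times(published_keys, sightings, first_interval, last_interval):
    """Return the times at which identifiers derived from published keys were heard."""
    # Index local sightings by identifier; one identifier may be heard several times.
    heard = {}
    for s in sightings:
        heard.setdefault(s.identifier, []).append(s.seen_at)
    hits = []
    for key in published_keys:
        for interval in range(first_interval, last_interval + 1):
            hits.extend(heard.get(derive_identifier(key, interval), []))
    return sorted(hits)


# Usage with a hypothetical key and a single hypothetical sighting.
infected_key = bytes(16)
t = datetime(2020, 10, 20, 18, 5, tzinfo=timezone.utc)
n = interval_number(t)
sightings = [Sighting(derive_identifier(infected_key, n), t)]
print(exposure_times([infected_key], sightings, n - 144, n))  # roughly the previous day
```

With timestamps like these, an app could tell the user when the contact happened, which could then be correlated with their own calendar or location history.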

The alternatives

While this shortcoming is by design, there are a few alternatives that could have been used when considering how to implement a similar solution:

give users access to these random keys, along with the times at which they were recorded
additionally use GPS for tracking where the infected people have gone recently

I'm not sure why the first approach wasn't implemented, since it could help in identifying how the virus spreads, but in this post i'll focus more on the second approach because it seems rather interesting. As a matter of fact, certain other countries have implemented solutions that do use GPS: Iceland, Italy, Jordan and possibly some others (given that the "Protocol" column isn't entirely filled out on that Wikipedia page). And yet, Apple and Google have banned the use of location data in apps built on their system:

Apple Inc and Alphabet Inc’s Google on Monday said they would ban the use of location tracking in apps that use a new contact tracing system the two are building to help slow the spread of the novel coronavirus.

Solutions that do use location data have also come under fire over privacy concerns. But is that criticism valid? Do the privacy benefits outweigh those of embracing GPS tracking and going beyond just tracing? I doubt anyone has a good answer to that, but let's explore the technical aspects regardless!

Advocating for more data

In medical sciences, there exists the concept of herd immunity:

Herd immunity ... is a form of indirect protection from infectious disease that occurs when a sufficient percentage of a population has become immune to an infection, whether through vaccination or previous infections, thereby reducing the likelihood of infection for individuals who lack immunity.

Why am i mentioning this? Because using the app could function in a similar way. Why? Let me explain, with video games!

There is this game, Pathologic 2, in which you play a doctor in a city that slowly succumbs to a plague under some rather Lovecraftian circumstances. While this isn't a game review, i do suggest that you check it out sometime, because it's rather artistic and has some pretty interesting storytelling. Within the game, at the start of every day, your map is updated with points of interest, as well as the districts of the city that have become infected with the plague:

Now, let's contrast that with a map which details the count of COVID-19 cases per region in the country:

If you're faced with the choice between going to the countryside for a few months or staying in a city that has ~10x more cases, which would you choose? Most people would probably lean towards the former option, since it seems more logical at first glance. And yet the realities of our lives aren't always that simple!

For example, what if you worked in the city? There are still professions that can't be done remotely, so the choice might also be between putting food on the table and unemployment. Consider that around 75% of the people in this country don't have savings to last them more than 3 months, which can be attributed to the sad state of the economy in general. But even if money weren't an issue, would those two environments be comparable in other ways? What if you couldn't concentrate as well in one of them? Anyone who has tried working remotely with their family around has surely discovered that this is a factor that needs to be considered as well.

The correct answer here is the same as for many things in life: "It depends."

A lot of what humans do is based on implicit cost-benefit analysis and taking measured risks, without which most of the largest companies in the world wouldn't exist today and many technological advancements simply would not have happened. And given that taking risks is almost unavoidable even in daily life, it helps to minimize them using whatever means are available. Since we live in the "Information Age", i'm sure you can guess what i'm hinting at. In this particular case, knowing the total count of cases in the city doesn't help people make educated decisions, since the distribution of cases most likely isn't uniform: just look at what recently happened in a church in Germany. If i had a similar hotspot nearby, i'd probably find it easy to decide to move to the countryside despite any downsides, and vice versa.

We already have the technology

So if data about the whole city is already presented, why not present it at a more granular level? Why not do it per city block? The OpenStreetMap project already provides high quality maps with all of this data included, a dataset that's completely free to download and use. It can even be imported into geospatial databases, such as PostGIS, which can then be connected with something like leaflet.js for rendering, and you'd have very usable maps for displaying and processing this sort of data for a grand total of $0 (hosting costs not included).
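As a rough illustration of going more granular, the Python sketch below buckets case coordinates into a square grid of roughly 250 m cells, a crude stand-in for city blocks. With real block polygons from OpenStreetMap loaded into PostGIS, this would instead be a spatial join (for example with `ST_Contains`); the cell size, the reference latitude (roughly that of Riga) and the sample coordinates are all assumptions.

```python
import math
from collections import Counter

METRES_PER_DEG_LAT = 111_320.0  # approximate length of one degree of latitude


def make_grid(cell_m: float, ref_lat: float):
    """Return a function mapping (lat, lon) to an integer grid cell about cell_m metres wide."""
    lat_step = cell_m / METRES_PER_DEG_LAT
    # Degrees of longitude shrink towards the poles, so scale by cos(latitude).
    lon_step = cell_m / (METRES_PER_DEG_LAT * math.cos(math.radians(ref_lat)))

    def cell_for(lat: float, lon: float) -> tuple[int, int]:
        return (math.floor(lat / lat_step), math.floor(lon / lon_step))

    return cell_for


def cases_per_cell(cases, cell_m=250.0, ref_lat=56.95):
    """Count (lat, lon) case locations per grid cell."""
    cell_for = make_grid(cell_m, ref_lat)
    return Counter(cell_for(lat, lon) for lat, lon in cases)


# Hypothetical case locations: two at the same address, one elsewhere.
counts = cases_per_cell([(56.9500, 24.1000), (56.9500, 24.1000), (56.9700, 24.1500)])
for cell, n in counts.most_common():
    print(cell, n)
```

A fixed reference latitude keeps cell boundaries consistent across the whole city, which is good enough at city scale but would distort cells over a whole country.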

I even got halfway there with a local instance a while ago:

(in this case, i used QGIS for visualization, where i'm highlighting a part of the city)

At that point, you can even develop your own solutions around it, be it for visualizations or data analysis:

So, we already have the technology to do this with few issues along the way (the likely ones being mostly related to scaling, something that horizontal scaling can largely solve). So why isn't data like this available?

Privacy, in a word. More granular data would require historical location data (which isn't collected, thanks to Google and Apple) or data about where the cases were discovered and where the affected people live, work and have commuted, which, even if collected by SPKC employees in some capacity, isn't directly published either. If Google and Apple are helplessly throwing their hands up in the air, claiming that they can't find a good middle ground and banning the use of GPS outright, it's probably a problem that would be pretty difficult to solve and really easy to screw up. All of a sudden, i'm reminded of a saying about IT security:

"If people who do it for a living routinely fail at it, what makes you think that you'll do better?"

Some possible approaches to using GPS data

So, let's ignore common sense and rush headlong into compromising everyone's privacy anyway! We're dealing with hypotheticals, after all.

The simplest approach would be just to store the location data for all of the devices locally. Then, once an infection is confirmed, said data could be published for other devices to check their historical proximity against and for other visualizations to be made. In its simplest form, we could think of it all like a number of dots on the map, each with a timestamp:
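A minimal sketch of such a local store, assuming an on-device SQLite database and a 14-day retention window (both assumptions; the class and method names are hypothetical). Points are kept as Unix timestamps so that pruning is a simple numeric comparison, and `export` returns the list that would be uploaded once an infection is confirmed.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=14)  # assumed retention window


class LocationHistory:
    """Hypothetical on-device store of timestamped GPS points."""

    def __init__(self, path: str = ":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS points (ts REAL NOT NULL, lat REAL NOT NULL, lon REAL NOT NULL)"
        )

    def record(self, when: datetime, lat: float, lon: float) -> None:
        self.db.execute("INSERT INTO points VALUES (?, ?, ?)", (when.timestamp(), lat, lon))
        self.db.commit()

    def prune(self, now: datetime) -> None:
        """Delete points older than the retention window."""
        self.db.execute("DELETE FROM points WHERE ts < ?", ((now - RETENTION).timestamp(),))
        self.db.commit()

    def export(self) -> list:
        """Return (time, lat, lon) tuples in chronological order, e.g. for upload."""
        rows = self.db.execute("SELECT ts, lat, lon FROM points ORDER BY ts").fetchall()
        return [(datetime.fromtimestamp(ts, tz=timezone.utc), lat, lon) for ts, lat, lon in rows]


now = datetime(2020, 11, 1, 12, 0, tzinfo=timezone.utc)
history = LocationHistory()
history.record(now - timedelta(days=20), 56.95, 24.10)  # older than the window, pruned below
history.record(now - timedelta(days=2), 56.96, 24.12)
history.record(now - timedelta(hours=1), 56.95, 24.11)
history.prune(now)
print(history.export())
```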

So why is this a privacy nightmare? If all you care about is which parts of the city to avoid, then it shouldn't matter. And yet, people are exceedingly likely to have other motivations. With this sort of data, it's very likely that one could figure out where a person works, where they live, what their daily routine is and even which stores they visit for grocery shopping. Would that be helpful in alerting others in these locations who could have been exposed? Sure. But currently that information can already be collected by SPKC employees, who interview patients in private about their past activities.

The thing is, once you have all of the data, you have a pretty good chance of figuring out who the infected person is. Then, there's a chance that the way others treat them could be affected - we can't really say what sort of prejudice and treatment they'd experience. But what if said person works with sensitive data, or is perhaps a politician or someone else that bad actors would become interested in? Perhaps you figure out that the person in question works in corporate sales and has visited certain companies, so now you know that their employer is in talks with those companies, information that otherwise wouldn't necessarily be public at that time. Frankly, there could be a variety of far-reaching consequences to location data being public, most of which we simply can't foresee.

Okay, suppose that instead of publishing the data directly, we simply allow each device to check when and where it has been in contact with another device whose owner is considered infected. For example, if we draw an additional path (in green) and want to check which bits of the path could be considered "dangerous" in terms of infection risk, we could return only the intersections of the two paths:
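One way to sketch that intersection check, assuming both paths are lists of (time, lat, lon) points and that "contact" means being within 20 m and 15 minutes of each other (both thresholds are arbitrary assumptions). Distances use the haversine formula; the brute-force double loop is only suitable for illustration, and a real service would need a spatial index.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def risky_points(my_path, infected_path, max_dist_m=20.0, max_gap=timedelta(minutes=15)):
    """Return the points of my_path that were near infected_path in both space and time."""
    hits = []
    for t, lat, lon in my_path:
        for t2, lat2, lon2 in infected_path:
            if abs(t - t2) <= max_gap and haversine_m(lat, lon, lat2, lon2) <= max_dist_m:
                hits.append((t, lat, lon))
                break  # one match is enough to flag this point
    return hits


noon = datetime(2020, 11, 1, 12, 0)
infected = [(noon, 56.9500, 24.1000)]
mine = [
    (noon + timedelta(minutes=5), 56.9500, 24.1001),  # ~6 m away, 5 minutes later
    (noon + timedelta(hours=2), 56.9500, 24.1000),    # same spot, much later
    (noon, 56.9600, 24.1000),                          # same time, ~1.1 km away
]
print(risky_points(mine, infected))
```

Note that every returned point is, by construction, also a point where the infected person was, which is exactly the leak described next.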

The problem is that we still return the raw data; it's just a little bit harder to get. Essentially, it becomes a question of how many emulators you can run in parallel to reconstruct the full dataset. And even if you limit each device to accessing the data only once per day, you still cannot effectively rate limit the requests, since it will probably be pretty hard to separate the legitimate requests from the illegitimate ones, especially given the possibility of using VPNs. And if you decide that each installation needs to be verified, now you're collecting an entirely different set of data about your users, which is a whole different can of worms. This approach would also quickly be compromised:

And even if you fuzzed this data, the overall patterns would still easily become apparent - and if you go too far in adding random offsets to the data, you'd quickly end up with maps that are pretty useless.
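To see why fuzzing alone doesn't hide patterns, here is a small Python experiment, assuming the fuzzing is Gaussian noise with a standard deviation of 200 m (one common choice, not something the apps discussed here actually do). A location visited every night, such as a home, is reported 100 times; each individual report is off by a couple of hundred metres, yet simply averaging the reports lands much closer to the true spot. The home coordinates are hypothetical.

```python
import math
import random

METRES_PER_DEG_LAT = 111_320.0  # approximate


def fuzz(lat, lon, sigma_m, rng):
    """Add independent Gaussian noise (standard deviation sigma_m metres) to each axis."""
    dlat = rng.gauss(0.0, sigma_m) / METRES_PER_DEG_LAT
    dlon = rng.gauss(0.0, sigma_m) / (METRES_PER_DEG_LAT * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon


def approx_distance_m(a, b):
    """Equirectangular approximation, fine for distances within a city."""
    dlat = (a[0] - b[0]) * METRES_PER_DEG_LAT
    dlon = (a[1] - b[1]) * METRES_PER_DEG_LAT * math.cos(math.radians(a[0]))
    return math.hypot(dlat, dlon)


rng = random.Random(42)  # fixed seed for repeatability
home = (56.95, 24.10)    # hypothetical home location
reports = [fuzz(*home, sigma_m=200.0, rng=rng) for _ in range(100)]

estimate = (
    sum(lat for lat, _ in reports) / len(reports),
    sum(lon for _, lon in reports) / len(reports),
)
single_error = sum(approx_distance_m(r, home) for r in reports) / len(reports)
print(f"average error of a single fuzzed report: {single_error:.0f} m")
print(f"error after averaging 100 reports: {approx_distance_m(estimate, home):.0f} m")
```

Because independent noise averages out roughly with the square root of the number of reports, more noise is needed to hide frequently visited places, and that is precisely what makes the resulting maps less useful.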

So what alternatives are there? The simplest one that i can think of would be to keep using Bluetooth for direct proximity detection, but also use GPS for generating publicly accessible maps that would not contain information about individuals. We could generate heat maps of the infected population, which would still require uploading location data, but in this case the accuracy of the data would be less important. As for displaying such data, there are actually a variety of plugins for leaflet.js, some of which make it easy to display heat maps based on such data, given that this sort of visualization is nothing new!
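A sketch of how uploaded points could be turned into heat-map data without exposing individuals, assuming each upload carries an anonymous per-person identifier and that cells with fewer than five distinct people are suppressed (a k-anonymity-style threshold; the value 5, the 250 m cell size and the reference latitude are assumptions). The output is a list of `[lat, lng, intensity]` triples with intensities scaled to 0..1, the shape that heat-map plugins such as Leaflet.heat accept, serialized as JSON for the frontend.

```python
import json
import math
from collections import defaultdict

METRES_PER_DEG_LAT = 111_320.0  # approximate


def heatmap_points(reports, cell_m=250.0, ref_lat=56.95, min_people=5):
    """reports: iterable of (person_id, lat, lon). Returns [[lat, lng, intensity], ...]."""
    lat_step = cell_m / METRES_PER_DEG_LAT
    lon_step = cell_m / (METRES_PER_DEG_LAT * math.cos(math.radians(ref_lat)))
    people = defaultdict(set)
    for person_id, lat, lon in reports:
        cell = (math.floor(lat / lat_step), math.floor(lon / lon_step))
        people[cell].add(person_id)  # count people, not points
    # Drop sparse cells so that a single person's trail cannot be read off the map.
    counts = {cell: len(ids) for cell, ids in people.items() if len(ids) >= min_people}
    if not counts:
        return []
    peak = max(counts.values())
    return [
        [(i + 0.5) * lat_step, (j + 0.5) * lon_step, n / peak]  # cell centre + intensity
        for (i, j), n in sorted(counts.items())
    ]


# Hypothetical uploads: six people near one shop, one person with many points elsewhere.
uploads = [(f"p{k}", 56.9500, 24.1000) for k in range(6)]
uploads += [("loner", 56.9700, 24.1500)] * 20
print(json.dumps(heatmap_points(uploads)))
```

Counting distinct people rather than raw points matters here: a single device reporting twenty points still counts once, so it cannot push its own cell over the threshold.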

That could be a good compromise to indicate which parts of the city to avoid in the near future, or where to be extra careful if you've visited them in the recent past. For example, if we actually had the data, we could create something a bit like the following (of course, this one was created by randomly painting data over the map):

Now, if i was planning on visiting a shop in the west part of the map (to the left), do you think that i wouldn't look for a different one when confronted by such a heat map? Wouldn't i call up my friends and be like: "Hey, since we all went to that very same store a few days ago, you might really want to stay inside for a couple of days, just in case."

Is GPS worth it

It feels like it'd be really easy to mess up handling location data and have it leak with negative consequences. Perhaps that's one of the reasons why a Bluetooth-based approach was used, along with the fact that measuring Bluetooth signal strength is a far more accurate way of detecting proximity than attempting to correlate location and time series data. That being said, considering that 9 out of 10 people currently won't benefit from the app in any way, it would still be useful to have public maps of city blocks to aid individuals in their decision-making.

But how would people react to an application that needs to access their location data? What if they simply wouldn't want to use such an application? The answer is actually really simple - you let each user choose separately. Maybe enable both methods by default and then allow users to opt-out of specific functionality, as they see fit:
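The per-feature opt-out could be modelled as a tiny settings object. This is a hypothetical Python sketch (all names are made up) in which both mechanisms default to on, each can be switched off independently, and location points are only recorded while GPS history is enabled.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TracingPreferences:
    """Hypothetical user settings: both mechanisms on by default, each independently optional."""
    bluetooth_proximity: bool = True  # BLE contact tracing
    gps_history: bool = True          # location history for anonymized heat maps


def active_mechanisms(prefs: TracingPreferences) -> list:
    """List the mechanisms the app is currently allowed to use."""
    active = []
    if prefs.bluetooth_proximity:
        active.append("bluetooth")
    if prefs.gps_history:
        active.append("gps")
    return active


def maybe_record(prefs: TracingPreferences, history: list, point) -> None:
    """Store a location point only if the user has not opted out of GPS."""
    if prefs.gps_history:
        history.append(point)


defaults = TracingPreferences()
bluetooth_only = replace(defaults, gps_history=False)  # user opted out of GPS
print(active_mechanisms(defaults), active_mechanisms(bluetooth_only))
```

Gating the recording itself, rather than only the upload, means that nothing is stored on the device for a user who has opted out.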

Now, perhaps one possible downside of generating such maps is the number of people who would become really concerned because they live in the affected areas. If the hospitals are already overwhelmed with testing and attempting to prioritize who should be tested, who do you think would deal with 10x more people who may or may not have had some exposure to the virus, but are now worried because of a heat map that represents general trends?

If this doesn't seem like such a huge problem at first, what would you do when the aforementioned bad actors decide that instead of just attempting to access location data for individuals, they'd rather spoof GPS locations and feed false data into the system? This is another disadvantage of a GPS-based approach, since all of a sudden it becomes pretty hard for you to verify the data - of course, you could technically limit the number of entries that a single device can generate per day and also check whether these records are plausible, so that you don't end up with something like this:

But at that point, all of a sudden you also need to think about validating this data and doing so at the notable scale of hundreds of thousands of devices. Still, having the option to use GPS as an additional mechanism for generating helpful data seems pretty useful, especially given that many devices simply don't support the vendor-specific API needed for BLE-based contact tracing. In these cases, the owners of said devices could at least use GPS to later help SPKC specialists discover their past whereabouts, with this data perhaps being validated manually before it becomes part of any larger or public dataset, much like a verification code is necessary to publish a set of keys as belonging to someone infected.
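The two plausibility checks mentioned above, a daily cap per device and rejecting physically impossible movement, could be sketched in Python roughly as follows. The cap of 288 points (one every five minutes) and the 70 m/s (about 250 km/h) speed ceiling are arbitrary assumptions, and `validate_day` is a hypothetical name; a real system would also need to handle GPS jitter, flights and clock skew.

```python
import math
from datetime import datetime, timedelta

MAX_POINTS_PER_DAY = 288  # assumed cap: one point every five minutes
MAX_SPEED_MPS = 70.0      # assumed ceiling, about 250 km/h
EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two coordinates, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def validate_day(points):
    """points: one device's (datetime, lat, lon) records for one day. Returns a list of problems."""
    problems = []
    if len(points) > MAX_POINTS_PER_DAY:
        problems.append(f"too many points: {len(points)}")
    ordered = sorted(points)
    for (t1, lat1, lon1), (t2, lat2, lon2) in zip(ordered, ordered[1:]):
        seconds = (t2 - t1).total_seconds()
        metres = haversine_m(lat1, lon1, lat2, lon2)
        if seconds == 0:
            if metres > 0:
                problems.append(f"two positions at {t2.isoformat()}")
            continue
        if metres / seconds > MAX_SPEED_MPS:
            problems.append(f"implausible jump before {t2.isoformat()}: {metres / seconds:.0f} m/s")
    return problems


start = datetime(2020, 11, 1, 12, 0)
walk = [(start, 56.9500, 24.1000), (start + timedelta(minutes=1), 56.9509, 24.1000)]
teleport = [(start, 56.9500, 24.1000), (start + timedelta(minutes=1), 56.5100, 21.0100)]
print(validate_day(walk))      # a ~100 m walk in one minute passes
print(validate_day(teleport))  # a jump of almost 200 km in one minute is flagged
```

Checks like these only catch clumsy spoofing; a careful attacker can generate slow, plausible fake trails, which is why manual validation still comes up.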

But even if the idea is good, there is a certain complexity and quite a few risks associated with it. If the current Bluetooth-based approach has already been implemented (with varying success), wouldn't tacking on more features just be a case of scope creep? Wouldn't this dramatically increase the manpower needed to pull this project off and do it well? It seems like a difficult problem to handle, one of those "Damned if you do, damned if you don't" situations.

A world gone mad

And yet, it all seems a bit hypocritical in light of the fact that Google (and possibly Apple, though i can't say for sure) already collects this data! For example, if you're logged into your Google account, feel free to visit the location timeline and you might just discover that Google has been storing quite a lot of data about when and where you've been. Here's the day when i went to the stand-up meeting at work and later dropped by the post office:

Looking at this data, i don't get the feeling that Apple or Google have my privacy in mind. I get the feeling that they have it in mind only when they can't guarantee that the data won't be handled improperly and won't lead to bad PR due to breaches and such. Whereas, having spent years architecting the perfect system to gather this data about their users, they feel more than willing to help themselves to as much of it as they can. Even their introduction of limits on how long they'll store it is fairly recent news, and was perhaps prompted by data storage requirements rather than anything else.

But why do i have it on, then? Because i rather enjoy them giving me helpful updates on traffic and public transportation when i want to go to work. Oh, and also because at this point i don't even care - the battle over privacy is one that we lost a long time ago, with most software and platforms these days feeling rotten to the core. A good example of this is cookie prompts that abuse the path of least resistance, ensuring that users will opt in out of sheer annoyance at having to navigate weird nested sub-dialogs for what could have been a simple "No, don't track me" button, with most sites also choosing to ignore the "Do Not Track" browser option:

Of course, this isn't the best place for me to rant about things like the built-in browser app on my phone refusing to work because i won't give it permissions to take pictures and videos, access my location, record audio, access my files and media, as well as dig around my contacts:

But the gist of it is that going full Stallman is pretty unreasonable, and so we're all screwed, because most of us willingly give up what little privacy we have left anyway. Except when we as a society could benefit from it, which is the perfect time for everyone to remember that their privacy matters and to shout as loudly as they can against any form of tracing (not even tracking):

You don't have my consent to use my phone number, e-mail, name or any other personal information concerning the Apturi Covid app, to identify or track me, or to get the contacts of me, my family or my friends and acquaintances. I understand that all of your contacts, as well as the contacts of your contacts will become known to the owners of Apturi Covid immediately after it is installed on your phones. Seeing what's going on in China with the usage of a similar app, i refuse to become a direct or indirect user of the Apturi Covid app. I do not want the government or any other organization to track me for any reason whatsoever. It is against my rights as a human being. I will not use the Apturi Covid app and i will protect my contacts. Nobody has any right to track me without my consent.

That particular display of not knowing what "contact" means in this context was posted on Facebook, of all places. You know, the site that has tracking cookies and a plethora of other invasive mechanisms for keeping track of its users all across the web. Of course, not knowing how the Android permission model works didn't stop this particular person from voicing their opinion anyway, regardless of how far from reality it is.

Another discouraging thing is seeing the results of polls in the evening news about how the populace feels about masks. One such programme, "Bez Tabu", has published quite a few of those.

Here's one with 2414 participants (at the time of writing this) that asks the question: "How do you feel about the requirement to wear masks in public places?"

The answers are as follows:

36% - i'm in support and wear a mask myself
29% - i'm against but still wear a mask myself
35% - i'm against and don't wear a mask

Some time later, after some action by the government, another one with 2135 participants (at the time of writing this) was published, that poses the question: "Do you support the idea of fines for people who don't wear masks?"

The answers are as follows:

39% - i'm in support, we should have been doing this already
11% - i believe that masks are necessary, but there's no need for fines
50% - i'm against wearing masks

Similar sentiments could be noticed in other polls on the site concerning the Apturi Covid app and the restrictions imposed by the government. Does this mean that people value liberty over not spreading a virus that will kill some of them? Or perhaps their views have been shaped by the media that they consume and the chaos created by the lack of clear direction in how this pandemic has been handled so far? Or maybe it's just a few trolls who have nothing better to do than hook together some VPNs and macro tools for the lulz? Frankly, with a self-selected online poll like this, it's hard to say, but seeing percentages like that is definitely not encouraging!

Either way, the sheer incompetence of presidents, governments, individuals and organizations alike over these past months has made me lose some of my faith in society. Frankly, even if we decided to use GPS as an opt-in mechanism with plenty of thought put into keeping the data as safe as possible, people would still find a way to mess it up. After witnessing the president of one of the largest countries on Earth suggest that people should consider bleach and light, i'm not sure what to even think.

At this rate, Idiocracy will be viewed as a prophetic movie.

Update

Seems like i didn't end up with COVID, though as a precaution i moved to my home in the countryside for the time being. If i'm in contact with basically no one, that lessens my risk of exposure, though that doesn't seem viable for many people in this society! Barbers, for example, might go out of business.

Update #2

I've actually decided to use the system proposed in this post as one of the components of my master's degree thesis. Implementing such a solution would not only give me a rather nice example for load testing, but also let me end up with an example of how such a system may or may not work. They were actually talking in the news about how SPKC can no longer track all of the sources of infection, given that interviewing around 1000 people daily is a difficult undertaking, which would serve as a nice example of the utility of such systems:

(the resolution here is pretty limited due to the limited hardware resources i had for testing, but the benefits of something like that should be obvious)