Tuesday, September 11, 2007

DNA databases: sending innocent people to prison

Have you ever come across the birthday party problem? The question is: how many people do you need to bring together to have a better than 50% chance that two of them share a birthday? Clearly if two people match then their birthday parties are likely to conflict and problems may ensue. Every birthday is assumed to be equally likely throughout the (non-leap) year (1:365). So how many people? The answer is that you only need 23 people (probability 0.5073).
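The figure of 23 can be checked directly. This is a minimal sketch, assuming uniformly distributed birthdays and a 365-day year; the function names are mine and purely illustrative.

```python
def birthday_match_probability(people: int, days: int = 365) -> float:
    """Probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for k in range(people):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


def smallest_group_for(target: float, days: int = 365) -> int:
    """Smallest group size whose match probability exceeds `target`."""
    people = 1
    while birthday_match_probability(people, days) <= target:
        people += 1
    return people


print(round(birthday_match_probability(23), 4))  # 0.5073
print(smallest_group_for(0.5))                   # 23
```

Multiplying the "no match so far" probabilities one person at a time avoids the huge factorials that make the textbook formula awkward.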

Now imagine that you have a much longer year, say a billion days. What is the number required for a match? To be honest I do not know. I know the way to calculate it, but the numbers get very big very fast. Yes, they then divide back down to small numbers, but calculating them directly is difficult.
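One way round the very big numbers is to work in logarithms, and there is also a standard approximation that the 50% point sits near the square root of (2 × days × ln 2). The sketch below assumes every "day" is equally likely; the function names are illustrative only.

```python
import math


def match_probability(people: int, days: int) -> float:
    """Chance of at least one shared 'birthday', computed via logs."""
    # log of P(all distinct) = sum of log(1 - k/days); log1p keeps precision.
    log_distinct = sum(math.log1p(-k / days) for k in range(people))
    return -math.expm1(log_distinct)


def approx_people_for_half(days: int) -> int:
    """Approximate group size for a 50% chance: sqrt(2 * days * ln 2)."""
    return math.ceil(math.sqrt(2 * days * math.log(2)))


days = 1_000_000_000
n = approx_people_for_half(days)
print(n, match_probability(n, days))
```

Under these assumptions the answer for a billion-day year comes out at roughly 37,000 people, and the same approximation gives 23 for an ordinary year.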

What I do have, though, is a history of testing something similar. In my role as CTO of Clickstream we were dealing with unique IDs. These were created using a (pseudo) random number between -2bn and +2bn. This gives us a 'year' of 4 billion days. Not only that, but we gave each person another user ID, calculated in quite a different manner, as an independent check. We found that with a population of 100,000 we were getting on the order of 20-40 matches. Now this could have been because the numbers were not truly random, or it could have been the combinatorics, as in the birthday party problem. In DNA database matching, a match of this kind is the equivalent of prosecuting an innocent person.
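For comparison, here is a rough sketch of how many matching pairs pure chance would produce. It assumes the IDs were uniform over 2**32 values, which is my reading of "between -2bn and +2bn" and not something I have verified against the original generator; the names are illustrative.

```python
def expected_matching_pairs(population: int, id_space: int) -> float:
    """Expected number of pairs sharing an ID if IDs are uniform and independent."""
    pairs = population * (population - 1) // 2  # every pair is a chance to collide
    return pairs / id_space


ID_SPACE = 2**32  # assumption: a signed 32-bit range, about 4.3 billion values
print(expected_matching_pairs(100_000, ID_SPACE))
```

Under that assumption the expected count is a little over one matching pair, so seeing 20-40 would lean towards the numbers not being truly random, though combinatorics alone already guarantees that matches are not rare.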

So why am I writing this post and ranting? Because DNA testing is a bit like assigning IDs:

The population is not truly random.
The probability of an individual match is (we are told) about 1:1bn.
We will get false positives (though with DNA it could lead to innocents in prison).

This matters because a false positive may mean a wrongful conviction. With a much smaller population of IDs we were seeing the equivalent of about 20-40 ‘wrongful convictions’. The government is now talking about creating a database that is big enough to put thousands of people in prison for no reason.
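To put rough numbers on the scale effect, the sketch below counts expected coincidental matching pairs inside a database, taking the quoted 1:1bn figure at face value and treating profiles as independent. The database sizes are hypothetical, chosen only to show the trend, and a coincidental pair is not the same thing as a conviction.

```python
def expected_coincidental_pairs(profiles: int, match_probability: float) -> float:
    """Expected number of unrelated profile pairs that match by chance."""
    pairs = profiles * (profiles - 1) / 2
    return pairs * match_probability


P_MATCH = 1e-9  # the quoted 1:1bn figure, taken at face value

for profiles in (100_000, 1_000_000, 5_000_000):  # hypothetical database sizes
    print(profiles, round(expected_coincidental_pairs(profiles, P_MATCH), 1))
```

Because the number of pairs grows with the square of the database size, the expected count goes from about 5 at 100,000 profiles to about 500 at a million and about 12,500 at five million.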

All of this is without taking into account the flaws in the way the system is run. We are assuming people do things right every time. The only time this has been audited (that I have found), the chance of a false match turned out to be 1:100 rather than 1:1bn. This was due to experimental error that we are assured cannot take place in real life, but I am not sure I trust these assurances.
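To see what the gap between 1:1bn and 1:100 means for a single search, here is a minimal sketch. It assumes one crime-scene sample is compared against every profile, that the true source is in the database exactly once, and that false matches are independent; the database size of one million is hypothetical.

```python
def share_of_false_matches(profiles_searched: int,
                           false_match_rate: float,
                           true_sources_in_db: int = 1) -> float:
    """Expected fraction of reported matches that point at the wrong person."""
    expected_false = profiles_searched * false_match_rate
    return expected_false / (expected_false + true_sources_in_db)


DATABASE_SIZE = 1_000_000  # hypothetical, for illustration only

for rate in (1e-9, 1e-2):  # the quoted 1:1bn and the audited 1:100
    print(rate, share_of_false_matches(DATABASE_SIZE, rate))
```

With these assumptions, the 1:1bn rate makes a wrong match about a one-in-a-thousand event per search, while the 1:100 rate produces around 10,000 wrong matches for every right one.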

A probability of 1:100 would mean that the bulk of prosecutions were actually of innocent people. If this doesn’t worry you, it should, as there is no reason that you should not be one of the innocent once the drive to gather DNA every time we fly comes in. Forget about whether this intrudes on privacy, forget about whether it is moral; I wish someone would address the question of whether it works and makes sense.


