April 23, 2006

Connecting the Dots

Remember when the intelligence community of the United States was roundly criticized for not connecting the dots to anticipate 9/11? Read for example:

The criticism was that the pieces of information needed to anticipate and thwart the 9/11 attack were there, but the intelligence community (IC) failed to put them together. At least, that seems to have been the wisdom garnered after months of investigating and analyzing the errors and omissions that preceded 9/11. Now the IC and others responsible for the security of you and me appear to have learnt their lesson. But wait a second: they seem to have overcorrected. Read this incredible story of David Mery, reported first by Reuters (but currently not available at the site), then by the Guardian, and picked up and editorialized by Boing Boing. I quote from the article:
They handcuff me (David Mery), hands behind my back, and take my rucksack out of my sight. They explain that this is for my safety, and that they are acting under the authority of the Terrorism Act. I am told that I am being stopped and searched because:
  • they found my behaviour suspicious from direct observation and then from watching me on the CCTV system;
  • I went into the station without looking at the police officers at the entrance or by the gates;
  • two other men entered the station at about the same time as me;
  • I am wearing a jacket "too warm for the season";
  • I am carrying a bulky rucksack, and kept my rucksack with me at all times;
  • I looked at people coming on the platform;
  • I played with my phone and then took a paper from inside my jacket.
...The police decided that wearing a rain jacket, carrying a rucksack with a laptop inside, looking down at the steps while going into a tube station and checking your phone for messages just ticked too many boxes on their checklist and makes you a terrorist suspect...

The London Metropolitan Police just connected the dots (or, in this case, ticked the boxes on their checklist), arrested an innocent computer enthusiast, and put him (and his girlfriend) through hell!

You see, connecting the dots, simple as it may sound, is not quite that simple. First, there is the problem of deciding which dots are relevant among the infinite collection of dots one can observe. Is the color of his hair relevant to classifying Mr. Mery as a terrorist? Should the checklist include the way he smiles or the gait of his walk? In statistical pattern recognition, this step is known as attribute selection. Attribute selection is more a black art than a science, a place where notoriously unpredictable human behavior meets mathematics. Why not select every conceivable attribute? Because it is expensive, both monetarily and statistically. Collecting and processing the dots cost time and money and irrelevant dots cause statistical nightmares.
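One concrete form of that statistical nightmare: with few cases and many attributes, some attribute will look strongly related to the label purely by chance. The Python sketch below is a hypothetical illustration (the sample sizes, seed, and the `pearson` helper are assumptions, not anything from the Mery case). It generates 1,000 attributes of pure noise for 20 people and reports the strongest correlation any of them has with a randomly assigned "suspect" label.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(2006)          # fixed seed so the run is repeatable
n_people, n_attributes = 20, 1000  # hypothetical: few cases, many attributes

# Labels: 10 "suspects" and 10 "non-suspects", assigned at random.
labels = [1] * 10 + [0] * 10
rng.shuffle(labels)

# Every attribute is pure noise: by construction, none is related to the label.
attributes = [[rng.gauss(0, 1) for _ in range(n_people)]
              for _ in range(n_attributes)]

correlations = [abs(pearson(a, labels)) for a in attributes]
best = max(correlations)
print(f"strongest |correlation| among {n_attributes} irrelevant attributes: {best:.2f}")
```

With 20 cases, a single noise attribute exceeds |r| = 0.5 only a few percent of the time, but across 1,000 of them the strongest one almost always does, so a naive analyst would "discover" a convincing but meaningless dot.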

Once we have identified and collected the dots, the next step is to connect them. Early in my statistics courses, I learnt that there are infinitely many lines (technically, hyperplanes) consistent with the same set of dots. You really don't need fancy statistics to see this; simple geometry will do. Choosing the correct line from among them is akin to finding a needle in a haystack, probably worse. The exercise is error-prone, and the best we can do is to minimize the error.
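To see why picking "the" line is hard, here is a minimal Python sketch on made-up 2-D data (the points, the slopes, and the helper names `make_line`, `predict`, and `training_errors` are all hypothetical). Three different lines each separate the two groups with zero errors, yet they disagree about a new point they have never seen.

```python
# Two small, linearly separable groups of 2-D points (made-up data).
group_a = [(1, 1), (2, 1), (1, 2)]   # label 0
group_b = [(4, 4), (5, 4), (4, 5)]   # label 1

def make_line(k):
    """Return (w, b) for the line x + k*y = b, with b halfway between the groups."""
    w = (1.0, k)
    score = lambda p: w[0] * p[0] + w[1] * p[1]
    b = (max(score(p) for p in group_a) + min(score(p) for p in group_b)) / 2
    return w, b

def predict(line, p):
    """Classify point p as 1 if it lies on group B's side of the line, else 0."""
    w, b = line
    return 1 if w[0] * p[0] + w[1] * p[1] > b else 0

def training_errors(line):
    """Count misclassified training points."""
    errors = sum(predict(line, p) != 0 for p in group_a)
    errors += sum(predict(line, p) != 1 for p in group_b)
    return errors

lines = {k: make_line(k) for k in (0.5, 1.0, 2.0)}
new_point = (0, 5)
for k, line in lines.items():
    print(f"k={k}: training errors={training_errors(line)}, "
          f"new point classified as {predict(line, new_point)}")
```

All three lines fit the training dots perfectly, so the data alone cannot tell us which one is "correct"; the choice only matters, and errors only show up, on cases we have not yet seen.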

Finally, we need to cross the t's and draw conclusions. Because the entire exercise is error-prone, we should be careful in interpreting the findings and in acting on them. Discretion must necessarily play a big part here, and one should anticipate the possibility of egregious misinterpretation and misguided action, with or without malicious intent. Discretion should be given only to those with the experience, wisdom, and humility to avoid even inadvertent transgressions against the fundamental liberties of individuals, on which so much of a civilized society rests.
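One reason interpretation needs so much care is the base-rate effect: when real threats are extremely rare, even a very accurate checklist flags mostly innocent people. The arithmetic below is a hypothetical illustration of Bayes' rule; the population size, number of threats, and error rates are invented for the example and are not estimates for the London Underground.

```python
# Hypothetical figures, chosen only for illustration.
population = 1_000_000       # people passing through stations
actual_threats = 10          # people who really are threats
sensitivity = 0.99           # P(flagged | threat)
false_positive_rate = 0.01   # P(flagged | innocent)

innocents = population - actual_threats
true_alarms = actual_threats * sensitivity        # threats correctly flagged
false_alarms = innocents * false_positive_rate    # innocents wrongly flagged

# Bayes' rule: P(threat | flagged) = true alarms / all alarms.
precision = true_alarms / (true_alarms + false_alarms)

print(f"true alarms:  {true_alarms:.1f}")
print(f"false alarms: {false_alarms:.1f}")
print(f"chance a flagged person is a real threat: {precision:.2%}")
```

Under these assumptions, a checklist that is right 99% of the time still produces roughly a thousand false alarms for every genuine one, so a flagged person is, with overwhelming probability, someone like Mr. Mery.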

If you have connected the dots in this post by now, you can see where it is headed. Whatever the politicians might say, counterterrorism is complex and a job for experts trained in psychology, statistics, and pattern recognition. The London Metropolitan Police are trained to keep law and order. To blame them for the David Mery fiasco is...well, almost like blaming the messenger. They were just following the instructions given to them. Someone gave them the wrong instructions.

  1. Noticed your link. (I am not familiar with attribute selection.)

    > Collecting and processing the dots cost time and money and irrelevant dots cause statistical nightmares.

    Not just identifying the relevant dots but anticipating which dots will be relevant is a source of what may become a statistical nightmare. An old and fascinating article covers this powerfully: "Computer System Reliability and Nuclear War" by Alan Borning; a copy can be found here.

    As to your conclusion, re 'almost like blaming the messenger', it is a shared responsibility. The decision to arrest me was taken by a police officer (in the context of the laws voted by politicians). 'They were just following the instructions given to them' is an all too common answer, and it is not an excuse. See some posts, supposedly by police officers, on the degree of freedom, independence, and responsibility they have in the creation and application of the law here and here.

  2. Mr. Mery, I was pleasantly surprised by your comment. I didn't expect such a quick response from the subject of this story himself. Thank you.

    I intended my article to be a general alert to the pitfalls in mindless application of results from statistical analysis. I just went by the news story which seemed to suggest that the London Metropolitan Police were acting by the "book" in arresting you. I am not aware of the discretionary powers vested in them. You were the victim and you obviously know better. I stand corrected.

    "Attribute selection" is also commonly known as "feature selection." A typical example is the problem of face recognition: the features could be the eyes, nose, mouth, etc., or more abstract quantities such as color histograms and spectral characteristics.

    There are methods to whittle down the attributes to the relevant set pretty easily. The hard part is coming up with the original set and, as mentioned, collecting all of the data.

    Regardless of whether the police were following someone else's rules or their own "gut instinct", no system is perfect at pattern matching in a noisy environment. Given that human behavior is notoriously unpredictable, this is probably the noisiest environment of them all!
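Comment 3 mentions methods for whittling the attributes down to a relevant set. One of the simplest is a univariate filter: score each attribute by the strength of its correlation with the label and keep the top k. The Python sketch below is a hypothetical illustration on synthetic data (the attribute names, sample size, seed, and the helpers `pearson` and `select_top_k` are assumptions), not necessarily the method the commenter had in mind.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def select_top_k(features, labels, k):
    """Univariate filter: rank attributes by |correlation| with the label, keep k."""
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], labels)),
                    reverse=True)
    return ranked[:k]

rng = random.Random(42)            # fixed seed so the run is repeatable
labels = [1] * 100 + [0] * 100     # 200 hypothetical cases, balanced labels
rng.shuffle(labels)

# One genuinely informative attribute (label plus noise) and 50 pure-noise ones.
features = {"informative": [y + rng.gauss(0, 0.3) for y in labels]}
for i in range(50):
    features[f"noise_{i}"] = [rng.gauss(0, 1) for _ in labels]

selected = select_top_k(features, labels, k=3)
print("selected attributes:", selected)
```

With 200 cases the informative attribute reliably comes out on top. With only a handful of cases and thousands of candidates, the same filter can just as easily promote noise, which is why, as the commenter says, the hard part remains choosing and collecting the original set.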

