
Issues with Crowdsourced Data Part 2

A recent guest Beneblog explained why we believe that a correlation researchers found between SMS text messages and building damage was not useful. Some of the questions we received made us realize we need to be clearer about why this is important. Why did we bother analyzing this claim? Why does it matter? Thanks to Patrick Ball, Jeff Klingner and Kristian Lum for contributing this material (and making it much clearer).

We’re reacting to the following claim: “Data collected using unbounded crowdsourcing (non-representative sampling) largely in the form of SMS from the disaster affected population in Port-au-Prince can predict, with surprisingly high accuracy and statistical significance, the location and extent of structural damage post-earthquake.”

While this claim is technically correct, it misses the point. If decision makers had simply used a map of building locations, they could have made better decisions more quickly, more accurately, and with less complication than if they had tried to use crowdsourcing. Our concern is that if in the future decision makers depend on crowdsourcing, bad decisions are likely to result -- decisions that impact lives. So, we’re speaking up.

In the comments to our last post, Jon from Ushahidi said, "If a tool's fitness cannot be absolute, then neither can its fallibility." He also argued that the correlation they found was useful. Why is this something worth arguing about?

Misunderstanding relationships in data is a problem because it can lead to choosing less effective, more expensive data instead of choosing obvious, more accurate starting points. The correlation found in Haiti is an example of confounding: a third factor drives both of the quantities being compared. A correlation was found between building damage and SMS streams, but only because both were correlated with the simple existence of buildings. Thus the correlation between the SMS feed and the building damage is an artifact, or spurious correlation. Here are two other examples of confounding effects.

- Children's reading skill is strongly correlated with their shoe size -- because older kids have bigger feet and tend to read better. You wouldn't measure all the shoes in a classroom to evaluate the kids' reading ability.

- Locations with high rates of drowning deaths are correlated with locations with high rates of ice cream sales because people tend to eat ice cream and swim when they're at leisure in hot places with water, like swimming pools and seasides. If we care about preventing drowning deaths, we don't set up a system to monitor ice cream vendors.
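The same mechanism can be reproduced with made-up numbers. The sketch below is not the Haiti data: it assumes a hypothetical grid of cells in which both SMS volume and damage depend only on how many buildings a cell contains, plus independent noise. All coefficients and names (`simulate`, `partial_corr`) are illustrative assumptions.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def partial_corr(xs, ys, zs):
    """Correlation of xs and ys after removing the linear effect of zs."""
    rxy = pearson(xs, ys)
    rxz = pearson(xs, zs)
    ryz = pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n_cells=2000, seed=0):
    """Synthetic grid cells (assumed model, not real data).

    SMS volume and damage each depend on building density only;
    neither depends on the other.
    """
    rng = random.Random(seed)
    buildings = [rng.uniform(0, 100) for _ in range(n_cells)]
    sms = [0.5 * b + rng.gauss(0, 5) for b in buildings]
    damage = [0.3 * b + rng.gauss(0, 5) for b in buildings]
    return buildings, sms, damage


buildings, sms, damage = simulate()
raw = pearson(sms, damage)                         # looks impressive
controlled = partial_corr(sms, damage, buildings)  # what SMS adds beyond the map
building_only = pearson(buildings, damage)         # the map by itself

print("SMS vs damage:", round(raw, 2))
print("SMS vs damage, controlling for buildings:", round(controlled, 2))
print("buildings vs damage:", round(building_only, 2))
```

Under these assumptions the raw SMS-damage correlation is strong, yet it falls to roughly zero once building density is held fixed, and the building map alone tracks damage better than the SMS stream does.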

We're particularly concerned because we think that using an SMS stream to measure a pattern is probably at its best in a disaster situation. When there's a catastrophe, people often pull together and help each other. If an SMS stream was ever going to work as a pattern measure, it was going to be in a context like this -- and it didn't work very well. We don't think that SMS was a very good measure of building damage, relative to the obvious alternative of using a map of building locations.

The problems will be much worse if SMS streams are used to try to measure public violence. In these contexts, the perpetrators will be actively trying to suppress reporting, and so the SMS streams will not just measure where the cell phones are; they'll measure where the cell phones are that perpetrators can't silence. We'll have many more "false negative" zones where there seems to be no violence, when in fact there is simply no SMS traffic. And we'll have dense, highly duplicated reports of visible events where there are many observers and little attempt to suppress texting.

In the measurement of building damage in Port-au-Prince, there were several zones where there was lots of damage but few or no SMS messages ("false negatives"). This occurred when no one was trying to stop people from texting. The data will be far more misleading when the phenomenon being measured is violence.
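A toy simulation makes the suppression effect concrete. This is a sketch under invented assumptions, not a model of any real conflict: every cell has the same number of violent events, each event is reported by SMS with some probability, and in a share of cells perpetrators push that probability close to zero. The function name and every parameter value are hypothetical.

```python
import random


def false_negative_cells(n_cells=1000, events_per_cell=3, report_prob=0.5,
                         suppressed_share=0.0, suppressed_report_prob=0.02,
                         seed=1):
    """Count cells where violence occurred but no SMS report got out.

    Every cell has events, so any cell with zero reports is a false negative.
    """
    rng = random.Random(seed)
    missed = 0
    for _ in range(n_cells):
        # In suppressed cells, reporting is much less likely (assumed value).
        suppressed = rng.random() < suppressed_share
        p = suppressed_report_prob if suppressed else report_prob
        reports = sum(1 for _ in range(events_per_cell) if rng.random() < p)
        if reports == 0:
            missed += 1
    return missed


no_suppression = false_negative_cells(suppressed_share=0.0)
with_suppression = false_negative_cells(suppressed_share=0.4)

print("silent cells without suppression:", no_suppression)
print("silent cells with suppression:", with_suppression)
```

Even without suppression some cells stay silent by chance, as happened in Port-au-Prince. With suppression the number of silent cells grows sharply, and a map built from the reports alone would show those cells as peaceful.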

As we've said in each post, crowdsourcing generally and SMS traffic in particular are great for documenting specific requests for help. Our critique is that they are not a good way to generate a valid basis for understanding patterns.
