Twitter Data Can Tell You When You'll Get Sick Up To 8 Days In Advance
Adam Sadilek and his team at the University of Rochester analyzed 4.4 million GPS-tagged tweets from over 600,000 users in New York City over the course of one month in 2010. They trained an artificial intelligence algorithm to ignore tweets from healthy people, such as those claiming they were ‘sick’ of a particular song, and to find those who were really ill. With it, they can predict when a person will fall ill with nearly 90% accuracy, almost 8 days in advance.
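To make the filtering step concrete, here is a minimal sketch of how a text classifier can separate a figurative "sick" from a real one. It is not the team's algorithm: it is a small naive Bayes classifier written from scratch, and the training tweets, labels, and function names are all made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, hand-written training tweets; NOT data from the study.
TRAINING_TWEETS = [
    ("home with the flu, fever and a headache", "ill"),
    ("runny nose and sore throat all day", "ill"),
    ("coughing all night, i feel awful", "ill"),
    ("so sick of this song on the radio", "healthy"),
    ("that new album is sick", "healthy"),
    ("sick of waiting for the train", "healthy"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(examples):
    """Count words per label for a multinomial naive Bayes model."""
    word_counts = {}
    label_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocabulary.add(token)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocabulary = model
    total_examples = sum(label_counts.values())
    # Words never seen in training carry no evidence, so skip them.
    tokens = [t for t in tokenize(text) if t in vocabulary]
    scores = {}
    for label, n_examples in label_counts.items():
        score = math.log(n_examples / total_examples)
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING_TWEETS)
print(classify("i am so sick of this song", model))
print(classify("i have a fever and a headache", model))
```

With this toy training set, the first tweet comes out as healthy and the second as ill, because the words around "sick" carry the signal. A real system would need far more labelled data than six tweets.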
According to Sadilek’s blog post:
Given that three of your friends have flu-like symptoms, and that you have recently met eight people, possibly strangers, who complained about having runny noses and headaches, what is the probability that you will soon become ill as well? Our models enable you to see the spread of infectious diseases, such as flu, throughout a real-life population observed through online social media.
We apply machine learning and natural language understanding techniques to determine the health state of Twitter users at any given time. Since a large fraction of tweets is geo-tagged, we can plot them on a map, and observe how sick and healthy people interact. Our model then predicts if and when an individual will fall ill with high accuracy, thereby improving our understanding of the emergence of global epidemics from people’s day-to-day interactions.
You can explore the health of New Yorkers with our web application at corpora.io.
Above you see a heatmap visualization of the prevalence of flu in New York City, as observed through public Twitter data. The more red an area is, the more people are afflicted by flu at that location. We show emergent aggregate patterns in real-time, with second-by-second resolution. By contrast, previous state-of-the-art methods (including Google Flu Trends and government data) entail time lags from days to years. You can play with our heatmap here.
The fine-grained epidemiological models we show here are just one instance of the general class of problems that our system solves. Other domains include understanding of the public sentiment around your company or products, the diffusion of information throughout a population, and predicting customer behavior.
By augmenting existing datasets with real-time insights and cues from social media, we are able to connect the dots, visualize patterns, and refine models based on user feedback.
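The question that opens the quoted post (three sick friends, eight recent encounters with symptomatic people) can be illustrated with a back-of-the-envelope calculation. The sketch below is not the model Sadilek's team describes. It assumes every exposure is an independent chance of transmission, and the per-contact probabilities `p_friend` and `p_encounter` are invented placeholder values, not figures from the study.

```python
def infection_probability(sick_friends, sick_encounters,
                          p_friend=0.10, p_encounter=0.02):
    """Chance of at least one transmission, assuming independent exposures.

    p_friend and p_encounter are hypothetical per-contact transmission
    probabilities chosen for illustration only.
    """
    if sick_friends < 0 or sick_encounters < 0:
        raise ValueError("contact counts must be non-negative")
    # Probability of escaping infection from every single contact.
    p_escape_all = ((1.0 - p_friend) ** sick_friends
                    * (1.0 - p_encounter) ** sick_encounters)
    return 1.0 - p_escape_all


# The scenario from the quote: three sick friends, eight sick encounters.
risk = infection_probability(sick_friends=3, sick_encounters=8)
print(f"Estimated risk: {risk:.1%}")
```

The point is only that risk compounds with each additional sick contact. The actual numbers depend entirely on the assumed probabilities.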
Too Late To Do Anything?
Apparently this algorithm is accurate nearly 90% of the time, up to 8 days in advance. The question is: does this data help you avoid the sickness, or does it merely tell you of impending doom? At any rate, it’s clever stuff!