Can significantly improve estimates of the current number of cases of flu
Researchers have found that Google search data really can provide a more accurate real-time picture of current flu infections. Critics argue that claims about the power of big data to help measure and predict human behaviour are far too optimistic, and they often point to the case of Google Flu Trends.
Official reports of influenza infection rates are produced with a delay of at least one week. Yet researchers from Google and the Centers for Disease Control and Prevention (CDC) reported that data on searches for influenza-related terms could be used to provide a real-time estimate of the number of people with flu infections, with almost no delay.
Many have since dismissed the value of this data, however, after a spike in flu-related searches caused by a news scare led Google's estimates for January 2013 to be significantly higher than the measurements subsequently reported by the CDC.
But a study published in Royal Society Open Science demonstrates that Google search data can indeed be used to significantly improve estimates of the current number of cases of flu, reducing the errors seen in a model using CDC data alone by up to 52.7 per cent.
The researchers, Tobias Preis and Suzy Moat, of Warwick Business School, argue that Google Flu Trends provides a classic example of how big data models must adapt across time, to reflect changes in people’s behaviour.
Dr Preis, Associate Professor of Behavioural Science and Finance, said: “Our results show that dismissing this data is rather like throwing the baby out with the bathwater. It’s true that simply using the number of searches as an estimate of flu levels can result in misleading figures. However, simple models can be built to watch out for increases in searches that do not correspond to increases in reports of flu, and which use this information to improve upcoming estimates.”
Preis and Moat’s analysis shows that the famous January 2013 discrepancy could have been avoided with an “adaptive nowcasting” model, which monitors the relationship between Google search data and recent CDC measurements and integrates this information into its estimates of current flu levels.
Dr Moat, Assistant Professor of Behavioural Science, said: “Official reports of flu levels can be delayed by at least a week, as the process of collecting data from doctors on the number of patients they have seen can be rather time consuming. With the official data alone, it is possible to create forecasts of the number of people who have the flu right now, by looking for patterns in the historical data.
“Just like forecasting the weather, however, sometimes these ‘nowcasts’ are wrong. Our analysis shows that by using data on Google searches for flu-related symptoms, as well as the historic flu data, the error in these ‘nowcasts’ can be reduced by between 14.4 per cent and 52.7 per cent.”
Dr Preis added: “Predicting the future in the past is of course much easier than truly predicting the future. To guard against us using information in our simulated ‘nowcasts’ which wouldn’t have been available at the time, we train our model using data from the first 16 weeks. We then test the predictions on the 17th week, and retrain our model using data from the second to the 17th week. This model is used to make a prediction for the 18th week, and so on.”
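The rolling train-and-test procedure Dr Preis describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model used in the study: it swaps in a plain least-squares regression, and the names `rolling_nowcasts`, `mean_absolute_error`, `flu` and `searches` are hypothetical. Each estimate for week `t` is fitted only on the `window` weeks before it, so nothing from the future leaks into the simulated nowcast. Because the previous week's flu level is used as a predictor, the sketch consumes one extra week before its first estimate.

```python
import numpy as np


def rolling_nowcasts(flu, searches=None, window=16):
    """Walk-forward nowcasts (illustrative sketch, not the study's model).

    flu      -- weekly flu levels; the value for week t is assumed to become
                known only after week t has passed
    searches -- optional weekly search volumes, assumed available immediately
    window   -- number of recent weeks used to refit the model each time
    """
    flu = np.asarray(flu, dtype=float)
    if searches is not None:
        searches = np.asarray(searches, dtype=float)

    weeks, estimates = [], []
    for t in range(window + 1, len(flu)):
        # Training weeks t-window .. t-1: all strictly before week t.
        train = np.arange(t - window, t)
        # Predictors: intercept and the previous week's flu level.
        columns = [np.ones(window), flu[train - 1]]
        now = [1.0, flu[t - 1]]
        if searches is not None:
            # Add same-week search volume as an extra predictor.
            columns.append(searches[train])
            now.append(searches[t])
        # Refit by ordinary least squares on the sliding window.
        coef, *_ = np.linalg.lstsq(np.column_stack(columns), flu[train], rcond=None)
        weeks.append(t)
        estimates.append(float(np.dot(coef, now)))
    return np.array(weeks, dtype=int), np.array(estimates)


def mean_absolute_error(flu, weeks, estimates):
    """Average absolute gap between the nowcasts and the later-reported values."""
    flu = np.asarray(flu, dtype=float)
    return float(np.mean(np.abs(flu[weeks] - estimates)))
```

Calling `rolling_nowcasts(flu)` and `rolling_nowcasts(flu, searches)` on the same series and comparing the two `mean_absolute_error` values gives the kind of with-and-without-search-data comparison the researchers describe. Changing `window` to 4 or 32 repeats the comparison for other training lengths.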
Dr Moat said: “We find that regardless of whether we use 16 weeks, four weeks or 32 weeks of recent data to train our model, the estimates we can make using the Google data are significantly better than those made using the historic CDC measurements alone.”
Dr Preis concluded: “Our results show that public health professionals can indeed use data on the number of Google searches for flu-related symptoms to improve their estimates of how many people have the flu right now, as long as their analysis takes simple precautions to allow for the fact that human behaviour can change across time.”