These days, when people start feeling a fever and a sore throat coming on, their first move often isn’t to the medicine cabinet. Instead, it’s to a computer or smartphone to Google their symptoms.
These queries, which make up only a tiny fraction of the more than 7 billion total queries the search engine handles each day, are all stored by Google. The company uses this data for a variety of purposes: it can help Google improve its search results for users—which also boosts the company’s bottom line—and it can benefit the population as a whole in other ways.
One example of the latter is Google Flu Trends (GFT), a statistical model developed by engineers at Google.org—the company’s philanthropic arm—in an effort to “now-cast” what’s happening with the flu on any given day.
But research has shown that GFT often misses its target. These results led Northeastern University network scientists and their colleagues to take a closer look at how Big Data should be used to advance scientific research. Their report was published online Thursday in the journal Science.
“Big Data have enormous scientific possibilities,” said Northeastern professor David Lazer. “But we have to be aware that most Big Data aren’t designed for scientific purposes.” Fully achieving Big Data’s enthusiastically lauded potential, he added, requires a synthesis of both computer science approaches to data as well as traditional approaches from the social sciences.
The paper was co-authored by Lazer, who holds joint appointments in the Department of Political Science and the College of Computer and Information Science; Alessandro Vespignani, the Sternberg Family Distinguished University Professor of Physics at Northeastern who has joint appointments in the College of Science, Bouvé College of Health Sciences, and the College of Computer and Information Science; Northeastern visiting research professor of political science Ryan Kennedy; and Gary King, a professor in the Harvard University Department of Government.