How Google Is Revolutionising Research

Google’s omnipresence has led to a wealth of information on what people think, want and desire. Such data is being used by researchers to understand human behaviours and psyche.

Credit: Digitalpfade/pixabay

Aneree Parekh is a research assistant at the department of psychology, Monk Prayogshala, Mumbai.

The Internet started as a haven, a place where people could connect across nations and get answers at the click of a button. In 2017, the Internet has become life itself, and Google the preferred navigator for most people in the world. Nearly two decades ago, Google was created to organise the world’s data, and its success is well documented, not least by the fact that “google” has now become a common verb in many languages. Looking for a place to eat nearby? Google it. What’s the weather going to be like today? Google it. Show timings for a new movie? Google it. Is that weird mole on your back cancerous? Oh my god, google it!

The ubiquitous use of Google to find answers to questions big and small has provided unique and unprecedented knowledge about human behaviour patterns and psyche. Like a trail of breadcrumbs, the trails of internet searches we leave behind reveal our deepest fears, desires and secrets, and researchers are beginning to follow them.

Most people in the developed world, and an increasing number of people in the developing world, turn to Google for information on consumer products like cars and mobile phones, health, politics, entertainment and even love. This creates a wealth of information about what people want. The availability of such large datasets is almost unheard of in research circles, and Google provides an array of tools for researchers to analyse and make sense of thousands and thousands of search queries.

The most widely used application for Google searches is in market research. The search tool Google Trends is often used to understand brand health and monitor changes in consumer interest across metrics such as seasonality and competition. Derived from search queries, Trends is a numeric/historic representation of the relative volume of searches made on Google. This data can be mined for actionable insights in a way that is not possible with consumer surveys.
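To make “relative volume” concrete, here is a rough sketch in Python. It assumes that an index of this kind divides the searches for a term in each period by all searches made in that period, and then rescales the series so that its peak equals 100. The function name and the numbers are hypothetical and for illustration only; this is not Google’s code.

```python
def relative_interest(term_counts, total_counts):
    """Turn raw search counts into a 0-100 relative-interest index.

    term_counts  -- searches for the term in each period (hypothetical data)
    total_counts -- all searches made in the same periods (hypothetical data)
    """
    if len(term_counts) != len(total_counts):
        raise ValueError("both series must cover the same periods")

    # Share of all searches that went to the term in each period.
    shares = [t / n if n else 0.0 for t, n in zip(term_counts, total_counts)]

    # Rescale so that the period with the highest share scores 100.
    peak = max(shares, default=0.0)
    if peak == 0:
        return [0 for _ in shares]
    return [round(100 * s / peak) for s in shares]


# Made-up counts for three consecutive months.
term = [120, 300, 150]
total = [10000, 12000, 15000]
print(relative_interest(term, total))
```

Because the index is relative, a term can score lower in a month in which it was searched more often in absolute terms, as long as overall search activity grew faster.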

The data allows you to plan and prioritise awareness-based media campaigns for your product and to understand its global reach and interest, with consumer interest data going all the way back to 2004. For example, when comparing Patanjali’s and Maggi’s noodles, it is clear that consumer internet search interest in the former is lacking and is largely restricted to India, as opposed to the latter. So Patanjali might want to think about its global outreach. While flexibility and state-wise local consumer insight are limited in India, it is easy to see why the tool is a goldmine for market research.

Red line indicates Google searches for Maggi over time. The blue line indicates Google searches for Patanjali noodles over time. Source: Author provided

Comparison of Google searches for Maggi and Patanjali noodles across regions. Source: Author provided

While the use of Google search queries for market research purposes is well explored, a new field in which it is gaining popularity is the social sciences. Big Data application for social sciences research is an emerging trend and the potential of Google data to help understand the human psyche has barely been scratched. Most research done in the social sciences relies on survey data or self-reported behaviour, both prone to social desirability biases. In order to look good or answer in a socially acceptable way, people exaggerate, leave out aspects of or just lie about their behaviours. For sensitive topics such as racial animus or sexual orientation, there is a substantial amount of misreporting even when the surveys are conducted online and anonymously.

The power of Google data is that people ask this white box things that they would not reveal to anyone or anywhere, indicating genuine interest. Anonymous Google search queries provide a rare glimpse into the behaviours, motivations, fears and desires of people – honest and unfiltered. This has led data scientist Seth Stephens-Davidowitz to label internet search data the “digital truth serum.” Using Google tools like Trends, AdWords and Correlate, Stephens-Davidowitz revealed in his book darker truths about human behaviours. For instance, America has a higher number of closeted gay men than traditional survey data would find. This was found by looking at same-sex pornography searches by men (nearly 5%) and at predictive searches, wherein the word ‘gay’ is 10% more likely to complete searches that begin with “is my husband…” than the second-place word, “cheating.”

For another: in India, search data revealed that a high number of porn-related searches were on how to breastfeed husbands – a behaviour not revealed in any survey on sexual health. Mining the data also reveals widespread racial animus against African-Americans in the US and increased Islamophobia following terrorist events, especially after pleas for tolerance.

The potential of Google data is also being understood in the medical sciences. New fields such as information epidemiology, or infodemiology, are proposed to understand the determinants and distribution of health information, which is said to be helpful for health professionals and patients seeking higher quality healthcare on the Internet. Using Trends, researchers have detected seasonal influenza outbreaks in regions in the US with a lag of only a day. Similarly, in a study led by Google investigators, anonymised and aggregated search volumes for terms related to “dengue” were found to fit well with the actual number of cases of dengue reported in Bolivia, Brazil, India, Indonesia and Singapore.

The availability of such real-time search queries means effective and immediate delivery of health services and information to places and individuals who require it. Seasonal trends in mental illnesses have also been revealed using Google search data. Search terms implied that people are 24% less likely to consider suicide in the summer, and queries about mental health dropped by 14% in the US and Australia from winter to summer. Such seasonal fluctuations are useful in studying and understanding the epidemiology of illnesses that are otherwise difficult to track.

Aside from availability, the easy navigation and visualisation of large sets of data is also what attracts researchers to Google data. The company’s Public Data Explorer makes available large, public-interest datasets from international organisations such as the World Bank and the OECD, and from governments such as those of the US, the UK and Iceland. In order to prioritise which datasets to include and which to exclude, anonymous search logs were analysed to find patterns in the kinds of searches people were doing. The tool allows even novices to navigate complex datasets at the click of a button, compare data across countries or variables, and use animated charts and graphs to see trends over time.

Data from the UN Development Programme’s Human Development Report (2015) shows that the number of teenage mothers (aged 15-19 years) has reduced dramatically in India since 1990 and – after a peak in 1995 – has been reducing in the US as well. The availability and easy visualisation of this kind of information enables policy makers and interventionists to design effective programmes to tackle public health issues.

Births to teenage mothers (aged 15-19 years) in India and the US, 1980-2014. Source: Author provided

While Google data and tools provide an exciting new avenue for social sciences research, there are still strong limitations to using search query data. Indeed, one of the biggest disadvantages is the limited generalisability of the results thus derived. The samples are not randomly selected as they are in traditional research, making the results relevant only to Google-using netizens. Similarly, datasets and complex analytical tools are as yet available mostly for American and European samples. Additionally, the interpretation of results from these analyses is critically dependent on whether or not the search term parameters used are appropriate for the posed research questions. For example, one would not be able to definitively understand the extent to which sexism played a role in Hillary Clinton’s 2016 defeat because derogatory terms used to describe women are also key search terms for pornography.

The extent to which secondary data such as search queries can supplement – or even replace – burdensome, traditional data collection methods is as yet unknown. The potential for mining localised data from developing countries is limited, and the research methodology and the ethical and policy implications are still being debated. However, initial indications suggest that search data may revolutionise research and our understanding of human beings.