Sociological Approaches to Big Data
Sociological Approaches to Big Data examine how large-scale data collection and analysis intersect with social phenomena, emphasizing the impact of culture, society, and human behavior on data interpretation. These approaches focus on understanding the social contexts in which data is generated and used, highlighting the importance of considering issues such as privacy, ethics, and power dynamics. Researchers in this field analyze how big data can reveal insights into social patterns, but they also critically assess the limitations and biases inherent in data collection methods.
By utilizing qualitative and quantitative research methods, sociologists aim to bridge the gap between data-driven insights and the complexities of social reality. This perspective encourages a multidisciplinary dialogue, integrating concepts from sociology, computer science, and information technology. As big data becomes increasingly influential in decision-making across various sectors, sociologists advocate for a more nuanced understanding of how societal factors shape data practices and outcomes. Recognizing diverse viewpoints within society is essential for fostering responsible and equitable use of big data.
Last reviewed: February 2017
Abstract
How the discipline of sociology engages with the opportunities presented by Big Data has been the subject of much speculation in recent years, particularly in response to the emergence of social networks on the Internet. Critics have suggested that the benefits of Big Data have not yet been realized, primarily because researchers have continued to use methods that fail to exploit Big Data’s real strengths, including the ability to gather data under real-world conditions instead of within the artificial context of a survey instrument or an interview setting.
Overview
Big Data refers to both a process and a product, each made possible by the Internet and high-powered computers. As the modern world moves at an ever-increasing pace toward a state in which every interaction a person has generates data (from the purchase of a tube of toothpaste, to the tracking of people’s locations via pings between cell towers and their phones, to the contents of e-mails and text messages), it becomes possible to make new connections between seemingly disparate types of information. So long as one has the computing resources needed to store and manipulate massive data sets and to connect them in creative ways, chances are that the data one is interested in are available somehow, somewhere.
Sociology, the study of human society, has seen dramatic changes since the advent of the Internet. Before the Internet emerged as a new social force and platform, sociologists studied groups of people and the societies they formed through their interactions, whether consciously or unconsciously (Shah, Cappella & Neuman, 2015). During this period, geography had a profound influence on society: people who formed a definable group tended to live near one another, because those at a distance had no practical means of interacting. Technology, particularly communications technology, was the driving force that changed this paradigm, because it made it possible for people to form groups even when their members were geographically dispersed. In a sense, this means that networks have become emergent rather than defined by geography, and instead of society creating networks, in many cases networks are creating societies.
For example, in the aftermath of an earthquake at the North Pole, survivors might use a social networking platform such as Twitter to let their loved ones know that they are safe and to coordinate relief efforts, tagging all of their messages with the hashtag #NPquake. Users of this hashtag might be located anywhere in the world, but they could still interact as a group via the Internet. More relevant for sociologists, their interactions could be studied in real time or at some point in the future by those well-versed in Big Data (O’Brien, Sampson & Winship, 2015).
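The grouping described in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the messages, user names, and locations below are invented, and the sketch reads from an in-memory list rather than from any real platform.

```python
import re
from collections import defaultdict

# Hypothetical messages; users, locations, and text are invented for illustration.
MESSAGES = [
    {"user": "ana", "location": "Oslo", "text": "Safe and accounted for #NPquake"},
    {"user": "ben", "location": "Toronto", "text": "Collecting blankets #NPquake #relief"},
    {"user": "chi", "location": "Lagos", "text": "Great match tonight #football"},
]

# A hashtag is treated here as "#" followed by one or more word characters.
HASHTAG = re.compile(r"#(\w+)")


def group_by_hashtag(messages):
    """Map each lowercased hashtag to the list of messages that carry it."""
    groups = defaultdict(list)
    for message in messages:
        # A set avoids counting a message twice if it repeats a tag.
        tags = {tag.lower() for tag in HASHTAG.findall(message["text"])}
        for tag in tags:
            groups[tag].append(message)
    return dict(groups)


groups = group_by_hashtag(MESSAGES)
quake_users = sorted(m["user"] for m in groups["npquake"])
quake_locations = sorted({m["location"] for m in groups["npquake"]})
```

Here the group is defined by the shared tag alone, so its members can be anywhere: the two users who posted with #NPquake are in different cities.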
Throughout most of the field’s history, sociological research has followed the pattern of researchers posing one or more research questions about a population. They would then design a method for collecting information from that population, such as a survey or a set of interview questions. Because most populations are too large to allow researchers to collect data from each person in the population, the researchers would select a sample from the population and collect data from the sample in order to make predictions about the entire population. This approach is far from perfect because there are several points in the process at which researchers must make educated guesses or imprecise estimates, based on statistical formulas, with the understanding that these guesses make the results of the overall study somewhat less reliable.
The first imprecise estimate occurs when the sample is selected. Researchers try to choose participants in such a way that the sample will faithfully represent the population as a whole, so that the information collected from the sample can be generalized to the larger population. Of course, no sample is perfectly representative, which is why results based on a sample are reported with a margin of error, a statistical indication of how much uncertainty there is about the results as applied to the population. The second estimate takes place when the sample results are generalized in this way (Poorthuis & Zook, 2015).
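As a rough illustration of how a margin of error is obtained, the sketch below uses the common normal-approximation formula for a sample proportion, z * sqrt(p(1 - p) / n). The formula choice, the 95 percent confidence level (z = 1.96), and the assumption of a simple random sample are conventions supplied here for illustration and are not taken from the sources cited above.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Assumes a simple random sample and the normal approximation;
    z=1.96 corresponds to a 95 percent confidence level.
    """
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be a proportion between 0 and 1")
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Hypothetical survey: 1,000 respondents, half of whom answer "yes".
moe = margin_of_error(0.5, 1000)
print(f"{moe * 100:.1f} percentage points")  # prints "3.1 percentage points"
```

Because the sample size sits under the square root, quadrupling the sample only halves the margin of error, which is one reason the uncertainty in sample-based research never disappears entirely.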
Big Data is a complete departure from this classical research approach because sampling and prediction are not necessary: the data being studied are generated by the huge technology platforms and networks in the course of their constant use. In addition, the data are usually in a form that is convenient to use for many applications, since they are already digital. Interview recordings, by contrast, are analog and have to be transcribed into a text document before they can be analyzed.
What is puzzling to many observers of the field of sociology is that more researchers are not taking advantage of these features of Big Data; instead, they continue to apply old research approaches to new data that do not require them. For example, if two and a half million Twitter users posted about a new type of online game and sociologists wished to study this phenomenon, they would probably begin by selecting a sample from that population of two and a half million people. This would no doubt produce some results, but it would overlook the fact that the whole point of Big Data is that there is no need to sample: the data from each person in the group of two and a half million can be analyzed as they are created (Hargittai, 2015).
The real message of much of the writing about the potential of Big Data is that it gives scholars new ways to think about the information being generated online, and consequently this should drive researchers to ask new kinds of questions. Observations that previously were purely anecdotal may now be investigated in greater detail than was ever before possible. A familiar example is the observation that domestic violence incidents appear to increase in frequency on the day of the Super Bowl. Those who work with domestic violence victims were aware of the apparent correlation, but only when data from many different regions could be pulled together and analyzed did the pattern emerge.
Using Big Data, researchers can make and test such connections even more quickly and reliably. This can be done on the macro level, where analysis is concerned with broad trends across the data set, as well as on the micro level, where the focus is on smaller groups or even individual users, the roles they play, and the functions they perform within the information ecosystem. Startling insights can emerge from the manipulation of Big Data, as when researchers study the dynamics of how social movements emerge and operate over time; it is even possible to identify the individual actors on a platform like Twitter who are the most influential, and to trace the emergence and distribution of ideas and judgments within the population being studied (Crosas, King, Honaker & Sweeney, 2015).
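One simple way to approximate "influence" at the micro level is to count how often each user's posts are passed along by others. The sketch below does this for a handful of invented retweet records. The count is a crude proxy chosen for illustration, not the method of the authors cited above, and all names and data are hypothetical.

```python
from collections import Counter

# Hypothetical retweet records: (user who retweeted, original author).
RETWEETS = [
    ("ben", "ana"),
    ("chi", "ana"),
    ("dev", "ana"),
    ("ana", "ben"),
    ("dev", "ben"),
    ("ana", "chi"),
]


def rank_by_times_retweeted(edges):
    """Return (author, count) pairs, most-retweeted first; ties sorted by name."""
    counts = Counter(author for _retweeter, author in edges)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


ranking = rank_by_times_retweeted(RETWEETS)
most_influential = ranking[0][0]
```

The same records support both levels of analysis: the full ranking describes the distribution of attention across the data set, while the top entry singles out an individual actor.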
Further Insights
One might be forgiven for supposing that many of the usual ethical concerns that apply in traditional research contexts, such as confidentiality, privacy, and informed consent, do not apply to work with Big Data, or if they do apply are certainly less urgent. The assumption held by many is that because Big Data is so large, individuals will inevitably be lost within it and therefore unidentifiable; that is, it would not be feasible to look at the data generated by a particular person and deduce that person’s identity from it. There are several problems with this assumption.
Foremost, the massive size of a data set does not in itself protect the anonymity of participants, and it cannot be passively relied upon in place of the researcher’s affirmative duty to provide that protection. Because Big Data exists in digital form, it can easily be manipulated and linked to other types of information. Given sufficient time and access, it would be possible to reconstruct a person’s precise physical location, movement, and activities throughout any given day, simply by linking various data sets together. Thus, the standard has been raised for the researcher’s ethical duty to protect subjects’ privacy (Tinati, Halford, Carr & Pope, 2014).
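The linking risk can be made concrete with a small sketch. It assumes two invented data sets, a set of location pings that carry only a device identifier and a set of loyalty-card purchases that carry names, and it treats a shared place and hour as the linking key. Every field name and record is hypothetical.

```python
# Hypothetical "anonymized" location pings: no names, only a device ID.
PINGS = [
    {"device": "d1", "place": "pharmacy", "hour": 9},
    {"device": "d1", "place": "clinic", "hour": 11},
    {"device": "d2", "place": "pharmacy", "hour": 14},
]

# Hypothetical loyalty-card purchases, which do carry names.
PURCHASES = [
    {"name": "R. Jones", "place": "pharmacy", "hour": 9},
    {"name": "T. Smith", "place": "pharmacy", "hour": 14},
]


def link_devices_to_names(pings, purchases):
    """Match device IDs to names wherever place and hour coincide."""
    names_by_key = {}
    for purchase in purchases:
        key = (purchase["place"], purchase["hour"])
        names_by_key.setdefault(key, set()).add(purchase["name"])

    matches = {}
    for ping in pings:
        names = names_by_key.get((ping["place"], ping["hour"]), set())
        if len(names) == 1:  # keep only unambiguous matches
            matches[ping["device"]] = next(iter(names))
    return matches


links = link_devices_to_names(PINGS, PURCHASES)

# Once a device is linked to a name, all of its other pings are attributable too.
trail = [(p["place"], p["hour"]) for p in PINGS if links.get(p["device"]) == "R. Jones"]
```

A single overlapping record is enough to attach a name to the device, after which the clinic visit, which appears in neither data set alongside a name, becomes attributable as well.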
Another issue with ethical implications concerns the data that is analyzed in the course of research. The defining feature of Big Data is its large scale—it is a huge amount of information. A problem common to virtually all research is the need to be able to draw a boundary around the data, in order to decide what lies within the set of data to be analyzed and what falls outside of the boundary and can therefore be removed from consideration. This involves both technical skill and professional judgment. Professional judgment gives one the expertise needed to decide what is relevant to the investigation and what is not, while technical skill allows one to properly manipulate the data set or sets that are being worked on. Both of these tasks have potential ethical implications that require careful consideration on the part of the researcher, who must find a way to walk the fine line between studying an issue and defining that issue.
For example, if a researcher is studying the subject of recidivism among the population of persons in the United States who have prior criminal convictions, one of the decisions that might have to be made is where to draw the line as to what counts as re-offending. One person might say that only an additional criminal conviction should count as recidivism, while another might say that a felony should count but not a misdemeanor, and a third person might suggest that an additional conviction is not necessary and that mere contact with law enforcement—such as being questioned or pulled over—should be enough. Whichever of these options is chosen, the researcher will be making a decision that has a major impact on how the research results can be used and interpreted (Hesse, Moser & Riley, 2015).
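The effect of this boundary-drawing can be shown with a toy calculation. The five follow-up records and the event labels below are invented, and the three definitions mirror the three positions described in the example; the sketch illustrates the point and is not an analysis of real data.

```python
# Hypothetical follow-up records for five people with prior convictions.
RECORDS = [
    {"id": 1, "events": ["felony_conviction"]},
    {"id": 2, "events": ["misdemeanor_conviction"]},
    {"id": 3, "events": ["police_contact"]},
    {"id": 4, "events": ["police_contact", "misdemeanor_conviction"]},
    {"id": 5, "events": []},
]

# Three candidate definitions of what counts as re-offending.
DEFINITIONS = {
    "any_conviction": {"felony_conviction", "misdemeanor_conviction"},
    "felony_only": {"felony_conviction"},
    "any_contact": {"felony_conviction", "misdemeanor_conviction", "police_contact"},
}


def recidivism_rate(records, qualifying_events):
    """Share of people with at least one event that the definition counts."""
    reoffended = sum(1 for r in records if qualifying_events & set(r["events"]))
    return reoffended / len(records)


rates = {name: recidivism_rate(RECORDS, events) for name, events in DEFINITIONS.items()}
```

On these same five records the measured rate is 20 percent under the felony-only definition, 60 percent when any conviction counts, and 80 percent when any contact with law enforcement counts, so the definition is itself a finding-shaping decision.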
Issues
Part of the problem that sociologists face with Big Data is a shortage of qualified personnel within the discipline who are able to extract information from large data sets. Doing this requires a level of sophistication with computers and databases that most sociologists do not possess, in part because many of them prepared for their careers at a time when technology was far less developed than it is today. In addition, even now many sociology programs do not include much coursework relevant to large data sets, work that requires skills like computer programming, database management, and statistical analysis (Bail, 2014).
Many of the people who possess the technical skills either do not choose the field of sociology to begin with, or they do not remain within it for long, choosing other, more lucrative career options. To address this problem, many sociologists have begun to partner with data scientists and computer programmers in order to benefit from their skills and apply those skills to their own research interests. Doing this means that the sociologist does not have to go out and acquire the technological skill needed to augment his or her line of inquiry, because the technical functions are being outsourced to a research partner.
Interdisciplinarity also creates opportunities for more substantive collaborations, as researchers in computer and information science are often interested in topics that have relevance to sociology as well. This is particularly true with regard to research studies involving social networking platforms and online behavior, which together make up much of the work being done with Big Data (O’Donnell & Falk, 2015).
Even as researchers from computer science and sociology explore collaboration models, some have pointed out that this type of work brings with it inherent obstacles that must be overcome. One of these concerns the various researchers’ individual perspectives on the purpose of statistical analysis. Put simply, sociologists tend to look for ways to find correlations between particular variables in their data and particular outcomes, so that they can learn whether or not a causal connection may exist between the two.
For example, in a study of youth who begin smoking, one variable might be the marital status of the parents, and a sociologist would be interested in finding out if having parents who are divorced or separated increases the likelihood that one will begin smoking. Computer scientists, on the other hand, are usually not very interested in variables and causality. Their main concern is to create something that works, whether it is a database query, a program for scanning documents and recognizing the text they contain, or a website that displays a visual representation of trending topics on Twitter in real time (Lin, 2015).
The challenge is to help sociologists and information scientists understand one another enough to be able to work together and, hopefully, to contribute new and useful insights. This may be a tall order, but the opportunities presented by Big Data to better understand and eventually explain the transformations underway throughout society are worth the effort.
Terms & Concepts
Dynamism: A quality of Big Data that refers to its ability to capture information about events as they happen rather than after a delay.
Emergent Network: A network that comes into existence spontaneously rather than at the direction of a particular person or group, such as a network of Twitter users interested enough to post about a particular news event.
Hashtag: A method of tagging posts on Twitter so that all tweets mentioning the hashtag can be located. A pound sign (#) is combined with a keyword, as in the case of #inauguration or #royalwedding. Hashtags are frequently used by social scientists studying Twitter trends.
Proportionality: A property of Big Data that describes the ratio of the users studied in a research project to the total number of existing users in a certain category. With Big Data, proportionality is a major advantage because all users in the category can potentially be included in the study, so there is no need to select a limited sample.
Relationality: A property of Big Data that describes how it can be linked across studies and even across disciplines, due to its digital nature.
Sampling: The selection of a smaller group for the purpose of studying its members’ behavior in order to draw conclusions about the behavior of the larger group from which the sample was drawn. Traditionally, sampling was used because there was no practical way to obtain data from every member of the larger group, but this is no longer a limitation with Big Data.
Scale: The very large size of Big Data sets, a property that helps distinguish Big Data from traditional research approaches, in which sampling was used to locate a representative group that could be conveniently studied.
Bibliography
Bail, C. (2014). The cultural environment: Measuring culture with big data. Theory & Society, 43(3/4), 465. Retrieved October 23, 2016, from EBSCO Online Database Sociology Source Ultimate. http://search.ebscohost.com/login.aspx?direct=true&db=sxi&AN=96774227&site=ehost-live
Crosas, M., King, G., Honaker, J., & Sweeney, L. (2015). Automating open science for big data. Annals of the American Academy of Political and Social Science, 659, 260–273.
Hargittai, E. (2015). Is bigger always better? Potential biases of big data derived from social network sites. Annals of the American Academy of Political and Social Science, 659, 63–76.
Hesse, B. W., Moser, R. P., & Riley, W. T. (2015). From big data to knowledge in the social sciences. Annals of the American Academy of Political and Social Science, 659, 16–32.
Lin, J. (2015). On building better mousetraps and understanding the human condition: Reflections on big data in the social sciences. Annals of the American Academy of Political and Social Science, 659, 33–47.
O’Brien, D. T., Sampson, R. J., & Winship, C. (2015). Ecometrics in the age of big data. Sociological Methodology, 45(1), 101. Retrieved October 23, 2016, from EBSCO Online Database Sociology Source Ultimate. http://search.ebscohost.com/login.aspx?direct=true&db=sxi&AN=109063328&site=ehost-live
O’Donnell, M. B., & Falk, E. B. (2015). Big data under the microscope and brains in social context: Integrating methods from computational social science and neuroscience. Annals of the American Academy of Political and Social Science, 659, 274–289.
Poorthuis, A., & Zook, M. (2015). Small stories in big data: Gaining insights from large spatial point pattern datasets. Cityscape: A Journal of Policy Development and Research, 17(1), 151–160.
Shah, D. V., Cappella, J. N., & Neuman, W. R. (2015). Big data, digital media, and computational social science: Possibilities and perils. Annals of the American Academy of Political and Social Science, 659, 6–13.
Tinati, R., Halford, S., Carr, L., & Pope, C. (2014). Big data: Methodological challenges and approaches for sociological analysis. Sociology, 48(4), 663–681. Retrieved October 23, 2016, from EBSCO Online Database Sociology Source Ultimate. http://search.ebscohost.com/login.aspx?direct=true&db=sxi&AN=97863815&site=ehost-live
Suggested Reading
Ajunwa, I., Crawford, K., & Ford, J. S. (2016). Health and big data: An ethical framework for health information collection by corporate wellness programs. Journal of Law, Medicine & Ethics, 44(3), 474–480.
Lohmeier, C. (2014). The researcher and the never-ending field: Reconsidering big data and digital ethnography. Studies in Qualitative Methodology, 13, 75–89. Retrieved October 23, 2016, from EBSCO Online Database Sociology Source Ultimate. http://search.ebscohost.com/login.aspx?direct=true&db=sxi&AN=99685692&site=ehost-live
Papacharissi, Z. (2015). The unbearable lightness of information and the impossible gravitas of knowledge: Big data and the makings of a digital orality. Media, Culture & Society, 37(7), 1095–1100. Retrieved October 23, 2016, from EBSCO Online Database Sociology Source Ultimate.
Pettit, M. (2016). Historical time in the age of big data: Cultural psychology, historical change, and the Google books ngram viewer. History of Psychology, 19(2), 141–153.
Shirtcliff, B. (2015). Big data in the big easy: How social networks can improve the place for young people in cities. Landscape Journal, 34(2), 161.
Trottier, D. (2014). Big data ambivalence: Visions and risks in practice. Studies in Qualitative Methodology, 13, 51–72. Retrieved October 23, 2016, from EBSCO Online Database Sociology Source Ultimate. http://search.ebscohost.com/login.aspx?direct=true&db=sxi&AN=99685691&site=ehost-live