Observational methods in psychology research
Observational methods in psychology research are crucial for gathering accurate data about human and animal behaviors, especially given the inherent challenges in human observation. These methods aim to reduce perceptual distortions and biases that can compromise the objectivity necessary for scientific inquiry. Early behaviorists such as John B. Watson shifted the field's focus from subjective mental processes to observable behaviors, which improved the reliability of observational studies.
Researchers employ techniques such as behavioral taxonomies—structured categories that define and classify behaviors—to ensure observations are clear and consistent. This process involves a habituation period, during which subjects become accustomed to the observer's presence, allowing for more natural behavior to be recorded. Validity and reliability checks, such as interobserver agreement, are essential to ensure that the defined categories accurately reflect the behaviors being studied.
While observational methods provide valuable insights into behavior, they also highlight the limitations of human perception, which can influence various fields beyond psychology, such as law. The understanding that eyewitness accounts are often flawed underscores the need for caution when interpreting observational data. As this field continues to advance, there is a trend towards using automated tools to enhance objectivity and accuracy in observations.
- TYPE OF PSYCHOLOGY: Psychological methodologies
Humans are poor observers: they omit, overemphasize, and distort various aspects of what they have seen. Observational methods in psychology have been devised to control or eliminate this problem. These methods increase the accuracy of observations by reducing the effects of perceptual distortion and bias. The development of this methodology has been central to the evolution of scientific psychology.
Introduction
Humans have tremendous difficulty making accurate observations. Different people will perceive the same event differently; they apply their own interpretations to what they see. One’s perception or recollection of an event, although it seems accurate, may well be faulty. This fact creates problems in science because science requires objective observation.
In large part, this problem is eliminated through the use of scientific instruments to make observations. Many situations exist, however, in which the experimenter is still the recorder. Therefore, methods must be available to prevent bias, distortion, and omission from contaminating observations. Behavior may be observed within natural settings. When using naturalistic observation, scientists only watch behavior; they do not interfere with it.
History of Research
The need for an observational methodology that ensures objective data became apparent early in the history of scientific psychology. In fact, in 1913, John B. Watson, an early American behaviorist, stated that for psychology to become a science at all, it must eliminate the influence of subjective judgment. Watson’s influence caused psychology to shift from the subjective study of mental processes to the objective study of behavior. Shifting the focus to behavior improved the reliability of observation dramatically, because behavior is tangible and observable. In the 1920s, the operational definition, a description of behavior in terms that are unambiguous, observable, and easily measured, was introduced. Through the use of such definitions, communication between psychologists improved greatly. Psychologists then developed experiments that met the scientific criterion of repeatability. Repeatability means that different researchers must be able to repeat the experiment and get similar results.
It soon became apparent, however, that this was not enough and that researchers' expectations biased their observations even when observations were focused on operationally defined behavior. Methods had to be developed to eliminate these effects, leading to the development of techniques to reduce or control for experimenter bias. The technique of interrater reliability is an example of one such method. Using observers who are uninformed about the researchers’ expectations also reduces experimenter bias.
In 1976, Robert Rosenthal reported results that showed that subject expectations can also contaminate observational data. It was found that simply observing subjects alters their behavior. How it changes depends on the subjects’ interpretation of the situation and their motivation. If subjects could discover what the experimenters’ expectations were, they could decide to help or hinder the progress of the research. This type of reactivity severely contaminates the accuracy of observational data. Although this problem is associated primarily with human research, animals also react to observers. This is why allowing sufficient time for animals under observation to habituate to one’s presence is important. Efforts to refine and improve observational methodology continue. Attention is now primarily directed at developing equipment to automate the observational process. The goal is to improve objectivity by removing the experimenter from the situation altogether.
Behavioral Taxonomy
To make an observation as accurate and objective as possible, researchers use a behavioral taxonomy. A behavioral taxonomy is a set of behavioral categories that describe the behavior of the subjects under study. To develop a behavioral taxonomy, the experimenter must first watch the population of interest. The observer’s presence will alter the subjects’ behavior at first. Organisms are reactive, so their initial behavior in the presence of an observer is not typical. However, once they become accustomed to being observed, their behavior returns to normal. This initial observation period, called the habituation period, is important for two reasons. First, it allows the subjects time to become accustomed to the observer’s presence. Second, the researcher learns about the subjects by observing them in as many different situations as possible. During this time, a diary is kept. Behaviors and their possible functions are jotted down as they are seen. This diary will not be entirely accurate: the observer might misjudge how often a behavior occurred or overemphasize interesting behaviors. To overcome these problems, a behavioral taxonomy must be developed.
The taxonomy will include several behavioral categories. Each category describes a specific behavior. During observation, when the behavior is seen, the category is scored. Categories can be either general or specific. Broad categories permit very consistent, and hence reliable, scoring of behavior, but they are less precise. Specific categories are more precise but make scoring behavior more difficult and less reliable. Whether categories of behavior are general or specific, there are three criteria that all taxonomies must meet: A taxonomy must be clearly defined, mutually exclusive, and exhaustive.
All categories within the behavioral taxonomy must be operationally defined. Operationally defining a category means that one will describe, in concrete terms, exactly what one means by the category name. Operational definitions are used to indicate exactly what one must see to score the category. This serves to eliminate subjective judgment when scoring observations. It also permits scientists to communicate precisely about which behaviors are being studied.
Determining Reliability and Validity
The next step is to determine whether category definitions are reliable and valid. In psychological studies, reliability and validity are statistical evaluations of research, tests, methods, and other research variables. They indicate whether the information gained from a study is likely to be reproduced in other studies and whether the results measure the construct the researchers intended to study. If a method is reliable, it permits one to score the behavioral category consistently across populations.
To determine reliability in observational studies, most commonly, interrater or interobserver reliability is established. This tells whether two or more independent observers agree in scoring behavioral categories. If the rate of agreement is high, the category is reliable. This type of reliability is established by computing intraclass correlation coefficients, using a linear time component, or using a nesting model. For the taxonomy itself to be reliable, all its categories must be reliable.
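As a rough illustration of how agreement between two observers might be quantified, the sketch below computes simple percent agreement and Cohen's kappa. Kappa is a chance-corrected agreement statistic that is not named in the text above; it stands in here for the more involved intraclass correlation approach. The function names, category labels, and scores are hypothetical.

```python
from collections import Counter


def percent_agreement(obs_a, obs_b):
    """Share of intervals on which two observers scored the same category."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("observers must score the same, non-empty set of intervals")
    matches = sum(a == b for a, b in zip(obs_a, obs_b))
    return matches / len(obs_a)


def cohens_kappa(obs_a, obs_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(obs_a)
    p_observed = percent_agreement(obs_a, obs_b)
    counts_a, counts_b = Counter(obs_a), Counter(obs_b)
    # Chance agreement: probability both observers pick the same category
    # if each scored at random according to their own category frequencies.
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        return 1.0  # both observers used one identical category throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical scores from two independent observers over ten intervals.
observer_1 = ["groom", "groom", "feed", "rest", "rest",
              "feed", "groom", "rest", "feed", "rest"]
observer_2 = ["groom", "groom", "feed", "rest", "feed",
              "feed", "groom", "rest", "feed", "rest"]

print(percent_agreement(observer_1, observer_2))
print(cohens_kappa(observer_1, observer_2))
```

Percent agreement is easy to interpret but can look high simply because one category dominates; a chance-corrected statistic is one way to guard against that.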
Validity is established when one can show that one is really measuring what one thinks one is. This is very important, as it is not unusual to infer the function of a behavior only to discover later that the behavior served an entirely different purpose. Validity measures in observational studies may be internal or external. One way to establish validity is to show a relationship between the category definition and independent assessments of the same behavior in another group.
Exclusive and Exhaustive Categories
Once taxonomic categories are clearly defined, one must ensure they are mutually exclusive. This means that each behavior one observes should fit into one, and only one, category; there should be no overlap of meaning between categories. With overlap, the observer may become confused about which behavioral category to score. Such a judgment is subjective and will reduce the reliability of the taxonomy and objectivity of the observations.
Finally, the categories should be exhaustive. This means that the categories, as a group, must cover all the behaviors capable of being demonstrated by subjects. Ideally, there should be no behavior that cannot be scored. If the categories are not exhaustive, one will get a distorted idea of how often a particular behavior occurs. A taxonomy must not be developed so as to overrepresent behaviors one finds interesting. Mundane behaviors must be included as well. In this way, one can calculate how often each behavior occurs. Although efforts to develop an exhaustive taxonomy must be made, in reality a fully exhaustive taxonomy is impossible. New behaviors will invariably be seen throughout the course of extended observation. To control for this problem, observers include a category entitled “other.” In this way, one can score a behavior even if one has never seen it before. By examining the number of times the “other” category is scored, one can gauge how exhaustive the taxonomy is.
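The check on the “other” category can be sketched in a few lines. The example below is only an illustration: the function name, the category labels, and the scores are hypothetical, and no particular cutoff for an acceptable share of “other” scores is implied by the text.

```python
from collections import Counter


def other_share(scores, other_label="other"):
    """Proportion of all scored observations that fell into the catch-all category."""
    if not scores:
        raise ValueError("no observations were scored")
    return Counter(scores)[other_label] / len(scores)


# Hypothetical session: 20 scored observations, one of which fit no defined category.
session = ["rest"] * 9 + ["feed"] * 6 + ["groom"] * 4 + ["other"]

print(other_share(session))
```

A share near zero suggests the defined categories cover nearly everything that was seen; a large share suggests that new categories need to be defined.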
Taxonomy Approaches
In measuring behavior with a taxonomy, one can take several approaches. For example, one could use a clock to measure how long each behavior is observed. Using a duration approach is most useful when low-frequency, high-duration behavior is present. One could also quantify how often each behavioral category is scored. The frequency approach is most useful for scoring high-frequency, short-duration behaviors. One could use either the duration or the frequency approach separately or combine the two. Finally, the length and number of observational periods must be determined. In general, the more observational periods used, the better. With respect to length, the observational period must be long enough to permit adequate observation of behavior but short enough so that one does not become tired and miss important behavior.
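The frequency and duration approaches can both be computed from the same timed record. The sketch below assumes, purely for illustration, that each scored event is stored as a category with start and end times in seconds; the record format, function name, and data are hypothetical.

```python
from collections import defaultdict


def summarize(events):
    """Return (frequency, duration) dictionaries keyed by behavioral category.

    Each event is a (category, start_seconds, end_seconds) tuple.
    """
    frequency = defaultdict(int)
    duration = defaultdict(float)
    for category, start, end in events:
        if end < start:
            raise ValueError("an event cannot end before it starts")
        frequency[category] += 1          # frequency approach: count occurrences
        duration[category] += end - start  # duration approach: total time observed
    return dict(frequency), dict(duration)


# Hypothetical ten-minute observational period.
events = [
    ("rest", 0, 240),
    ("scratch", 240, 242),
    ("scratch", 250, 253),
    ("rest", 253, 600),
    ("scratch", 600, 601),
]

frequency, duration = summarize(events)
print(frequency)
print(duration)
```

In this made-up record, resting is scored less often than scratching but accounts for almost all of the observed time, which is why the choice of measure depends on the kind of behavior being studied.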
Applied Research
An applied example of behavioral taxonomy is its use by researchers to describe monkey behavior. The first step would be to spend many days watching the monkeys’ behavior. During this time, the observers would be writing down, in diary form, the behaviors that they see. They would also indicate the function they believe that each behavior serves. The monkeys may appear disturbed or agitated during these initial observations; however, as time goes by, their behavior would become less agitated and they would pay less attention to the observers’ presence. Here, one can see the importance of the habituation period. If observers had begun recording behavior from the start, they would probably have described the monkeys inaccurately in some respect.
With the information acquired during the habituation period, the researchers would begin to develop a behavioral taxonomy. They must decide how general or specific the categories in the taxonomy will be. This depends primarily on their purpose. If the categories must be very sensitive to change in behavior, they should be specific. If not, broader categories can be used. Once categories are selected, they are operationally defined. A category for aggression, for example, could be operationally defined as “grabbing and shaking the cage fence while maintaining eye contact with the experimenter.” Note that this definition is clear and concrete. That is, it is based on observable behavior.
In developing the list of behavioral categories, researchers must be sure they are mutually exclusive and exhaustive. To be mutually exclusive, categories must be defined so there is no overlap in meaning between them. To illustrate, the vocalization category might be defined as “any discernible vocal output.” It would be unlikely, however, that this category would be mutually exclusive. For example, what if a monkey showed aggression but, while doing so, was also vocalizing? Would this be scored as an instance of aggression or vocalization? Because these categories are not mutually exclusive, one would not know. When this occurs, at least one of the categories must be redefined. The listing of categories must also be exhaustive. Observers must form a category for every possible behavior the monkeys might show; also, an “other” category must be included.
Once category definitions have been developed, it must be determined whether they are reliable, valid, mutually exclusive, and exhaustive. This can be determined by having two observers score monkey behavior using the taxonomy. If interrater agreement is high (above 85 percent agreement), the definitions can be considered reliable. If it is low, researchers will revise the necessary category definitions. These observers can also determine whether the categories are mutually exclusive and exhaustive. The categories are mutually exclusive if observers experience no confusion about which category to score. They are exhaustive if observers do not need to score the “other” category. Finally, to establish the validity of category definitions, researchers could ask people familiar with monkey behavior to describe what they would expect to see within each of the categories. If their descriptions agree with the researchers’ definitions, there is some evidence that the taxonomy is valid.
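One way to decide which definitions need revision is to look at agreement category by category. The sketch below is an assumption-laden illustration: it applies the 85 percent figure to each category separately (the text states it only for overall agreement), and the per-category measure, function names, and scores are hypothetical.

```python
def category_agreement(obs_a, obs_b):
    """For each category, the share of intervals scored by either observer
    in that category on which both observers agreed."""
    if len(obs_a) != len(obs_b):
        raise ValueError("observers must score the same intervals")
    result = {}
    for category in set(obs_a) | set(obs_b):
        either = sum(1 for a, b in zip(obs_a, obs_b) if category in (a, b))
        both = sum(1 for a, b in zip(obs_a, obs_b) if a == b == category)
        result[category] = both / either
    return result


def categories_to_revise(obs_a, obs_b, threshold=0.85):
    """Categories whose agreement falls below the chosen threshold."""
    agreement = category_agreement(obs_a, obs_b)
    return sorted(c for c, value in agreement.items() if value < threshold)


# Hypothetical scores; the observers disagree on one interval in which
# a monkey was both aggressive and vocalizing.
observer_1 = ["aggression", "vocalization", "aggression", "groom",
              "groom", "groom", "aggression", "groom"]
observer_2 = ["aggression", "vocalization", "vocalization", "groom",
              "groom", "groom", "aggression", "groom"]

print(categories_to_revise(observer_1, observer_2))
```

Here the disagreement falls entirely on the two overlapping categories, which points to their definitions, rather than the observers, as the thing to fix.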
With the taxonomy developed, the researchers must decide how many observational periods to use and how long each period will be. In general, the more observational periods used, the more reliable the results. Twenty observational periods are adequate to produce reliable data in most cases. The purpose of the study must be considered when deciding how long the observational period should be. If high-frequency behavior that falls into very specific categories is being observed, a short observational period should be used. For example, if eye blinks are being counted, the observational period should be no longer than two minutes. Any longer than this and observers would get tired and make inaccurate observations. On the other hand, if low-frequency behavior that is scored in broader categories (for example, tool use) is being watched, longer observational periods should be used.
Finally, researchers must decide how behavior will be quantified. They can measure how long each category of behavior is seen, how often each category of behavior is seen, or both. If they are interested in how much of the monkeys’ time is spent engaging in each behavior, they will use the duration approach. If, on the other hand, researchers are more interested in determining the likelihood that a particular behavior will occur, they will use the frequency approach.
With an appropriately developed behavioral taxonomy, the behavior of the monkeys can be described accurately and objectively. Researchers can make statements about the likelihood of various behaviors, what the behaviors mean, and how much time the monkeys spend engaged in each type of behavior. From this information, they obtain an in-depth understanding of the monkeys. For example, through the use of behavioral taxonomies, it is known that rhesus monkeys are very social, can show tool use and other creative adaptations of behavior when necessary, and show rudimentary forms of communication.
Implications for Other Fields
Human observations and memories are flawed, and at the scientific level, much care must be taken to ensure that observations are accurate and objective. Understanding how human limitations affect observational capabilities has important implications beyond psychology—for example, in law. Tremendous weight is placed on eyewitness testimony in a court of law. Even though eyewitness accounts are often biased, distorted, and imperfect, courts may recognize them as the best evidence available. Because the human capacity to make accurate and objective observations is unreliable, people are well advised to evaluate eyewitness testimony carefully.
Because observational data are vulnerable to these distortions, it is important that the data gathered from observational studies be used appropriately. Treating observational data as the sole basis for factual claims may lead to inaccurate and misleading advice. As twenty-first-century psychology has continued to emphasize evidence-based research in the diagnosis and treatment of mental health conditions, observational studies have been relied upon less for large-scale claims. Instead, observational data often point researchers to areas of study that might require controlled scientific testing.