10 Experimental research
Experimental research, often considered the ‘gold standard’ in research designs, is one of the most rigorous of all research designs. In this design, one or more independent variables are manipulated by the researcher (as treatments), subjects are randomly assigned to different treatment levels (random assignment), and the results of the treatments on outcomes (dependent variables) are observed. The unique strength of experimental research is its internal validity (causality) due to its ability to link cause and effect through treatment manipulation, while controlling for the spurious effects of extraneous variables.
Experimental research is best suited for explanatory research, rather than for descriptive or exploratory research, where the goal of the study is to examine cause-effect relationships. It also works well for research that involves a relatively limited and well-defined set of independent variables that can either be manipulated or controlled. Experimental research can be conducted in laboratory or field settings. Laboratory experiments, conducted in laboratory (artificial) settings, tend to be high in internal validity, but this comes at the cost of low external validity (generalisability), because the artificial (laboratory) setting in which the study is conducted may not reflect the real world. Field experiments are conducted in field settings such as in a real organisation, and are high in both internal and external validity. But such experiments are relatively rare, because of the difficulties associated with manipulating treatments and controlling for extraneous effects in a field setting.
Experimental research can be grouped into two broad categories: true experimental designs and quasi-experimental designs. Both designs require treatment manipulation, but while true experiments also require random assignment, quasi-experiments do not. Sometimes, we also refer to non-experimental research, which is not really a research design, but an all-inclusive term that includes all types of research that do not employ treatment manipulation or random assignment, such as survey research, observational research, and correlational studies.
Basic concepts
Treatment and control groups. In experimental research, some subjects are administered one or more experimental stimuli called a treatment (the treatment group) while other subjects are not given such a stimulus (the control group). The treatment may be considered successful if subjects in the treatment group rate more favourably on outcome variables than control group subjects. Multiple levels of experimental stimulus may be administered, in which case there may be more than one treatment group. For example, in order to test the effects of a new drug intended to treat a certain medical condition like dementia, if a sample of dementia patients is randomly divided into three groups, with the first group receiving a high dosage of the drug, the second group receiving a low dosage, and the third group receiving a placebo such as a sugar pill (control group), then the first two groups are experimental groups and the third group is a control group. After administering the drug for a period of time, if the condition of the experimental group subjects improved significantly more than the control group subjects, we can say that the drug is effective. We can also compare the conditions of the high and low dosage experimental groups to determine if the high dose is more effective than the low dose.
Treatment manipulation. Treatments are the unique feature of experimental research that sets this design apart from all other research methods. Treatment manipulation helps control for the ‘cause’ in cause-effect relationships. Naturally, the validity of experimental research depends on how well the treatment was manipulated. Treatment manipulation must be checked using pretests and pilot tests prior to the experimental study. Any measurements conducted before the treatment is administered are called pretest measures, while those conducted after the treatment are posttest measures.
Random selection and assignment. Random selection is the process of randomly drawing a sample from a population or a sampling frame. This approach is typically employed in survey research, and ensures that each unit in the population has a positive chance of being selected into the sample. Random assignment, however, is a process of randomly assigning subjects to experimental or control groups. This is a standard practice in true experimental research to ensure that treatment groups are similar (equivalent) to each other and to the control group prior to treatment administration. Random selection is related to sampling, and is therefore more closely related to the external validity (generalisability) of findings. However, random assignment is related to design, and is therefore most related to internal validity. It is possible to have both random selection and random assignment in well-designed experimental research, but quasi-experimental research involves neither random selection nor random assignment.
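The distinction between the two procedures can be illustrated with a minimal Python sketch. The sampling frame, the sample size, and the function names `random_selection` and `random_assignment` are hypothetical, chosen only for illustration: selection draws the sample from the population (a sampling step), while assignment splits the already-drawn sample into groups (a design step).

```python
import random


def random_selection(sampling_frame, sample_size, seed=None):
    """Draw a simple random sample from a sampling frame (sampling step)."""
    rng = random.Random(seed)
    return rng.sample(sampling_frame, sample_size)


def random_assignment(subjects, groups=("treatment", "control"), seed=None):
    """Shuffle subjects and deal them round-robin into groups (design step)."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, subject in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(subject)
    return assignment


# Hypothetical sampling frame of 500 students.
population = [f"student_{i:03d}" for i in range(500)]

sample = random_selection(population, 40, seed=1)   # relates to external validity
groups = random_assignment(sample, seed=2)          # relates to internal validity
```

A study may use either step without the other: a survey typically stops after `random_selection`, whereas an experiment run on a convenience sample would apply only `random_assignment`.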
Threats to internal validity. Although experimental designs are considered more rigorous than other research methods in terms of the internal validity of their inferences (by virtue of their ability to control causes through treatment manipulation), they are not immune to internal validity threats. Some of these threats to internal validity are described below, within the context of a study of the impact of a special remedial math tutoring program for improving the math abilities of high school students.
History threat is the possibility that the observed effects (dependent variables) are caused by extraneous or historical events rather than by the experimental treatment. For instance, students’ post-remedial math score improvement may have been caused by their preparation for a math exam at their school, rather than the remedial math program.
Maturation threat refers to the possibility that observed effects are caused by natural maturation of subjects (e.g., a general improvement in their intellectual ability to understand complex concepts) rather than the experimental treatment.
Testing threat is a threat in pre-post designs where subjects’ posttest responses are conditioned by their pretest responses. For instance, if students remember their answers from the pretest evaluation, they may tend to repeat them in the posttest exam. Not conducting a pretest can help avoid this threat.
Instrumentation threat, which also occurs in pre-post designs, refers to the possibility that the difference between pretest and posttest scores is not due to the remedial math program, but due to changes in the administered test, such as the posttest having a higher or lower degree of difficulty than the pretest.
Mortality threat refers to the possibility that subjects may be dropping out of the study at differential rates between the treatment and control groups due to a systematic reason, such that the dropouts were mostly students who scored low on the pretest. If the low-performing students drop out, the results of the posttest will be artificially inflated by the preponderance of high-performing students.
Regression threat, also called regression to the mean, refers to the statistical tendency of a group’s overall performance to regress toward the mean during a posttest rather than in the anticipated direction. For instance, if subjects scored high on a pretest, they will have a tendency to score lower on the posttest (closer to the mean) because their high scores (away from the mean) during the pretest were possibly a statistical aberration. This problem tends to be more prevalent in non-random samples and when the two measures are imperfectly correlated.
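The regression threat can be illustrated with a small simulation. This is a sketch under assumed values, not an analysis of real data: each simulated student has a stable ability, and each test score adds independent measurement noise, so the pretest and posttest are imperfectly correlated. No treatment is applied at all, yet the students selected for their high pretest scores score lower, on average, on the posttest.

```python
import random
from statistics import mean


def simulate_regression_to_mean(n=5000, top_fraction=0.10, seed=0):
    """Simulate pretest/posttest scores with no treatment and compare the top pretest scorers."""
    rng = random.Random(seed)
    # Assumed model: stable ability plus independent noise on each test.
    ability = [rng.gauss(50, 10) for _ in range(n)]
    pretest = [a + rng.gauss(0, 10) for a in ability]
    posttest = [a + rng.gauss(0, 10) for a in ability]

    # Select the subjects with the highest pretest scores (a non-random subsample).
    k = int(n * top_fraction)
    top = sorted(range(n), key=lambda i: pretest[i], reverse=True)[:k]

    return {
        "overall_pretest_mean": mean(pretest),
        "top_pretest_mean": mean(pretest[i] for i in top),
        "top_posttest_mean": mean(posttest[i] for i in top),
    }


result = simulate_regression_to_mean()
for name, value in result.items():
    print(f"{name}: {value:.1f}")
```

The selected group’s posttest mean falls between its own pretest mean and the overall mean, which is the pattern a researcher could mistake for a negative treatment effect.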
Two-group experimental designs
Pretest-posttest control group design. In this design, subjects are randomly assigned to treatment and control groups and subjected to an initial (pretest) measurement of the dependent variables of interest; the treatment group is then administered a treatment (representing the independent variable of interest), and the dependent variables are measured again (posttest). The notation of this design is shown in Figure 10.1.
Statistical analysis of this design involves a simple analysis of variance (ANOVA) between the treatment and control groups. The pretest-posttest design handles several threats to internal validity, such as maturation, testing, and regression, since these threats can be expected to influence both treatment and control groups in a similar (random) manner. The selection threat is controlled via random assignment. However, additional threats to internal validity may exist. For instance, mortality can be a problem if there are differential dropout rates between the two groups, and the pretest measurement may bias the posttest measurement—especially if the pretest introduces unusual topics or content.
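One common way to summarise the outcome of a pretest-posttest control group design is the difference in gains: the pretest-to-posttest change in the treatment group minus the same change in the control group. The sketch below assumes this gain-score summary; the scores and the function name `gain_score_effect` are hypothetical.

```python
from statistics import mean


def gain_score_effect(pre_treatment, post_treatment, pre_control, post_control):
    """Difference in mean gains: (treatment gain) minus (control gain)."""
    gain_treatment = mean(post_treatment) - mean(pre_treatment)
    gain_control = mean(post_control) - mean(pre_control)
    return gain_treatment - gain_control


# Hypothetical math scores for five students per group.
pre_treatment = [52, 48, 50, 55, 45]
post_treatment = [63, 58, 61, 66, 57]
pre_control = [51, 49, 50, 54, 46]
post_control = [55, 52, 53, 58, 47]

effect = gain_score_effect(pre_treatment, post_treatment, pre_control, post_control)
print(effect)  # treatment gain of 11 minus control gain of 3
```

Subtracting the control group’s gain removes change that both groups would have shown anyway, such as maturation or testing effects.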
Posttest-only control group design. This design is a simpler version of the pretest-posttest design where pretest measurements are omitted. The design notation is shown in Figure 10.2.
The treatment effect is measured simply as the difference in the posttest scores between the two groups: E = (posttest score of the treatment group) − (posttest score of the control group).
The appropriate statistical analysis of this design is also a two-group analysis of variance (ANOVA). The simplicity of this design makes it more attractive than the pretest-posttest design in terms of internal validity. This design controls for maturation, testing, regression, selection, and pretest-posttest interaction, though the mortality threat may continue to exist.
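As a rough illustration of the posttest-only analysis, the sketch below computes the difference in posttest means and the one-way ANOVA F statistic by hand. The scores are hypothetical and the function name `one_way_anova_f` is an assumption made for this example. Converting F into a p-value requires the F distribution, which is left to a statistics package.

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA across the given groups."""
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)

    # Between-groups sum of squares: how far each group mean lies from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups sum of squares: spread of scores around their own group mean.
    ss_within = sum((score - mean(g)) ** 2 for g in groups for score in g)

    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Hypothetical posttest scores.
treatment_posttest = [61, 58, 63, 66, 57]
control_posttest = [55, 52, 53, 58, 47]

treatment_effect = mean(treatment_posttest) - mean(control_posttest)
f_statistic, df_between, df_within = one_way_anova_f(treatment_posttest, control_posttest)
print(treatment_effect, round(f_statistic, 2), df_between, df_within)
```

With only two groups, this F statistic equals the square of the independent-samples t statistic, so the two tests lead to the same conclusion.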
Covariance design. Because the pretest measure in this design is not a measurement of the dependent variable, but rather a covariate, the treatment effect is measured as the difference in the posttest scores between the treatment and control groups.
Due to the presence of covariates, the appropriate statistical analysis of this design is a two-group analysis of covariance (ANCOVA). This design has all the advantages of the posttest-only design, but with improved internal validity due to the controlling of covariates. Covariance designs can also be extended to the pretest-posttest control group design.
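The core idea of ANCOVA can be sketched as an adjustment of the raw posttest difference for any difference between groups on the covariate, using the pooled within-group regression slope. The sketch below shows only this adjusted-difference step, not a full ANCOVA with significance tests. The data are hypothetical and deliberately noise-free so that the arithmetic is easy to check by hand.

```python
from statistics import mean


def pooled_within_slope(groups):
    """Pooled within-group slope of outcome on covariate; groups is a list of (covariate, outcome) pairs."""
    sxy = 0.0
    sxx = 0.0
    for covariate, outcome in groups:
        x_bar, y_bar = mean(covariate), mean(outcome)
        sxy += sum((x - x_bar) * (y - y_bar) for x, y in zip(covariate, outcome))
        sxx += sum((x - x_bar) ** 2 for x in covariate)
    return sxy / sxx


def ancova_adjusted_effect(treat_x, treat_y, ctrl_x, ctrl_y):
    """Posttest difference adjusted for the groups' difference on the covariate."""
    slope = pooled_within_slope([(treat_x, treat_y), (ctrl_x, ctrl_y)])
    raw_difference = mean(treat_y) - mean(ctrl_y)
    return raw_difference - slope * (mean(treat_x) - mean(ctrl_x))


# Hypothetical covariate (x) and posttest (y) scores.
treatment_x, treatment_y = [3, 5, 7, 9], [12, 14, 16, 18]
control_x, control_y = [1, 3, 5, 7], [6, 8, 10, 12]

raw_effect = mean(treatment_y) - mean(control_y)
adjusted_effect = ancova_adjusted_effect(treatment_x, treatment_y, control_x, control_y)
print(raw_effect, adjusted_effect)
```

Here the raw difference of 6 overstates the effect because the treatment group happened to score higher on the covariate; after adjustment the estimated effect is 4.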
Factorial designs
Two-group designs are inadequate if your research requires manipulation of two or more independent variables (treatments). In such cases, you would need four or higher-group designs. Such designs, quite popular in experimental research, are commonly called factorial designs. Each independent variable in this design is called a factor, and each subdivision of a factor is called a level. Factorial designs enable the researcher to examine not only the individual effect of each treatment on the dependent variables (called main effects), but also their joint effect (called interaction effects).
In a factorial design, a main effect is said to exist if the dependent variable shows a significant difference between multiple levels of one factor, at all levels of other factors. No change in the dependent variable across factor levels is the null case (baseline), from which main effects are evaluated. For example, in a design that crosses instructional type with instructional time (one and a half versus three hours/week), you may see a main effect of instructional type, instructional time, or both on learning outcomes. An interaction effect exists when the effect of differences in one factor depends upon the level of a second factor. In this example, if the effect of instructional type on learning outcomes is greater for three hours/week of instructional time than for one and a half hours/week, then we can say that there is an interaction effect between instructional type and instructional time on learning outcomes. Note that the presence of interaction effects dominates and makes main effects irrelevant, and it is not meaningful to interpret main effects if interaction effects are significant.
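Main and interaction effects in a 2 × 2 design can be read off the four cell means. The sketch below uses hypothetical learning-outcome means; the instructional type levels are labelled "A" and "B" as placeholders, while the two instructional time levels follow the example in the text. It computes descriptive contrasts only, so whether they are statistically significant would still need a factorial ANOVA.

```python
from statistics import mean

# Hypothetical cell means, keyed by (instructional type, hours per week).
cell_means = {
    ("A", 1.5): 60.0,
    ("A", 3.0): 65.0,
    ("B", 1.5): 62.0,
    ("B", 3.0): 75.0,
}


def factorial_2x2_effects(cells, types=("A", "B"), hours=(1.5, 3.0)):
    """Main-effect and interaction contrasts from the cell means of a 2 x 2 design."""
    t0, t1 = types
    h0, h1 = hours
    # Main effect of type: average over both time levels.
    main_type = mean([cells[(t1, h0)], cells[(t1, h1)]]) - mean([cells[(t0, h0)], cells[(t0, h1)]])
    # Main effect of time: average over both type levels.
    main_time = mean([cells[(t0, h1)], cells[(t1, h1)]]) - mean([cells[(t0, h0)], cells[(t1, h0)]])
    # Interaction: does the effect of type differ between the two time levels?
    interaction = (cells[(t1, h1)] - cells[(t0, h1)]) - (cells[(t1, h0)] - cells[(t0, h0)])
    return {"main_type": main_type, "main_time": main_time, "interaction": interaction}


effects = factorial_2x2_effects(cell_means)
print(effects)
```

In these made-up numbers the advantage of type B is 2 points at one and a half hours/week but 10 points at three hours/week, so the single "main effect of type" of 6 points describes neither condition well.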
Hybrid experimental designs
Hybrid designs are those that are formed by combining features of more established designs. Three such hybrid designs are randomised block design, Solomon four-group design, and switched replication design.
Randomised block design. This is a variation of the posttest-only or pretest-posttest control group design where the subject population can be grouped into relatively homogeneous subgroups (called blocks) within which the experiment is replicated. For instance, if you want to replicate the same posttest-only design among university students and full-time working professionals (two homogeneous blocks), subjects in both blocks are randomly split between the treatment group (receiving the same treatment) and the control group (see Figure 10.5). The purpose of this design is to reduce the ‘noise’ or variance in data that may be attributable to differences between the blocks so that the actual effect of interest can be detected more accurately.
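Assignment in a randomised block design can be sketched as ordinary random assignment repeated separately inside each block. The subject identifiers and the function name `blocked_assignment` below are hypothetical.

```python
import random


def blocked_assignment(subjects_by_block, seed=None):
    """Randomly split the subjects of each block into treatment and control halves."""
    rng = random.Random(seed)
    assignment = {}
    for block, subjects in subjects_by_block.items():
        shuffled = list(subjects)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        assignment[block] = {
            "treatment": shuffled[:half],
            "control": shuffled[half:],
        }
    return assignment


# Hypothetical subject pools for the two blocks used in the example above.
subjects_by_block = {
    "university students": [f"student_{i}" for i in range(10)],
    "working professionals": [f"professional_{i}" for i in range(10)],
}

assignment = blocked_assignment(subjects_by_block, seed=7)
```

Because every block contributes equally to both conditions, differences between blocks cannot masquerade as a treatment effect.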
Solomon four-group design. In this design, the sample is divided into two treatment groups and two control groups. One treatment group and one control group receive the pretest, and the other two groups do not. This design represents a combination of posttest-only and pretest-posttest control group design, and is intended to test for the potential biasing effect of pretest measurement on posttest measures that tends to occur in pretest-posttest designs, but not in posttest-only designs. The design notation is shown in Figure 10.6.
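The comparisons that the Solomon four-group design makes possible can be sketched from the four posttest means. The group labels, the means, and the function name are hypothetical; the contrasts shown are descriptive and would need a significance test in a real study.

```python
def solomon_contrasts(posttest_means):
    """Contrasts among the four posttest means of a Solomon four-group design."""
    pre_treat = posttest_means["pretested_treatment"]
    pre_ctrl = posttest_means["pretested_control"]
    nopre_treat = posttest_means["unpretested_treatment"]
    nopre_ctrl = posttest_means["unpretested_control"]

    effect_with_pretest = pre_treat - pre_ctrl
    effect_without_pretest = nopre_treat - nopre_ctrl
    return {
        "effect_with_pretest": effect_with_pretest,
        "effect_without_pretest": effect_without_pretest,
        # Does taking the pretest shift posttest scores on its own?
        "pretest_main_effect": (pre_treat + pre_ctrl) / 2 - (nopre_treat + nopre_ctrl) / 2,
        # Does taking the pretest change how well the treatment works?
        "pretest_by_treatment": effect_with_pretest - effect_without_pretest,
    }


# Hypothetical posttest means for the four groups.
posttest_means = {
    "pretested_treatment": 70.0,
    "pretested_control": 58.0,
    "unpretested_treatment": 66.0,
    "unpretested_control": 56.0,
}

contrasts = solomon_contrasts(posttest_means)
print(contrasts)
```

If the last two contrasts are close to zero, the pretest is unlikely to have biased the posttest, and the pretested and unpretested estimates of the treatment effect can be read as agreeing.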
Switched replication design. This is a two-group design implemented in two phases with three waves of measurement. The treatment group in the first phase serves as the control group in the second phase, and the control group in the first phase becomes the treatment group in the second phase, as illustrated in Figure 10.7. In other words, the original design is repeated or replicated temporally with treatment/control roles switched between the two groups. By the end of the study, all participants will have received the treatment either during the first or the second phase. This design is most feasible in organisational contexts where organisational programs (e.g., employee training) are implemented in a phased manner or are repeated at regular intervals.
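With three waves of measurement, each phase of a switched replication design yields its own difference-in-gains estimate. The sketch below assumes this simple gain-score summary; the wave means and the function name are hypothetical.

```python
def switched_replication_effects(group_a_waves, group_b_waves):
    """Gain-score effect per phase; group A is treated in phase 1, group B in phase 2.

    Each argument holds the group's mean outcome at waves 1, 2 and 3.
    """
    a1, a2, a3 = group_a_waves
    b1, b2, b3 = group_b_waves
    phase_1_effect = (a2 - a1) - (b2 - b1)  # A treated, B serves as control
    phase_2_effect = (b3 - b2) - (a3 - a2)  # roles switched
    return phase_1_effect, phase_2_effect


# Hypothetical mean outcomes at the three waves.
group_a = (50, 60, 61)
group_b = (50, 52, 63)

phase_1_effect, phase_2_effect = switched_replication_effects(group_a, group_b)
print(phase_1_effect, phase_2_effect)
```

Similar estimates in the two phases suggest that the effect replicates across groups and time periods.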
Quasi-experimental designs
Quasi-experimental designs are almost identical to true experimental designs, but lack one key ingredient: random assignment. For instance, one entire class section or one organisation is used as the treatment group, while another section of the same class or a different organisation in the same industry is used as the control group. This lack of random assignment potentially results in groups that are non-equivalent, such as one group possessing greater mastery of certain content than the other group, say by virtue of having a better teacher in a previous semester, which introduces the possibility of selection bias. Quasi-experimental designs are therefore inferior to true experimental designs in internal validity due to the presence of a variety of selection-related threats such as selection-maturation threat (the treatment and control groups maturing at different rates), selection-history threat (the treatment and control groups being differentially impacted by extraneous or historical events), selection-regression threat (the treatment and control groups regressing toward the mean between pretest and posttest at different rates), selection-instrumentation threat (the treatment and control groups responding differently to the measurement), selection-testing threat (the treatment and control groups responding differently to the pretest), and selection-mortality threat (the treatment and control groups demonstrating differential dropout rates). Given these selection threats, it is generally preferable to avoid quasi-experimental designs to the greatest extent possible.
In addition, there are quite a few unique non-equivalent designs without corresponding true experimental design cousins. Some of the more useful of these designs are discussed next.
Regression discontinuity (RD) design. This is a non-equivalent pretest-posttest design where subjects are assigned to the treatment or control group based on a cut-off score on a preprogram measure. For instance, patients who are severely ill may be assigned to a treatment group to test the efficacy of a new drug or treatment protocol and those who are mildly ill are assigned to the control group. In another example, students who are lagging behind on standardised test scores may be selected for a remedial curriculum program intended to improve their performance, while those who score high on such tests are not selected for the remedial program.
Because of the use of a cut-off score, it is possible that the observed results may be a function of the cut-off score rather than the treatment, which introduces a new threat to internal validity. However, using the cut-off score also ensures that limited or costly resources are distributed to people who need them the most, rather than randomly across a population, while simultaneously allowing a quasi-experimental treatment. The control group scores in the RD design do not serve as a benchmark for comparing treatment group scores, given the systematic non-equivalence between the two groups. Rather, if there is no discontinuity between pretest and posttest scores in the control group, but such a discontinuity persists in the treatment group, then this discontinuity is viewed as evidence of the treatment effect.
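One common way to look for such a discontinuity is to fit a separate regression line of posttest on pretest scores on each side of the cut-off and compare the two predictions at the cut-off itself. The sketch below takes that approach as an assumption. The data are hypothetical and noise-free, with students scoring below 50 on the pretest receiving the remedial program, and the function names are invented for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def rd_jump(pretest, posttest, cutoff):
    """Gap at the cut-off between the treated-side and control-side regression lines."""
    treated = [(x, y) for x, y in zip(pretest, posttest) if x < cutoff]
    control = [(x, y) for x, y in zip(pretest, posttest) if x >= cutoff]
    slope_t, intercept_t = fit_line(*zip(*treated))
    slope_c, intercept_c = fit_line(*zip(*control))
    return (intercept_t + slope_t * cutoff) - (intercept_c + slope_c * cutoff)


# Hypothetical scores: posttest rises with pretest, plus a 5-point boost for treated students.
pretest = list(range(30, 71, 5))
posttest = [10 + 0.8 * x + (5 if x < 50 else 0) for x in pretest]

jump = rd_jump(pretest, posttest, cutoff=50)
print(round(jump, 2))
```

A gap near zero would indicate that the posttest scores continue smoothly across the cut-off, whereas a clear gap at exactly the cut-off is the pattern read as evidence of a treatment effect.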
Proxy pretest design. This design, shown in Figure 10.11, looks very similar to the standard non-equivalent groups design (NEGD, pretest-posttest), with one critical difference: the pretest score is collected after the treatment is administered. A typical application of this design is when a researcher is brought in to test the efficacy of a program (e.g., an educational program) after the program has already started and pretest data is not available. Under such circumstances, the best option for the researcher is often to use a different prerecorded measure, such as students’ grade point average before the start of the program, as a proxy for pretest data. A variation of the proxy pretest design is to use subjects’ posttest recollection of pretest data, which may be subject to recall bias, but nevertheless may provide a measure of perceived gain or change in the dependent variable.
Separate pretest-posttest samples design. This design is useful if it is not possible to collect pretest and posttest data from the same subjects for some reason. As shown in Figure 10.12, there are four groups in this design, but two groups come from a single non-equivalent group, while the other two groups come from a different non-equivalent group. For instance, say you want to test customer satisfaction with a new online service that is implemented in one city but not in another. In this case, customers in the first city serve as the treatment group and those in the second city constitute the control group. If it is not possible to obtain pretest and posttest measures from the same customers, you can measure customer satisfaction at one point in time, implement the new service program, and measure customer satisfaction (with a different set of customers) after the program is implemented. Customer satisfaction is also measured in the control group at the same times as in the treatment group, but without the new program implementation. The design is not particularly strong, because you cannot examine the changes in any specific customer’s satisfaction score before and after the implementation, but can only examine average customer satisfaction scores. Despite the lower internal validity, this design may still be a useful way of collecting quasi-experimental data when pretest and posttest data is not available from the same subjects.
An interesting variation of the non-equivalent dependent variable (NEDV) design is a pattern-matching NEDV design, which employs multiple outcome variables and a theory that explains how much each variable will be affected by the treatment. The researcher can then examine if the theoretical prediction is matched in actual observations. This pattern-matching technique, based on the degree of correspondence between theoretical and observed patterns, is a powerful way of alleviating internal validity concerns in the original NEDV design.
Perils of experimental research
Experimental research is one of the most difficult of research designs, and should not be taken lightly. This type of research is often beset with a multitude of methodological problems. First, though experimental research requires theories for framing hypotheses for testing, much of current experimental research is atheoretical. Without theories, the hypotheses being tested tend to be ad hoc, possibly illogical, and meaningless. Second, many of the measurement instruments used in experimental research are not tested for reliability and validity, and are incomparable across studies. Consequently, results generated using such instruments are also incomparable. Third, experimental research often uses inappropriate research designs, such as irrelevant dependent variables, no interaction effects, no experimental controls, and non-equivalent stimuli across treatment groups. Findings from such studies tend to lack internal validity and are highly suspect. Fourth, the treatments (tasks) used in experimental research may be diverse, incomparable, and inconsistent across studies, and sometimes inappropriate for the subject population. For instance, undergraduate student subjects are often asked to pretend that they are marketing managers and asked to perform a complex budget allocation task in which they have no experience or expertise. The use of such inappropriate tasks introduces new threats to internal validity (i.e., subjects’ performance may be an artefact of the content or difficulty of the task setting), generates findings that are non-interpretable and meaningless, and makes integration of findings across studies impossible.
The design of proper experimental treatments is a very important task in experimental design, because the treatment is the raison d’être of the experimental method, and must never be rushed or neglected. To design an adequate and appropriate task, researchers should use prevalidated tasks if available, conduct treatment manipulation checks to check for the adequacy of such tasks (by debriefing subjects after they perform the assigned task), conduct pilot tests (repeatedly, if necessary), and if in doubt, use tasks that are simple and familiar for the respondent sample rather than tasks that are complex or unfamiliar.
In summary, this chapter introduced key concepts in the experimental design research method and introduced a variety of true experimental and quasi-experimental designs. Although these designs vary widely in internal validity, designs with less internal validity should not be overlooked and may sometimes be useful under specific circumstances and empirical contingencies.
Social Science Research: Principles, Methods and Practices (Revised edition) Copyright © 2019 by Anol Bhattacherjee is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.
5.3 Experimentation and Validity
Learning objectives.
- Explain what internal validity is and why experiments are considered to be high in internal validity.
- Explain what external validity is and evaluate studies in terms of their external validity.
- Explain the concepts of construct and statistical validity.
Four Big Validities
When we read about psychology experiments with a critical view, one question to ask is “is this study valid?” However, that question is not as straightforward as it seems because, in psychology, there are many different kinds of validities. Researchers have focused on four validities to help assess whether an experiment is sound (Judd & Kenny, 1981; Morling, 2014): internal validity, external validity, construct validity, and statistical validity. We will explore each validity in depth.
Internal Validity
Two variables being statistically related does not necessarily mean that one causes the other. “Correlation does not imply causation.” For example, if it were the case that people who exercise regularly are happier than people who do not exercise regularly, this relationship would not necessarily mean that exercising increases people’s happiness. It could mean instead that greater happiness causes people to exercise or that something like better physical health causes people to exercise and be happier.
The purpose of an experiment, however, is to show that two variables are statistically related and to do so in a way that supports the conclusion that the independent variable caused any observed differences in the dependent variable. The logic is based on this assumption: If the researcher creates two or more highly similar conditions and then manipulates the independent variable to produce just one difference between them, then any later difference between the conditions must have been caused by the independent variable. For example, because the only difference between Darley and Latané’s conditions was the number of students that participants believed to be involved in the discussion, this difference in belief must have been responsible for differences in helping between the conditions.
An empirical study is said to be high in internal validity if the way it was conducted supports the conclusion that the independent variable caused any observed differences in the dependent variable. Thus experiments are high in internal validity because the way they are conducted—with the manipulation of the independent variable and the control of extraneous variables—provides strong support for causal conclusions. In contrast, nonexperimental research designs (e.g., correlational designs), in which variables are measured but are not manipulated by an experimenter, are low in internal validity.
External Validity
At the same time, the way that experiments are conducted sometimes leads to a different kind of criticism. Specifically, the need to manipulate the independent variable and control extraneous variables means that experiments are often conducted under conditions that seem artificial (Bauman, McGraw, Bartels, & Warren, 2014). In many psychology experiments, the participants are all undergraduate students and come to a classroom or laboratory to fill out a series of paper-and-pencil questionnaires or to perform a carefully designed computerized task. Consider, for example, an experiment in which researcher Barbara Fredrickson and her colleagues had undergraduate students come to a laboratory on campus and complete a math test while wearing a swimsuit (Fredrickson, Roberts, Noll, Quinn, & Twenge, 1998). At first, this manipulation might seem silly. When will undergraduate students ever have to complete math tests in their swimsuits outside of this experiment?
The issue we are confronting is that of external validity. An empirical study is high in external validity if the way it was conducted supports generalizing the results to people and situations beyond those actually studied. As a general rule, studies are higher in external validity when the participants and the situation studied are similar to those that the researchers want to generalize to and that participants encounter every day, a quality often described as mundane realism. Imagine, for example, that a group of researchers is interested in how shoppers in large grocery stores are affected by whether breakfast cereal is packaged in yellow or purple boxes. Their study would be high in external validity and have high mundane realism if they studied the decisions of ordinary people doing their weekly shopping in a real grocery store. If the shoppers bought much more cereal in purple boxes, the researchers would be fairly confident that this increase would be true for other shoppers in other stores. Their study would be relatively low in external validity, however, if they studied a sample of undergraduate students in a laboratory at a selective university who merely judged the appeal of various colors presented on a computer screen. Such a study could nevertheless have high psychological realism, where the same mental process is used in both the laboratory and the real world. If the students judged purple to be more appealing than yellow, the researchers would not be very confident that this preference is relevant to grocery shoppers’ cereal-buying decisions because of low external validity, but they could be confident that the visual processing of colors has high psychological realism.
We should be careful, however, not to draw the blanket conclusion that experiments are low in external validity. One reason is that experiments need not seem artificial. Consider that Darley and Latané’s experiment provided a reasonably good simulation of a real emergency situation. Or consider field experiments that are conducted entirely outside the laboratory. In one such experiment, Robert Cialdini and his colleagues studied whether hotel guests choose to reuse their towels for a second day as opposed to having them washed as a way of conserving water and energy (Cialdini, 2005). These researchers manipulated the message on a card left in a large sample of hotel rooms. One version of the message emphasized showing respect for the environment, another emphasized that the hotel would donate a portion of their savings to an environmental cause, and a third emphasized that most hotel guests choose to reuse their towels. The result was that guests who received the message that most hotel guests choose to reuse their towels reused their own towels substantially more often than guests receiving either of the other two messages. Given the way they conducted their study, it seems very likely that their result would hold true for other guests in other hotels.
A second reason not to draw the blanket conclusion that experiments are low in external validity is that they are often conducted to learn about psychological processes that are likely to operate in a variety of people and situations. Let us return to the experiment by Fredrickson and colleagues. They found that the women in their study, but not the men, performed worse on the math test when they were wearing swimsuits. They argued that this gender difference was due to women’s greater tendency to objectify themselves—to think about themselves from the perspective of an outside observer—which diverts their attention away from other tasks. They argued, furthermore, that this process of self-objectification and its effect on attention is likely to operate in a variety of women and situations—even if none of them ever finds herself taking a math test in her swimsuit.
Construct Validity
In addition to the generalizability of the results of an experiment, another element to scrutinize in a study is the quality of the experiment’s manipulations, or the construct validity. The research question that Darley and Latané started with is “does helping behavior become diffused?” They hypothesized that participants in a lab would be less likely to help when they believed there were more potential helpers besides themselves. This conversion from research question to experiment design is called operationalization (see Chapter 4 for more information about the operational definition). Darley and Latané operationalized the independent variable of diffusion of responsibility by increasing the number of potential helpers. In evaluating this design, we would say that the construct validity was very high because the experiment’s manipulations very clearly speak to the research question: there was a crisis, there was a way for the participant to help, and, by increasing the number of other students involved in the discussion, the researchers provided a way to test diffusion.
What if the number of conditions in Darley and Latané’s study changed? Consider if there were only two conditions: one student involved in the discussion or two. Even though we may see a decrease in helping by adding another person, it may not be a clear demonstration of diffusion of responsibility, just merely the presence of others. We might think it was a form of Bandura’s social inhibition. The construct validity would be lower. However, had there been five conditions, perhaps we would see the decrease continue with more people in the discussion or perhaps it would plateau after a certain number of people. In that situation, we may not necessarily be learning more about diffusion of responsibility or it may become a different phenomenon. By adding more conditions, the construct validity may not get higher. When designing your own experiment, consider how well the research question is operationalized your study.
Statistical Validity
Statistical validity concerns the proper statistical treatment of data and the soundness of the researchers’ statistical conclusions. There are many different types of inferential statistical tests (e.g., t-tests, ANOVA, regression, correlation) and statistical validity concerns the use of the proper type of test to analyze the data. When considering the proper type of test, researchers must consider the scale of measurement of their dependent variable and the design of their study. Further, many inferential statistical tests carry certain assumptions (e.g., the data are normally distributed) and statistical validity is threatened when these assumptions are not met but the statistics are used nonetheless.
One common critique of experiments is that a study did not have enough participants. The main reason for this criticism is that it is difficult to generalize about a population from a small sample. At the outset, it seems as though this critique is about external validity but there are studies where small sample sizes are not a problem (subsequent chapters will discuss how small samples, even of only 1 person, are still very illuminating for psychology research). Therefore, small sample sizes are actually a critique of statistical validity . The statistical validity speaks to whether the statistics conducted in the study are sound and support the conclusions that are made.
The proper statistical analysis should be conducted on the data to determine whether the difference or relationship that was predicted was found. The number of conditions, the total number of participants, and the expected size of the effect together determine the statistical power of the study. With this information, a power analysis can be conducted to ascertain whether you are likely to detect a real difference if one exists. When designing a study, it is best to conduct the power analysis in advance so that the appropriate number of participants can be recruited and tested. To design a statistically valid experiment, thinking about the statistical tests at the beginning of the design will help ensure the results can be believed.
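As a rough illustration of the power analysis described above, the following sketch estimates how many participants per condition a simple two-group experiment would need. It is not taken from the chapter: the function name `participants_per_group` is hypothetical, and the calculation uses a standard normal approximation for a two-sided comparison of two means, which tends to give slightly smaller numbers than an exact t-test-based calculation.

```python
import math
from statistics import NormalDist


def participants_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-group design.

    effect_size_d: expected standardized mean difference (Cohen's d).
    alpha: two-sided significance level.
    power: desired probability of detecting the effect if it is real.
    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d) ** 2.
    """
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) / effect_size_d) ** 2
    return math.ceil(n)


# Smaller expected effects require many more participants per group.
for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {participants_per_group(d)} per group")
```

Under these assumptions, a medium-sized effect (d = 0.5) calls for roughly 63 participants per group, which is one way to see why a study with very few participants is open to the critique of low statistical validity.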
Prioritizing Validities
These four big validities (internal, external, construct, and statistical) are useful to keep in mind when both reading about other experiments and designing your own. However, researchers must prioritize and often it is not possible to have high validity in all four areas. In Cialdini’s study on towel usage in hotels, the external validity was high but the statistical validity was more modest. This discrepancy does not invalidate the study but it shows where there may be room for improvement for future follow-up studies (Goldstein, Cialdini, & Griskevicius, 2008) [6]. Morling (2014) points out that most psychology studies have high internal and construct validity but sometimes sacrifice external validity.
Key Takeaways
- Studies are high in internal validity to the extent that the way they are conducted supports the conclusion that the independent variable caused any observed differences in the dependent variable. Experiments are generally high in internal validity because of the manipulation of the independent variable and control of extraneous variables.
- Studies are high in external validity to the extent that the result can be generalized to people and situations beyond those actually studied. Although experiments can seem “artificial”—and low in external validity—it is important to consider whether the psychological processes under study are likely to operate in other people and situations.
- Judd, C.M. & Kenny, D.A. (1981). Estimating the effects of social interventions. Cambridge, MA: Cambridge University Press.
- Morling, B. (2014, April). Teach your students to be better consumers. APS Observer. Retrieved from http://www.psychologicalscience.org/index.php/publications/observer/2014/april-14/teach-your-students-to-be-better-consumers.html
- Bauman, C.W., McGraw, A.P., Bartels, D.M., & Warren, C. (2014). Revisiting external validity: Concerns about trolley problems and other sacrificial dilemmas in moral psychology. Social and Personality Psychology Compass, 8/9, 536-554.
- Fredrickson, B. L., Roberts, T.-A., Noll, S. M., Quinn, D. M., & Twenge, J. M. (1998). The swimsuit becomes you: Sex differences in self-objectification, restrained eating, and math performance. Journal of Personality and Social Psychology, 75, 269–284.
- Cialdini, R. (2005, April). Don’t throw in the towel: Use social influence research. APS Observer. Retrieved from http://www.psychologicalscience.org/index.php/publications/observer/2005/april-05/dont-throw-in-the-towel-use-social-influence-research.html
- Goldstein, N. J., Cialdini, R. B., & Griskevicius, V. (2008). A room with a viewpoint: Using social norms to motivate environmental conservation in hotels. Journal of Consumer Research, 35, 472–482.
6.1 Experiment Basics
Learning objectives.
- Explain what an experiment is and recognize examples of studies that are experiments and studies that are not experiments.
- Explain what internal validity is and why experiments are considered to be high in internal validity.
- Explain what external validity is and evaluate studies in terms of their external validity.
- Distinguish between the manipulation of the independent variable and control of extraneous variables and explain the importance of each.
- Recognize examples of confounding variables and explain how they affect the internal validity of a study.
What Is an Experiment?
As we saw earlier in the book, an experiment is a type of study designed specifically to answer the question of whether there is a causal relationship between two variables. Do changes in an independent variable cause changes in a dependent variable? Experiments have two fundamental features. The first is that the researchers manipulate, or systematically vary, the level of the independent variable. The different levels of the independent variable are called conditions. For example, in Darley and Latané’s experiment, the independent variable was the number of witnesses that participants believed to be present. The researchers manipulated this independent variable by telling participants that there were either one, two, or five other students involved in the discussion, thereby creating three conditions. The second fundamental feature of an experiment is that the researcher controls, or minimizes the variability in, variables other than the independent and dependent variable. These other variables are called extraneous variables. Darley and Latané tested all their participants in the same room, exposed them to the same emergency situation, and so on. They also randomly assigned their participants to conditions so that the three groups would be similar to each other to begin with. Notice that although the words manipulation and control have similar meanings in everyday language, researchers make a clear distinction between them. They manipulate the independent variable by systematically changing its levels and control other variables by holding them constant.
Internal and External Validity
Internal validity.
Recall that the fact that two variables are statistically related does not necessarily mean that one causes the other. “Correlation does not imply causation.” For example, if it were the case that people who exercise regularly are happier than people who do not exercise regularly, this would not necessarily mean that exercising increases people’s happiness. It could mean instead that greater happiness causes people to exercise (the directionality problem) or that something like better physical health causes people to exercise and be happier (the third-variable problem).
The purpose of an experiment, however, is to show that two variables are statistically related and to do so in a way that supports the conclusion that the independent variable caused any observed differences in the dependent variable. The basic logic is this: If the researcher creates two or more highly similar conditions and then manipulates the independent variable to produce just one difference between them, then any later difference between the conditions must have been caused by the independent variable. For example, because the only difference between Darley and Latané’s conditions was the number of students that participants believed to be involved in the discussion, this must have been responsible for differences in helping between the conditions.
An empirical study is said to be high in internal validity if the way it was conducted supports the conclusion that the independent variable caused any observed differences in the dependent variable. Thus experiments are high in internal validity because the way they are conducted—with the manipulation of the independent variable and the control of extraneous variables—provides strong support for causal conclusions.
External Validity
At the same time, the way that experiments are conducted sometimes leads to a different kind of criticism. Specifically, the need to manipulate the independent variable and control extraneous variables means that experiments are often conducted under conditions that seem artificial or unlike “real life” (Stanovich, 2010). In many psychology experiments, the participants are all college undergraduates and come to a classroom or laboratory to fill out a series of paper-and-pencil questionnaires or to perform a carefully designed computerized task. Consider, for example, an experiment in which researcher Barbara Fredrickson and her colleagues had college students come to a laboratory on campus and complete a math test while wearing a swimsuit (Fredrickson, Roberts, Noll, Quinn, & Twenge, 1998). At first, this might seem silly. When will college students ever have to complete math tests in their swimsuits outside of this experiment?
The issue we are confronting is that of external validity. An empirical study is high in external validity if the way it was conducted supports generalizing the results to people and situations beyond those actually studied. As a general rule, studies are higher in external validity when the participants and the situation studied are similar to those that the researchers want to generalize to. Imagine, for example, that a group of researchers is interested in how shoppers in large grocery stores are affected by whether breakfast cereal is packaged in yellow or purple boxes. Their study would be high in external validity if they studied the decisions of ordinary people doing their weekly shopping in a real grocery store. If the shoppers bought much more cereal in purple boxes, the researchers would be fairly confident that this would be true for other shoppers in other stores. Their study would be relatively low in external validity, however, if they studied a sample of college students in a laboratory at a selective college who merely judged the appeal of various colors presented on a computer screen. If the students judged purple to be more appealing than yellow, the researchers would not be very confident that this is relevant to grocery shoppers’ cereal-buying decisions.
We should be careful, however, not to draw the blanket conclusion that experiments are low in external validity. One reason is that experiments need not seem artificial. Consider that Darley and Latané’s experiment provided a reasonably good simulation of a real emergency situation. Or consider field experiments that are conducted entirely outside the laboratory. In one such experiment, Robert Cialdini and his colleagues studied whether hotel guests choose to reuse their towels for a second day as opposed to having them washed as a way of conserving water and energy (Cialdini, 2005). These researchers manipulated the message on a card left in a large sample of hotel rooms. One version of the message emphasized showing respect for the environment, another emphasized that the hotel would donate a portion of their savings to an environmental cause, and a third emphasized that most hotel guests choose to reuse their towels. The result was that guests who received the message that most hotel guests choose to reuse their towels reused their own towels substantially more often than guests receiving either of the other two messages. Given the way they conducted their study, it seems very likely that their result would hold true for other guests in other hotels.
A second reason not to draw the blanket conclusion that experiments are low in external validity is that they are often conducted to learn about psychological processes that are likely to operate in a variety of people and situations. Let us return to the experiment by Fredrickson and colleagues. They found that the women in their study, but not the men, performed worse on the math test when they were wearing swimsuits. They argued that this was due to women’s greater tendency to objectify themselves—to think about themselves from the perspective of an outside observer—which diverts their attention away from other tasks. They argued, furthermore, that this process of self-objectification and its effect on attention is likely to operate in a variety of women and situations—even if none of them ever finds herself taking a math test in her swimsuit.
Manipulation of the Independent Variable
Again, to manipulate an independent variable means to change its level systematically so that different groups of participants are exposed to different levels of that variable, or the same group of participants is exposed to different levels at different times. For example, to see whether expressive writing affects people’s health, a researcher might instruct some participants to write about traumatic experiences and others to write about neutral experiences. The different levels of the independent variable are referred to as conditions , and researchers often give the conditions short descriptive names to make it easy to talk and write about them. In this case, the conditions might be called the “traumatic condition” and the “neutral condition.”
Notice that the manipulation of an independent variable must involve the active intervention of the researcher. Comparing groups of people who differ on the independent variable before the study begins is not the same as manipulating that variable. For example, a researcher who compares the health of people who already keep a journal with the health of people who do not keep a journal has not manipulated this variable and therefore not conducted an experiment. This is important because groups that already differ in one way at the beginning of a study are likely to differ in other ways too. For example, people who choose to keep journals might also be more conscientious, more introverted, or less stressed than people who do not. Therefore, any observed difference between the two groups in terms of their health might have been caused by whether or not they keep a journal, or it might have been caused by any of the other differences between people who do and do not keep journals. Thus the active manipulation of the independent variable is crucial for eliminating the third-variable problem.
Of course, there are many situations in which the independent variable cannot be manipulated for practical or ethical reasons and therefore an experiment is not possible. For example, whether or not people have a significant early illness experience cannot be manipulated, making it impossible to do an experiment on the effect of early illness experiences on the development of hypochondriasis. This does not mean it is impossible to study the relationship between early illness experiences and hypochondriasis—only that it must be done using nonexperimental approaches. We will discuss this in detail later in the book.
In many experiments, the independent variable is a construct that can only be manipulated indirectly. For example, a researcher might try to manipulate participants’ stress levels indirectly by telling some of them that they have five minutes to prepare a short speech that they will then have to give to an audience of other participants. In such situations, researchers often include a manipulation check in their procedure. A manipulation check is a separate measure of the construct the researcher is trying to manipulate. For example, researchers trying to manipulate participants’ stress levels might give them a paper-and-pencil stress questionnaire or take their blood pressure—perhaps right after the manipulation or at the end of the procedure—to verify that they successfully manipulated this variable.
Control of Extraneous Variables
An extraneous variable is anything that varies in the context of a study other than the independent and dependent variables. In an experiment on the effect of expressive writing on health, for example, extraneous variables would include participant variables (individual differences) such as their writing ability, their diet, and their shoe size. They would also include situation or task variables such as the time of day when participants write, whether they write by hand or on a computer, and the weather. Extraneous variables pose a problem because many of them are likely to have some effect on the dependent variable. For example, participants’ health will be affected by many things other than whether or not they engage in expressive writing. This can make it difficult to separate the effect of the independent variable from the effects of the extraneous variables, which is why it is important to control extraneous variables by holding them constant.
Extraneous Variables as “Noise”
Extraneous variables make it difficult to detect the effect of the independent variable in two ways. One is by adding variability or “noise” to the data. Imagine a simple experiment on the effect of mood (happy vs. sad) on the number of happy childhood events people are able to recall. Participants are put into a positive or negative mood (by showing them a happy or sad video clip) and then asked to recall as many happy childhood events as they can. The two leftmost columns of Table 6.1 “Hypothetical Noiseless Data and Realistic Noisy Data” show what the data might look like if there were no extraneous variables and the number of happy childhood events participants recalled was affected only by their moods. Every participant in the happy mood condition recalled exactly four happy childhood events, and every participant in the sad mood condition recalled exactly three. The effect of mood here is quite obvious. In reality, however, the data would probably look more like those in the two rightmost columns of Table 6.1 “Hypothetical Noiseless Data and Realistic Noisy Data”. Even in the happy mood condition, some participants would recall fewer happy memories because they have fewer to draw on, use less effective strategies, or are less motivated. And even in the sad mood condition, some participants would recall more happy childhood memories because they have more happy memories to draw on, they use more effective recall strategies, or they are more motivated. Although the mean difference between the two groups is the same as in the idealized data, this difference is much less obvious in the context of the greater variability in the data. Thus one reason researchers try to control extraneous variables is so their data look more like the idealized data in Table 6.1 “Hypothetical Noiseless Data and Realistic Noisy Data”, which makes the effect of the independent variable easier to detect (although real data never look quite that good).
Table 6.1 Hypothetical Noiseless Data and Realistic Noisy Data
One way to control extraneous variables is to hold them constant. This can mean holding situation or task variables constant by testing all participants in the same location, giving them identical instructions, treating them in the same way, and so on. It can also mean holding participant variables constant. For example, many studies of language limit participants to right-handed people, who generally have their language areas isolated in their left cerebral hemispheres. Left-handed people are more likely to have their language areas isolated in their right cerebral hemispheres or distributed across both hemispheres, which can change the way they process language and thereby add noise to the data.
In principle, researchers can control extraneous variables by limiting participants to one very specific category of person, such as 20-year-old, straight, female, right-handed, sophomore psychology majors. The obvious downside to this approach is that it would lower the external validity of the study—in particular, the extent to which the results can be generalized beyond the people actually studied. For example, it might be unclear whether results obtained with a sample of younger straight women would apply to older gay men. In many situations, the advantages of a diverse sample outweigh the reduction in noise achieved by a homogeneous one.
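The idea of extraneous variables as noise can be sketched with a small simulation. The code below is only an illustration with made-up numbers: the function names, the use of normally distributed noise, and the continuous recall scores are all assumptions, not data from Table 6.1. The true difference between the happy and sad conditions is fixed at one event, and only the amount of noise changes.

```python
import math
import random
from statistics import mean, variance


def simulate_recall(n_per_group, noise_sd, seed=0):
    """Simulate hypothetical recall scores for happy and sad mood conditions.

    The true means are 4 (happy) and 3 (sad); noise_sd controls how much
    extraneous variables scatter individual scores around those means.
    """
    rng = random.Random(seed)
    happy = [4 + rng.gauss(0, noise_sd) for _ in range(n_per_group)]
    sad = [3 + rng.gauss(0, noise_sd) for _ in range(n_per_group)]
    return happy, sad


def standardized_difference(group_a, group_b):
    """Mean difference divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((variance(group_a) + variance(group_b)) / 2)
    if pooled_sd == 0:
        return math.inf  # no noise at all: the effect is perfectly clear
    return (mean(group_a) - mean(group_b)) / pooled_sd


for noise_sd in (0.0, 0.5, 2.0):
    happy, sad = simulate_recall(n_per_group=50, noise_sd=noise_sd)
    d = standardized_difference(happy, sad)
    print(f"noise SD = {noise_sd}: standardized difference = {d:.2f}")
```

As the noise grows, the same one-event difference becomes a smaller fraction of the overall variability, which is why it is harder to detect.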
Extraneous Variables as Confounding Variables
The second way that extraneous variables can make it difficult to detect the effect of the independent variable is by becoming confounding variables. A confounding variable is an extraneous variable that differs on average across levels of the independent variable. For example, in almost all experiments, participants’ intelligence quotients (IQs) will be an extraneous variable. But as long as there are participants with lower and higher IQs at each level of the independent variable so that the average IQ is roughly equal, then this variation is probably acceptable (and may even be desirable). What would be bad, however, would be for participants at one level of the independent variable to have substantially lower IQs on average and participants at another level to have substantially higher IQs on average. In this case, IQ would be a confounding variable.
To confound means to confuse, and this is exactly what confounding variables do. Because they differ across conditions—just like the independent variable—they provide an alternative explanation for any observed difference in the dependent variable. Figure 6.1 “Hypothetical Results From a Study on the Effect of Mood on Memory” shows the results of a hypothetical study, in which participants in a positive mood condition scored higher on a memory task than participants in a negative mood condition. But if IQ is a confounding variable—with participants in the positive mood condition having higher IQs on average than participants in the negative mood condition—then it is unclear whether it was the positive moods or the higher IQs that caused participants in the first condition to score higher. One way to avoid confounding variables is by holding extraneous variables constant. For example, one could prevent IQ from becoming a confounding variable by limiting participants only to those with IQs of exactly 100. But this approach is not always desirable for reasons we have already discussed. A second and much more general approach—random assignment to conditions—will be discussed in detail shortly.
Figure 6.1 Hypothetical Results From a Study on the Effect of Mood on Memory
Because IQ also differs across conditions, it is a confounding variable.
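A short simulation can make the logic of confounding concrete. In this sketch, which uses entirely hypothetical numbers and function names, memory scores depend only on IQ and mood has no effect at all. When higher-IQ participants end up in the positive mood condition, a difference between conditions appears anyway; when participants are assigned to conditions at random, average IQ is roughly equal across conditions and the spurious difference largely disappears.

```python
import random
from statistics import mean


def memory_score(iq, rng):
    # Hypothetical model: the score depends on IQ plus noise, NOT on mood.
    return 0.1 * iq + rng.gauss(0, 1)


def condition_difference(iqs_positive, iqs_negative, rng):
    """Mean memory score in the positive condition minus the negative one."""
    positive = [memory_score(iq, rng) for iq in iqs_positive]
    negative = [memory_score(iq, rng) for iq in iqs_negative]
    return mean(positive) - mean(negative)


rng = random.Random(42)
iqs = [rng.gauss(100, 15) for _ in range(200)]  # simulated participant IQs

# Confounded design: the 100 highest-IQ participants get the positive mood.
ranked = sorted(iqs)
confounded_diff = condition_difference(ranked[100:], ranked[:100], rng)

# Random assignment: shuffle first, then split into the two conditions.
shuffled = iqs[:]
rng.shuffle(shuffled)
random_diff = condition_difference(shuffled[:100], shuffled[100:], rng)

print(f"Confounded design difference: {confounded_diff:.2f}")
print(f"Random assignment difference: {random_diff:.2f}")
```

In the confounded design the difference between conditions reflects IQ rather than mood, so a researcher who saw only the scores could not tell the two explanations apart.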
Key Takeaways
- An experiment is a type of empirical study that features the manipulation of an independent variable, the measurement of a dependent variable, and control of extraneous variables.
- Studies are high in internal validity to the extent that the way they are conducted supports the conclusion that the independent variable caused any observed differences in the dependent variable. Experiments are generally high in internal validity because of the manipulation of the independent variable and control of extraneous variables.
- Studies are high in external validity to the extent that the result can be generalized to people and situations beyond those actually studied. Although experiments can seem “artificial”—and low in external validity—it is important to consider whether the psychological processes under study are likely to operate in other people and situations.
- Practice: List five variables that can be manipulated by the researcher in an experiment. List five variables that cannot be manipulated by the researcher in an experiment.
Practice: For each of the following topics, decide whether that topic could be studied using an experimental research design and explain why or why not.
- Effect of parietal lobe damage on people’s ability to do basic arithmetic.
- Effect of being clinically depressed on the number of close friendships people have.
- Effect of group training on the social skills of teenagers with Asperger’s syndrome.
- Effect of paying people to take an IQ test on their performance on that test.
Cialdini, R. (2005, April). Don’t throw in the towel: Use social influence research. APS Observer. Retrieved from http://www.psychologicalscience.org/observer/getArticle.cfm?id=1762
Fredrickson, B. L., Roberts, T.-A., Noll, S. M., Quinn, D. M., & Twenge, J. M. (1998). The swimsuit becomes you: Sex differences in self-objectification, restrained eating, and math performance. Journal of Personality and Social Psychology, 75, 269–284.
Stanovich, K. E. (2010). How to think straight about psychology (9th ed.). Boston, MA: Allyn & Bacon.
Research Methods in Psychology Copyright © 2016 by University of Minnesota is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.
Experimental Method In Psychology
Saul McLeod, PhD
Editor-in-Chief for Simply Psychology
BSc (Hons) Psychology, MRes, PhD, University of Manchester
Saul McLeod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.
Olivia Guy-Evans, MSc
Associate Editor for Simply Psychology
BSc (Hons) Psychology, MSc Psychology of Education
Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.
The experimental method involves the manipulation of variables to establish cause-and-effect relationships. The key features are controlled methods and the random allocation of participants into control and experimental groups.
What is an Experiment?
An experiment is an investigation in which a hypothesis is scientifically tested. An independent variable (the cause) is manipulated in an experiment, and the dependent variable (the effect) is measured; any extraneous variables are controlled.
An advantage is that experiments should be objective. The researcher’s views and opinions should not affect a study’s results. This is good as it makes the data more valid and less biased.
There are three types of experiments you need to know:
1. Lab Experiment
A laboratory experiment in psychology is a research method in which the experimenter manipulates one or more independent variables and measures the effects on the dependent variable under controlled conditions.
A laboratory experiment is conducted under highly controlled conditions (not necessarily a laboratory) where accurate measurements are possible.
The researcher uses a standardized procedure to determine where the experiment will take place, at what time, with which participants, and in what circumstances.
Participants are randomly allocated to each independent variable group.
Examples are Milgram’s experiment on obedience and Loftus and Palmer’s car crash study.
- Strength : It is easier to replicate (i.e., copy) a laboratory experiment. This is because a standardized procedure is used.
- Strength : They allow for precise control of extraneous and independent variables. This allows a cause-and-effect relationship to be established.
- Limitation : The artificiality of the setting may produce unnatural behavior that does not reflect real life, i.e., low ecological validity. This means it would not be possible to generalize the findings to a real-life setting.
- Limitation : Demand characteristics or experimenter effects may bias the results and become confounding variables.
2. Field Experiment
A field experiment is a research method in psychology that takes place in a natural, real-world setting. It is similar to a laboratory experiment in that the experimenter manipulates one or more independent variables and measures the effects on the dependent variable.
However, in a field experiment, the participants are often unaware they are being studied, and the experimenter has less control over the extraneous variables.
Field experiments are often used to study social phenomena, such as altruism, obedience, and persuasion. They are also used to test the effectiveness of interventions in real-world settings, such as educational programs and public health campaigns.
An example is Hofling’s hospital study on obedience.
- Strength : behavior in a field experiment is more likely to reflect real life because of its natural setting, i.e., higher ecological validity than a lab experiment.
- Strength : Demand characteristics are less likely to affect the results, as participants may not know they are being studied. This occurs when the study is covert.
- Limitation : There is less control over extraneous variables that might bias the results. This makes it difficult for another researcher to replicate the study in exactly the same way.
3. Natural Experiment
A natural experiment in psychology is a research method in which the experimenter observes the effects of a naturally occurring event or situation on the dependent variable without manipulating any variables.
Natural experiments are conducted in the everyday (i.e., real-life) environment of the participants, but here, the experimenter has no control over the independent variable as it occurs naturally in real life.
Natural experiments are often used to study psychological phenomena that would be difficult or unethical to study in a laboratory setting, such as the effects of natural disasters, policy changes, or social movements.
For example, Hodges and Tizard’s attachment research (1989) compared the long-term development of children who have been adopted, fostered, or returned to their mothers with a control group of children who had spent all their lives in their biological families.
Here is a fictional example of a natural experiment in psychology:
Researchers might compare academic achievement rates among students born before and after a major policy change that increased funding for education.
In this case, the independent variable is the timing of the policy change, and the dependent variable is academic achievement. The researchers would not be able to manipulate the independent variable, but they could observe its effects on the dependent variable.
- Strength : behavior in a natural experiment is more likely to reflect real life because of its natural setting, i.e., very high ecological validity.
- Strength : Demand characteristics are less likely to affect the results, as participants may not know they are being studied.
- Strength : It can be used in situations in which it would be ethically unacceptable to manipulate the independent variable, e.g., researching stress .
- Limitation : They may be more expensive and time-consuming than lab experiments.
- Limitation : There is no control over extraneous variables that might bias the results. This makes it difficult for another researcher to replicate the study in exactly the same way.
Key Terminology
Ecological validity.
The degree to which an investigation represents real-life experiences.
Experimenter effects
These are the ways that the experimenter can accidentally influence the participant through their appearance or behavior.
Demand characteristics
The clues in an experiment that lead participants to think they know what the researcher is looking for (e.g., the experimenter’s body language).
Independent variable (IV)
The variable the experimenter manipulates (i.e., changes), which is assumed to have a direct effect on the dependent variable.
Dependent variable (DV)
Variable the experimenter measures. This is the outcome (i.e., the result) of a study.
Extraneous variables (EV)
All variables which are not independent variables but could affect the results (DV) of the experiment. EVs should be controlled where possible.
Confounding variables
Variable(s) that have affected the results (DV), apart from the IV. A confounding variable could be an extraneous variable that has not been controlled.
Random Allocation
Randomly allocating participants to independent variable conditions means that all participants should have an equal chance of participating in each condition.
The principle of random allocation is to avoid bias in how the experiment is carried out and limit the effects of participant variables.
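As a rough sketch of how random allocation might be carried out in practice, the Python snippet below shuffles a participant list and deals participants into conditions in turn. The function name `randomly_allocate`, the participant IDs, and the condition labels are all hypothetical; the fixed seed is used only to make the illustration repeatable.

```python
import random

def randomly_allocate(participants, conditions, seed=None):
    """Shuffle participants, then deal them into conditions in turn.

    Every participant has the same chance of ending up in each condition,
    and group sizes differ by at most one.
    """
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, participant in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(participant)
    return groups

# Hypothetical participant IDs and condition labels.
participants = [f"P{i:02d}" for i in range(1, 13)]
groups = randomly_allocate(participants, ["treatment", "control"], seed=42)
for condition, members in groups.items():
    print(condition, members)
```

Dealing from a shuffled list keeps group sizes balanced; tossing a coin for each participant would also be random but could produce unequal groups.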
Order effects
Changes in participants’ performance due to their repeating the same or similar test more than once. Examples of order effects include:
(i) practice effect: an improvement in performance on a task due to repetition, for example, because of familiarity with the task;
(ii) fatigue effect: a decrease in performance of a task due to repetition, for example, because of boredom or tiredness.
- Published: 09 December 2024
Applied research is the path to legitimacy in psychological science
- Judith P. Andersen (ORCID: orcid.org/0000-0001-8477-1432)
Nature Reviews Psychology ( 2024 ) Cite this article
- Communication
- Human behaviour
Psychology is founded on studies that claim generalizability despite being conducted in artificial settings with non-representative samples. The failure to test the robustness of scientific findings in applied settings casts doubt on the legitimacy of psychology.
Acknowledgements
The references cited in this article have greatly influenced its development. I thank my current and former graduate students, colleagues, mentors, and members of the HART Lab for valuable discussions on applied research and the topic of this paper.
Author information
Authors and affiliations.
Department of Psychology, University of Toronto at Mississauga, Mississauga, Ontario, Canada
Judith P. Andersen
Contributions
Positionality statement.
I am a health psychologist with expertise in ambulatory psychophysiology research, focusing on trauma, health and resilience in populations exposed to severe and chronic stress, particularly among LGBTQ+ individuals and law enforcement personnel. I have a strong preference for applied research, as I believe that gaining a comprehensive understanding of cognition and behaviour requires the collection of diverse data types, including socio-cultural, biological, psychosocial and environmental factors, measured within their relevant contexts. I acknowledge that psychological research is shaped by a researcher’s positionality. Accordingly, I am aware that my beliefs, research inquiries and methodological preferences are influenced by my bio-socio-cultural background, including but not limited to, being a white, North American, queer, neurodivergent, cisgender woman. The reality that research reflects positionality underscores the importance of advancing applied research conducted by individuals from under-represented communities, particularly those who are racialized and otherwise marginalized.
Corresponding author
Correspondence to Judith P. Andersen .
Ethics declarations
Competing interests.
The author declares no competing interests.
About this article
Cite this article.
Andersen, J.P. Applied research is the path to legitimacy in psychological science. Nat Rev Psychol (2024). https://doi.org/10.1038/s44159-024-00388-9
The ‘Real-World Approach’ and Its Problems: A Critique of the Term Ecological Validity
Gijs A. Holleman, Ignace T. C. Hooge, Chantal Kemner, Roy S. Hessels
Edited by: Matthias Gamer, Julius Maximilian University of Würzburg, Germany
Reviewed by: Yoni Pertzov, The Hebrew University of Jerusalem, Israel; Nicola Jean Gregory, Bournemouth University, United Kingdom
*Correspondence: Gijs A. Holleman, [email protected]
This article was submitted to Cognitive Science, a section of the journal Frontiers in Psychology
Received 2020 Jan 24; Accepted 2020 Mar 25; Collection date 2020.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
A popular goal in psychological science is to understand human cognition and behavior in the ‘real-world.’ In contrast, researchers have typically conducted their research in experimental research settings, a.k.a. the ‘psychologist’s laboratory.’ Critics have often questioned whether psychology’s laboratory experiments permit generalizable results. This is known as the ‘real-world or the lab’-dilemma. To bridge the gap between lab and life, many researchers have called for experiments with more ‘ecological validity’ to ensure that experiments more closely resemble and generalize to the ‘real-world.’ However, researchers seldom explain what they mean with this term, nor how more ecological validity should be achieved. In our opinion, the popular concept of ecological validity is ill-formed, lacks specificity, and falls short of addressing the problem of generalizability. To move beyond the ‘real-world or the lab’-dilemma, we believe that researchers in psychological science should always specify the particular context of cognitive and behavioral functioning in which they are interested, instead of advocating that experiments should be more ‘ecologically valid’ in order to generalize to the ‘real-world.’ We believe this will be a more constructive way to uncover the context-specific and context-generic principles of cognition and behavior.
Keywords: ecological validity, experiments, real-world approach, generalizability, definitions
Introduction
A popular goal in psychological science is to understand human cognition and behavior in the ‘real-world.’ In contrast, researchers have traditionally conducted experiments in specialized research settings, a.k.a. the ‘psychologist’s laboratory’ ( Danziger, 1994 ; Hatfield, 2002 ). Over the course of psychology’s history, critics have often questioned whether psychology’s lab-based experiments permit the generalization of results beyond the laboratory settings within which these results are typically obtained. In response, many researchers have advocated for more ‘ecologically valid’ experiments, as opposed to the so-called ‘conventional’ laboratory methods ( Neisser, 1976 ; Aanstoos, 1991 ; Kingstone et al., 2008 ; Shamay-Tsoory and Mendelsohn, 2019 ; Osborne-Crowley, 2020 ). In recent years, several technological advances (e.g., virtual reality, wearable eye trackers, mobile EEG devices, fNIRS, biosensors, etc.) have further galvanized researchers to emphasize the importance of studying human cognition and behavior in the ‘real-world,’ as new technologies will aid researchers in overcoming some of the inherent limitations of laboratory experiments ( Schilbach, 2015 ; Shamay-Tsoory and Mendelsohn, 2019 ; Sonkusare et al., 2019 ).
In this article, we will argue that the general aspiration of researchers to understand human cognition and behavior in the ‘real-world’ by conducting experiments that are more ‘ecologically valid’ (henceforth referred to as the ‘real-world approach’) is not without its problems. Most notably, we will argue that the popular term ‘ecological validity,’ which is widely used nowadays by researchers to discuss whether experimental research resembles and generalizes to the ‘real-world,’ is shrouded in both conceptual and methodological confusion. As we ourselves are interested in cognitive and behavioral functioning in the context of people’s everyday experience, and conduct experiments across various ‘laboratory’ and ‘real-world’ environments, we have seen how the uncritical use of the term ‘ecological validity’ can lead to rather misleading and counterproductive discussions. This not only holds for how this concept is used in many scholarly articles and textbooks, but also in presentations and discussions of experimental research at conferences, during the review process, and when talking with students about experimental design and the analysis of evidence.
Although the usage of the term ecological validity has previously been criticized by several scholars ( Hammond, 1998 ; Schmuckler, 2001 ; cf. Araujo et al., 2007 ; Dunlosky et al., 2009 ), we think that these critiques have largely been overlooked. Therefore, it will be necessary to cover some of the same ground. The contribution of this article is threefold. First, we extend the critique of the term ecological validity and apply it to the field of social attention. Second, we scrutinize some of the assumptions that guide the contemporary framework of ecological validity, specifically those regarding artificiality–naturality and simplicity–complexity. Finally, our article is meant to educate a new generation of students and researchers on the historical roots and conceptual issues of the term ecological validity. This article consists of four parts. First, we will provide a brief history of the so-called ‘real-world or the lab’-dilemma and discuss several definitions and interpretations of the term ecological validity. Second, we will go into the historical roots of the concept of ecological validity and describe how the original meaning of this concept has transformed significantly. Third, we will scrutinize the prevailing assumptions that seem to guide how researchers are currently using the term ecological validity. Finally, we will apply our conceptual analysis to a specific field of study, namely the field of social attention. In recent years, this field has been particularly concerned with issues of ecological validity and generalizability. Therefore, the field of social attention offers an exemplary case to explain how the uncritical use of the terms ‘ecological validity’ and the ‘real-world’ may lead to misleading and counterproductive conclusions.
A Brief History of the ‘Real-World or the Lab’-Dilemma
The popular story of psychology (or the broader ‘cognitive sciences’) has it that “psychology became a science by rising from the ‘armchair’ of speculation and uncontrolled observation, and entering the laboratory to undertake controlled observation and measurement” ( Hatfield, 2002 , p. 208). The ‘psychologist’s laboratory’, a special room furnished with all kinds of lab paraphernalia and sophisticated equipment, has been regarded as the celebrated vehicle of psychology’s journey into sciencehood ( Danziger, 1994 ; Goodwin, 2015 ). However, despite psychologists’ long tradition of laboratory experimentation (for a history and discussion, see Gillis and Schneider, 1966 ), there also have been many critical voices saying that psychology’s laboratory experiments are too limited in scope to study how people function in daily life. For example, Brunswik (1943 , p. 262) once wrote that experimental psychology was limited to “narrow-spanning problems of artificially isolated proximal or peripheral technicalities of mediation which are not representative of the larger patterns of life”. Barker (1968 , p. 3) wrote that “it is impossible to create in the laboratory the frequency, duration, scope and magnitude of some important human conditions.” Neisser (1976 , p. 34) wrote that “contemporary studies of cognitive processes usually use stimulus material that is abstract, discontinuous, and only marginally real.” Bronfenbrenner (1977 , p. 513) wrote that “many of these experiments involve situations that are unfamiliar, artificial, and short-lived and that call for unusual behaviors that are difficult to generalize to other settings.” Kingstone et al. (2008 , p. 355) declared that “the research performed in labs, and the findings they generate, are in principle and in practice unlikely to be of relevance to the more complex situations that people experience in everyday life,” and Shamay-Tsoory and Mendelsohn (2019 , p. 1) stated that “conventional experimental psychological approaches have mainly focused on investigating behavior of individuals as isolated agents situated in artificial, sensory, and socially deprived environments, limiting our understanding of naturalistic cognitive, emotional, and social phenomena.”
According to these scholars, psychological science is faced with a gloomy predicament: findings and results based on highly controlled and systematically designed laboratory experiments may not be a great discovery but only a “mere laboratory curiosity” ( Gibson, 1970 , pp. 426–427). As Anderson et al. (1999 , p. 3) put it: “A common truism has been that … laboratory studies are good at telling whether or not some manipulation of an independent variable causes changes in the dependent variable, but many scholars assume that these results do not generalize to the “real-world.”” The general concern is that, due to the ‘artificiality’ and ‘simplicity’ of the laboratory, some (if not many) lab-based experiments do not adequately represent the ‘naturality’ and ‘complexity’ of psychological phenomena in everyday life (see Figure 1 ). This problem has become familiar to psychologists as the ‘real-world or the lab’-dilemma ( Hammond and Stewart, 2001 ). At the heart of psychology’s ‘real-world or the lab’-dilemma lies a pernicious methodological choice: “Should it [psychological science] pursue the goal of generality by demanding that research be generalizable to “real life” (aka the “real-world”), or should it pursue generalizability by holding onto its traditional laboratory research paradigm?” ( Hammond and Stewart, 2001 , p. 7).
Figure 1. Examples of historical and contemporary laboratory rooms and field experiments. (A) A laboratory room from the early 20th century. A participant is seated in front of a ‘disc tachistoscope,’ an apparatus to display visual images (adapted from Hilton, 1920 ). (B) A picture of a field experiment by J. J. Gibson. Observers had to judge the size of an object in the distance (adapted from Gibson, 1950 ). (C) A 21st century eye tracking laboratory. A participant is seated in front of a SMI Hi-Speed tower-mounted eye tracker (based on Valtakari et al., 2019 ). (D) A wearable eye-tracker (barely visible) is used to measure gaze behavior while participants walked through corridors with human crowds ( Hessels et al., 2020 ). Copyright statement – Panels (A,B) . All photographs are used under the provision of the “fair use” U.S. Copyright Act 107 and Dutch Copyright Law Article 15a for non-profit purposes of research, education and scholarly comment. The photograph from W. Hilton’s book: Applied Psychology: Driving Power of Thought (Original date of publication, 1920). Retrieved April 1, 2020, from http://www.gutenberg.org/files/33076/33076-h/33076-h.htm . The photograph from J. J. Gibson’s book: The Perception of the Visual World (Original date of publication, 1950, Figure 74, p. 184) was retrieved from a copy of the Utrecht University library. (C,D) Photographs are owned by the authors and the people depicted in the images gave consent for publication.
Although psychological science is comprised of many specialized research areas, the goal to understand human cognition and behavior in the ‘real-world’ has become a critically acclaimed goal for psychologists and cognitive scientists of all stripes. Indeed, examples of the ‘real-world or the lab’-dilemma can be found not only in various ‘applied’ fields of psychology, such as ergonomics ( Hoc, 2001 ), clinical (neuro)psychology ( Wilson, 1993 ; Parsons, 2015 ), educational psychology ( Dunlosky et al., 2009 ), sport psychology ( Davids, 1988 ), marketing and consumer psychology ( Smith et al., 1998 ), and the psychology of driving ( Rogers et al., 2005 ), but also in the so-called ‘basic’ fields of psychological science, such as the study of perception ( Brunswik, 1956 ; Gibson, 1979/2014 ), attention ( Simons and Levin, 1998 ; Peelen and Kastner, 2014 ), memory ( Banaji and Crowder, 1989 ; Neisser, 1991 ; Cohen and Conway, 2007 ), social cognition ( Schilbach et al., 2013 ; Schilbach, 2015 ; Shamay-Tsoory and Mendelsohn, 2019 ; Osborne-Crowley, 2020 ), judgment-and-decision making ( Koehler, 1996 ), and child development ( Lewkowicz, 2001 ; Schmuckler, 2001 ; Adolph, 2019 ).
The ‘Real-World Approach’: A Call for Ecological Validity
In the past decades, researchers have often discussed how they may overcome some of the limitations of laboratory-based experiments. Perhaps the largest common denominator of what we call the ‘real-world approach’ is a strong emphasis on ‘ecological validity.’ Over the past decades, the term ecological validity has made its appearance whenever researchers became concerned with the potential limitations of laboratory experiments (see e.g., Jenkins, 1974 ; Neisser, 1976 ; Banaji and Crowder, 1989 ; Aanstoos, 1991 ; Koehler, 1996 ; Smilek et al., 2006 ; Risko et al., 2012 ; Schilbach, 2015 ; Caruana et al., 2017 ; Shamay-Tsoory and Mendelsohn, 2019 ; Osborne-Crowley, 2020 ). As Neisser (1976 , p. 33) famously put it:
“The concept of ecological validity has become familiar to psychologists. It reminds them that the artificial situation created for an experiment may differ from the everyday world in crucial ways. When this is so, the results may be irrelevant to the phenomena that one would really like to explain.”
The main problem, according to Neisser and many others, is that experiments in psychological science are generally “lacking in ecological validity” ( Neisser, 1976 , p. 7; Smilek et al., 2006 ; Shamay-Tsoory and Mendelsohn, 2019 ; Sonkusare et al., 2019 ). Aanstoos (1991 , p. 77) even referred to this problem as the “ecological validity crisis.” To counter this problem, many researchers have called for studies with ‘more’ or ‘greater’ ecological validity. For example, Koehler (1996 , p. 1) advocated for a “more ecologically valid research program,” Schilbach (2015 , p. 130) argued for “the inclusion of more ecologically valid conditions,” and Smilek et al. (2006 , p. 104) suggested that “in order for results to generalize to real-world scenarios we need to use tasks with greater ecological validity.” Clearly, ecological validity is regarded as an important feature of experimental research by researchers who pursue the ‘real-world approach.’ However, in our opinion, and we are not alone in this regard (see also Hammond, 1998 ; Araujo et al., 2007 ; Dunlosky et al., 2009 ), this notion of ecological validity has caused considerable confusion. To foreshadow some of our criticism of ecological validity, we will show that this concept has largely been detached from its original parentage (cf. Brunswik, 1949 ), and is now host to different interpretations guided by questionable assumptions (for a history, see Hammond, 1998 ). Worst of all, the concept is often wielded as a blunt weapon to criticize and dismiss experiments, even though researchers seldom make explicit what definition of ecological validity they use or by which set of criteria they have evaluated a study’s ecological validity (as previously pointed out by Hammond, 1998 ; Schmuckler, 2001 ; Dunlosky et al., 2009 ).
The Big Umbrella of Ecological Validity
In past decades, the concept of ecological validity has been related to various facets of psychological research, for example, the ecological validity of stimuli ( Neisser, 1976 ; Risko et al., 2012 ; Jack and Schyns, 2017 ), the ecological validity of tasks ( Smilek et al., 2006 ; Krakauer et al., 2017 ), the ecological validity of conditions ( Schilbach, 2015 ; Blanco-Elorrieta and Pylkkänen, 2018 ), the ecological validity of research settings ( Bronfenbrenner, 1977 ; Schmuckler, 2001 ), the ecological validity of results ( Eaton and Clore, 1975 ; Greenwald, 1976 ; Silverstein and Stang, 1976 ), the ecological validity of theories ( Neisser, 1976 ), the ecological validity of research designs ( Rogers et al., 2005 ), the ecological validity of methods ( Banaji and Crowder, 1989 ), the ecological validity of phenomena ( Johnston et al., 2014 ), the ecological validity of data ( Aspland and Gardner, 2003 ), and the ecological validity of paradigms ( Macdonald and Tatler, 2013 ; Schilbach et al., 2013 ). However, despite the popular usage of this term, specific definitions and requirements of ecological validity are not always clear.
A closer look at the literature suggests that different definitions and interpretations are used by researchers. Let’s consider some examples of the literature where researchers have been more explicit in their definitions of ecological validity. For example, Ashcraft and Radvansky (2009 , p. 511) defined ecological validity as: “The hotly debated principle that research must resemble the situations and task demands that are characteristic of the real-world rather than rely on artificial laboratory settings and tasks so that results will generalize to the real-world, that is, will have ecological validity.” Another influential definition of ecological validity was given by Bronfenbrenner (1977) , who defined ecological validity as “the extent to which the environment experienced by the subjects in a scientific investigation has the properties it is supposed or assumed to have by the investigator” (p. 516). In Bronfenbrenner’s view, a study’s ecological validity should not be predicated on the extent to which the research context resembles or is carried out in a ‘real-life’ environment. Instead, theoretical considerations should guide one’s methodological decisions on what type of research context is most appropriate given one’s focus of inquiry. For example, if one is interested in the behavioral responses of children when they are placed in a ‘strange situation’ then a laboratory room may be adequately suited for that particular research goal. However, if one is interested in how children behave within their home environment, then a laboratory room may not be the most suitable research context. As Bronfenbrenner (1977 , p. 516) remarked: “Specifically, so far as young children are concerned, the results indicate that the strangeness of the laboratory situation tends to increase anxiety and other negative feeling states and to decrease manifestations of social competence.”
Ecological validity has also been used interchangeably with (or regarded as a necessary component of) ‘external validity’ ( Berkowitz and Donnerstein, 1982 ; Mook, 1983 ; Hoc, 2001 ). The concept of external validity typically refers to whether a given study result or conclusion, usually obtained under one set of conditions and with one group of participants, can also be generalized to other people, tasks, and situations ( Campbell, 1957 ). For example, in the literature on neuropsychological assessment and rehabilitation, ecological validity has primarily been conceptualized as “ … the degree to which clinical tests of cognitive functioning predict functional impairment” ( Higginson et al., 2000 , p. 185). In this field, there has been much discussion about whether the neuropsychological tests used by clinicians accurately predict cognitive and behavioral impairments in everyday life ( Heinrichs, 1990 ; Wilson, 1993 ). One major concern is that the test materials are either too abstract or too general to adequately represent the kind of problems that people with cognitive and neurological impairments encounter in their daily routines, for example, while cooking or buying food at the supermarket. In response, various efforts have been made to increase the ecological validity of neuropsychological tests, for example, by developing performance measures with relevance for everyday tasks and activities ( Shallice and Burgess, 1991 ; Alderman et al., 2003 ), by combining and correlating tests results with behavioral observations and self-reports ( Wilson, 1993 ; Higginson et al., 2000 ), and by using Virtual Reality (VR) applications to create test situations in which a patient’s cognitive and functional impairments are likely to be expressed ( Parsons, 2015 ; Parsons et al., 2017 ).
The Historical Roots of Ecological Validity
As we have seen, definitions and interpretations of ecological validity may not only differ among researchers, but also across various subfields within psychology. As such, it is not always clear how the concept should be interpreted. Interestingly, the term ecological validity used to have a very precise meaning when it was first introduced to psychological science by Brunswik (1949 , 1952 , 1955 , 1956) . Brunswik coined the term ‘ecological validity’ to describe the correlation between a proximal sensory cue (e.g., retinal stimulation) and a distal object-variable (e.g., object in the environment). In Brunswik’s terminology, ecological validity refers to a measure (a correlation coefficient) that describes a probabilistic relationship between the distal and proximal layers of an organism-environment system. According to Brunswik (1955) : “A correlation between ecological variables, one which is capable of standing in this manner as a probability cue for the other, may thus be labeled “ecological validity”” (p. 199). Brunswik (1952) believed psychology to primarily be a science of organism-environment relations in which the “organism has to cope with an environment full of uncertainties” (p. 22). In Brunswik’s ‘lens model’ ( Brunswik, 1952 ), the ecological validities of perceptual cues indicate the potential utility of these cues for the organism to achieve its behavioral goals. Note that Brunswik’s concept of ecological validity is very different from how the term is generally used nowadays, namely to discuss and evaluate whether some laboratory-based experiments resemble and generalize to the ‘real-world’ (cf. Neisser, 1976 ; Smilek et al., 2006 ; Ashcraft and Radvansky, 2009 ; Shamay-Tsoory and Mendelsohn, 2019 ).
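Brunswik’s original sense of the term, a correlation between a proximal cue and a distal variable, can be made concrete with a small numerical sketch. The Python snippet below is an illustration under stated assumptions: the text speaks only of “a correlation coefficient”, so Pearson’s r is assumed here, and the cue and object values are invented rather than taken from any study.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return covariation / sqrt(spread_x * spread_y)

# Invented values: a proximal cue (e.g., retinal image size) sampled alongside
# the distal variable it signals (e.g., physical object size).
proximal_cue = [1.0, 2.1, 2.9, 4.2, 5.0]
distal_variable = [10.0, 19.0, 31.0, 40.0, 52.0]

# In Brunswik's terminology, this coefficient is the cue's ecological validity.
ecological_validity = pearson_r(proximal_cue, distal_variable)
print(round(ecological_validity, 3))
```

On this reading, a coefficient close to 1 marks a cue that is a highly reliable indicator of the distal variable, while a coefficient near 0 marks a cue of little use to the organism. The number describes a cue in an environment, not how closely an experiment resembles the ‘real-world.’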
The erosion and distortion of Brunswik’s definition of ecological validity has been documented by several scholars (e.g., Hammond, 1998 ; Araujo et al., 2007 ; Holleman et al., in press ). As explained by Hammond (1998) , the original definition of ecological validity, as Brunswik (1949 , 1952) introduced it, has been conflated with Brunswik’s ‘representative design’ of experiments ( Brunswik, 1955 , 1956 ). Representative design was Brunswik’s methodological program for psychological science to achieve generalizability of results. To achieve this, researchers should not only conduct proper sampling on the side of the subjects, by sampling subjects who are representative of a specific ‘target population’ (e.g., children, patients), but researchers should also sample stimuli, tasks, and situations which are representative of a specific ‘target ecology.’ As such, an experiment may be treated as a sample of this ‘target ecology.’ By virtue of sampling theory, researchers may then determine whether results can be generalized to the intended conditions. In short, representative design requires researchers to first specify the conditions toward which they intend to generalize their findings, and then specify how those conditions are represented in the experimental arrangement ( Brunswik, 1956 ). For more in-depth discussions on representative design, see Hammond and Stewart (2001) ; Dhami et al. (2004) , and Hogarth (2005) .
A Systematic Approach to Ecological Validity?
The current lack of terminological precision surrounding ecological validity is, to say the least, problematic. There seems to be no agreed-upon definition in the literature, nor any means of classification to determine or evaluate a study’s ecological validity. This seems to be at odds with the relative ease with which researchers routinely invoke this concept to discuss the limitations and shortcomings of laboratory experiments. All the while, researchers seldom make clear how they have determined a study’s ecological (in)validity. As Schmuckler (2001, p. 419) pointed out: “One consequence of this problem is that concerns with ecological validity can be raised in most experimental situations.” To overcome these problems, several scholars have emphasized the need for a more systematic approach to ecological validity (Lewkowicz, 2001; Schmuckler, 2001; Kingstone et al., 2008; Risko et al., 2012). For example, Lewkowicz (2001, p. 443) wrote that: “What is missing is an independent, objective, and operational definition of the concept of ecological validity that makes it possible to quantify a stimulus or event as more or less ecologically valid.” According to Schmuckler (2001), ecological validity can be evaluated on at least three dimensions: (1) the nature of the stimuli; (2) the nature of the task, behavior, or response; (3) the nature of the research context. Researchers have primarily discussed these dimensions in terms of their artificiality–naturality (e.g., Hoc, 2001; Schmuckler, 2001; Risko et al., 2012; Shamay-Tsoory and Mendelsohn, 2019; Sonkusare et al., 2019), and their simplicity–complexity (e.g., Kingstone et al., 2008; Peelen and Kastner, 2014; Lappi, 2015).
As such, a general framework can be construed where stimuli, tasks, behaviors, and research contexts can be evaluated on a continuum of artificiality–naturality and simplicity–complexity (see also Risko et al., 2012 ; Lappi, 2015 ; Shamay-Tsoory and Mendelsohn, 2019 ; Osborne-Crowley, 2020 ). At one extreme is the laboratory, characterized by its artificiality and simplicity. At the other extreme is the ‘real-world,’ characterized by its naturality and complexity. According to this multidimensional framework, researchers may determine a study’s overall ecological validity by combining (e.g., averaging or summing) the main components of ecological validity (i.e., stimuli, tasks/behaviors, research context) in terms of their relative artificiality–naturality and simplicity–complexity. However, while many researchers have conceptualized ecological validity alongside these dimensions, we think there are several problems to consider. Since the dimensions of this framework are supposedly important to determine the ecological validity of experimental research, this then raises the question of how researchers can judge the artificiality–naturality and simplicity–complexity of particular experiments. This question will be explored in the following sections.
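To make the combination step concrete, here is a minimal sketch of how such a composite score might be computed by averaging. The 0-to-1 rating scale, the component and dimension labels, and both raters' scores are assumptions made for illustration; the framework as described in the literature does not prescribe any of them.

```python
from statistics import mean

COMPONENTS = ("stimuli", "task", "context")
DIMENSIONS = ("naturality", "complexity")


def composite_score(ratings):
    """Average all component-by-dimension ratings.

    Assumed scale: 0 = maximally artificial/simple, 1 = maximally natural/complex.
    """
    return mean(ratings[c][d] for c in COMPONENTS for d in DIMENSIONS)


# Two hypothetical raters judging the *same* hypothetical study.
rater_a = {
    "stimuli": {"naturality": 0.2, "complexity": 0.3},
    "task": {"naturality": 0.5, "complexity": 0.4},
    "context": {"naturality": 0.1, "complexity": 0.2},
}
rater_b = {
    "stimuli": {"naturality": 0.6, "complexity": 0.7},
    "task": {"naturality": 0.5, "complexity": 0.6},
    "context": {"naturality": 0.3, "complexity": 0.5},
}

print(f"Rater A composite: {composite_score(rater_a):.2f}")
print(f"Rater B composite: {composite_score(rater_b):.2f}")
```

The arithmetic is trivial. What the sketch leaves open is where the individual ratings come from, since nothing in the framework fixes them.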
Artificiality – Naturality
The contrast between ‘artificiality’ and ‘naturality’ is a particularly prominent point of discussion in the ‘real-world or the lab’-dilemma and when researchers talk about the ecological validity of experimental research practices ( Hoc, 2001 ; Kingstone et al., 2008 ; Shamay-Tsoory and Mendelsohn, 2019 ). According to Hoc (2001 , pp. 282–283), ‘artificial’ situations are “those that are specifically designed for research” and ‘natural’ situations are “the target situations to be understood by research” . Importantly, Hoc (2001) notes that this distinction is made from the perspective of the researcher. However, this artificiality–naturality distinction should also be considered from the subject’s point of view. For example, according to Sonkusare et al. (2019) : “naturalistic paradigms can be heuristically defined as those that employ the rich, multimodal dynamic stimuli that represent our daily lived experience, such as film clips, TV advertisements, news items, and spoken narratives, or that embody relatively unconstrained interactions with other agents, gaming environments, or virtual realities” (p. 700). Furthermore, researchers have long recognized that artificiality arises when the experimental methods employed by researchers interfere with the naturality of the psychological phenomena one aims to study. Consequently, there is always an inherent trade-off between the degree of artificiality imposed by the experimental conditions and the naturality of the phenomena under scientific investigation ( Brunswik, 1956 ; Barker, 1968 ; Banaji and Crowder, 1989 ; Kingstone et al., 2008 ; Risko et al., 2012 ; Caruana et al., 2017 ). However, as Winograd (1988) has previously remarked, it remains difficult to “draw a line where artificiality ends and ecological validity … for real events begins” (p. 18).
Interestingly, discussions on the naturality–artificiality of experimental methods have a long pedigree in psychological science. By the end of the 19th century, Thorndike (1899) and Mills (1899) were already arguing fiercely about what methodology should be favored to study the behavior of cats. Mills dismissed Thorndike’s work because of the artificiality of the experimental methods employed by Thorndike (see Figure 2), whereas Thorndike regarded the ethological approach favored by Mills as a collection of uncritical observations and anecdotes. Mills (1899, p. 264) wrote that: “Dr. Thorndike … has given the impression that I have not made experiments, or ‘crucial experiments’ … I may remark that a laboratory as ordinarily understood is not well suited for making psychological experiments on animals”. Mills’ point was that: “cats placed in small enclosures … cannot be expected to act naturally. Thus, nothing about their normal behavior can be determined from their behavior in highly artificial, abnormal surroundings” (Goodwin, 2015, p. 200). In response to Mills, Thorndike (1899, p. 414) replied: “Professor Mills does not argue in concrete terms, does not criticize concrete unfitness in the situations I devised for the animals. He simply names them unnatural.” Thorndike clearly did not accept Mills’ charge on the artificiality of his experimental arrangements to study the behavior of cats because Mills did not define what should be considered natural behavior in the first place.
Figure 2. A ‘puzzle box’ devised by Thorndike (1899, 2017) to study the learning behavior of cats. A hungry cat is placed in a box which can be opened if the cat pushes a latch. A food reward (‘positive reinforcer’) will be obtained by the cat if it figures out how to escape from the box. Thorndike discovered that after several trials, the time it takes the cat to escape from the box decreases. Experiments with puzzle boxes remain popular today to study the cognitive capacities of animals; for example, see Richter et al. (2016) for a study with octopuses. Copyright statement – Image created and owned by author IH and is based on E. L. Thorndike’s book: Animal Intelligence (Original date of publication, 1911, Figure 1, p. 30).
We think that this historical discussion between Thorndike and Mills is illuminating, because it characterizes the heart of the discussion on ecological validity nowadays. Namely, what exactly did Mills consider to be ‘natural’ or ‘normal’ behavior? And how did Mills determine that Thorndike’s experiments failed to capture the ‘natural’ behavior of cats? Following Thorndike’s point on the matter, we think that researchers cannot readily determine the naturality–artificiality of any given experimental arrangement, at least not without specifying what is entailed by these ascriptions. As Dunlosky et al. (2009 , p. 431) previously remarked: “A naturalistic setting guarantees nothing, especially given that “naturalistic” is never unpacked – what does it mean?”. Indeed, our survey of the literature also shows that the historical discussion between Thorndike and Mills is by no means a discussion of the past. In fact, we regularly encounter discussions on the ‘artificiality’ and ‘naturality’ of experimental setups, the presentation of stimuli, the behavior of participants, or the specific tasks and procedures used in experiments – not only in the literature, but also among our colleagues and reviewers. We must often ask for the specifics, because such remarks typically remain undefined by those who toss them around.
Simplicity – Complexity
The contemporary framework of ecological validity also posits that the laboratory and the ‘real-world’ are inversely proportional in terms of their simplicity–complexity. Many researchers have lamented that laboratory experiments have a ‘reductionistic’ tendency to simplify the complexity of the psychological phenomena under study (e.g., Neisser, 1976 ; Kingstone et al., 2008 ; Shamay-Tsoory and Mendelsohn, 2019 ; Sonkusare et al., 2019 ). For example, Sonkusare et al. (2019 , p. 699) stated that “the ecological validity of these abstract, laboratory-style experiments is debatable, as in many ways they do not resemble the complexity and dynamics of stimuli and behaviors in real-life.” But what exactly is meant by complexity? Let’s consider some examples from the literature. In the field of social attention, researchers have often used schematic images, photographs and videos of people and social scenes as stimuli to study the cognitive, behavioral, and physiological processes of face perception, gaze following and joint attention ( Langton et al., 2000 ; Frischen et al., 2007 ; Puce and Bertenthal, 2015 ). However, in recent years, there has been considerable debate that such stimuli are not ‘ecologically valid’ because they do not “capture the complexity of real social situations” ( Birmingham et al., 2012 , p. 30). While we agree that looking at a photographic image of a person’s face is different from looking at a living and breathing person, in what ways do these situations differ in complexity? Do these scholars mean that looking at a ‘live’ person is more complex than looking at a picture of that person? Or do they mean that the former is more complex than the latter from the perspective of the researcher who wants to understand the cognitive, behavioral, and physiological processes of face perception and social attention?
To take another example, Gabor patches are often used as stimuli by experimental psychologists to study ‘low-level visual processing’ (see Figure 3). Experimental psychologists use Gabor patches as visual stimuli because they offer a high degree of experimental control over various stimulus parameters (e.g., spatial frequency bandwidths, orientation, contrast, size, location). Gabor patches can be described with mathematical precision (i.e., “Gaussian-windowed sinusoidal gratings,” Fredericksen et al., 1997, p. 1), and their spatial properties are considered to be a good representation of the receptive field profiles in the primary visual cortex. While Gabor patches may be considered ‘simple’ to researchers who study the relation between low-level visual processing and neural activity in terms of orientation-tuning and hemodynamic response functions, they also point to the yet to be explained ‘complexity’ of the many possible relations between other cognitive processes and patterns of neural activity in the brain. On the other hand, a naïve participant (who likely has no clue about what researchers have discovered about low-level visual processing) may describe these Gabor patches as blurry, kind of stripy, zebra-like circles, and think that they are incredibly boring to look at for many trials while lying quietly in an MRI scanner.
Figure 3. Are Gabor patches simple or complex compared to a picture of zebras? (A) A Gabor patch. (B) A photograph of zebras. The uniquely striped patterns of zebras make them most familiar to humans, whereas the question why zebras have such beautiful stripes remains the topic of much discussion among biologists, see e.g., Caro and Stankowich (2015) and Larison et al. (2015). Copyright statement – Images are used under the provision of the “fair use” U.S. Copyright Act 107 and Dutch Copyright Law Article 15a for non-profit purposes of research, education and scholarly comment. Image of Gabor patch was adapted from Todorović (2016, May 30). Retrieved April 1, 2020, from http://neuroanatody.com/2016/05/whats-in-a-gabor-patch/. Photograph of zebras was made by Ajay Lalu and has been made publicly available by the owner for non-profit purposes via Pixabay. Retrieved on April 1, 2020, from https://pixabay.com/nl/users/ajaylalu-1897335/.
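The mathematical precision with which a Gabor patch can be described is easy to show in code: the stimulus is a sinusoidal grating multiplied by a Gaussian window. The sketch below uses only the Python standard library. The function name `gabor_patch` and every default parameter value (size, wavelength, orientation, window width, phase, contrast) are assumptions chosen for illustration, not values taken from any study cited here.

```python
import math


def gabor_patch(size=65, wavelength=16.0, orientation_deg=45.0,
                sigma=10.0, phase=0.0, contrast=1.0):
    """Return a size x size grid of luminance modulations in [-1, 1].

    Each value is a cosine grating (the carrier) multiplied by a circular
    Gaussian window (the envelope), centered on the middle pixel.
    """
    theta = math.radians(orientation_deg)
    half = size // 2
    patch = []
    for row in range(size):
        y = row - half
        line = []
        for col in range(size):
            x = col - half
            # Position along the axis of luminance modulation.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
            carrier = math.cos(2.0 * math.pi * x_rot / wavelength + phase)
            line.append(contrast * envelope * carrier)
        patch.append(line)
    return patch


patch = gabor_patch()
center = len(patch) // 2
print(f"value at center: {patch[center][center]:.3f}")
print(f"value at corner: {patch[0][0]:.6f}")
```

A handful of numbers fully specifies the stimulus, which is one sense in which it is 'simple' to the researcher. Whether it is simple for the participant viewing it is a separate question.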
Our point here is that simplicity–complexity is in the eye of the beholder. Who is to say what is more simple or complex? Physicists, computer scientists, information theorists, and evolutionary biologists have developed various definitions and measures of complexity (e.g., physical complexity, computational complexity, effective complexity, algorithmic complexity, statistical complexity, structural complexity, functional complexity, etc.), typically expressed in strictly mathematical terms ( Edmonds, 1995 ; Gell-Mann, 1995 ; Adami, 2002 ). But what definitions and measures of complexity are used by psychologists and cognitive scientists? Researchers in psychological science seem to have more loosely used the term complexity, for example, to describe a wide range of biological, behavioral, cognitive, social, and cultural phenomena, which typically contain lots of many’s (i.e., many parts, many variables, many degrees of freedom). Researchers may refer to various phenomena as ‘complex’ because they are simply not (yet) understood, as in “the brain is too complex for us to understand” ( Edmonds, 1995 , p. 4). Yet, such intuitive notions of complexity, whether they are caused by ignorance or whether they are used to describe something’s size, number, or variety ( Edmonds, 1995 ), are not very helpful to evaluate the simplicity–complexity of stimuli, tasks, and situations, nor do such notions provide any formula by which these components can be summed to determine the total ecological validity of a given study. According to Gell-Mann (1995 , p. 16):
“As measures of something like complexity for an entity in the real-world, all such quantities are to some extent context-dependent or even subjective. They depend on the coarse graining (level of detail) of the description of the entity, on the previous knowledge and understanding of the world that is assumed, on the language employed, on the coding method used for conversion from that language into a string of bits, and on the particular idealized computer chosen as a standard.”
The ‘Real World’ or the ‘Laboratory’: Psychology’s False Dilemma?
We have discussed several problems with how researchers have used the term ‘ecological validity’. In short, the concept of ecological validity has transformed significantly over the past several decades since it was introduced by Brunswik (1949) . It has lost most of its former theoretical and methodological cohesion (for a history, see Hammond, 1998 ), and the definitions and requirements of ecological validity used by researchers nowadays are seldom made explicit. As such, some experiments may be regarded as ‘ecologically valid’ by one researcher while they can be casually dismissed as ‘ecologically invalid’ by others. A closer look at the literature suggests that many researchers seem to assume that everyone understands what is meant by this term, while in fact the concept of ecological validity is seldom defined. As such, the concept of ecological validity is primarily used nowadays to make hand-waving statements about whether some (lab-based) experiments resemble ‘real life,’ or whether some results obtained in the laboratory may or may not generalize to the ‘real-world.’
In our opinion, the contemporary framework of ecological validity ultimately falls short of providing researchers with a tractable research program. Researchers seem to primarily base their judgments of ecological validity upon their own particular theoretical assumptions and considerations about the so-called artificiality–naturality and simplicity–complexity of experimental situations, typically in the absence of a more formal set of criteria. As such, while we certainly sympathize with the ‘call for ecological validity’, insofar as it has motivated researchers to be critical about the limitations of experimental methods, we also think that the uncritical use of the term ecological validity has caused a lot of confusion, and in some cases has even been counterproductive. Perhaps the most problematic consequence of using the term ecological validity as an easy substitute for the ‘real-world’ was previously pointed out by Hammond (1998). He commented that:
“There is, of course, no such thing as a “real-world.” It has been assigned no properties, and no definition; it is used simply because of the absence of a theory of tasks or other environments, and thus does not responsibly offer a frame of reference for the generalization” .
In Hammond’s view, the aim of understanding cognitive and behavioral functioning in the ‘real-world’ is basically pointless if one does not first define this notion of the ‘real-world.’ As such, researchers have locked themselves “onto the horns of a false dilemma” (Hammond and Stewart, 2001, p. 7). Thus, in order to talk sensibly about whether some results can also be generalized to particular situations beyond the experimental conditions in which those results were obtained, researchers first need to specify the range and distributions of the variables and conditions to which their results are supposed to be applicable. Since the notion of the ‘real-world’ patently lacks specificity, this phrase inevitably hinders researchers from specifying the range and boundary conditions of cognitive and behavioral functioning in any given research context, and thus prevents them from getting at the context-specific and context-generic principles of cognition and behavior (see also Kruglanski, 1975; Simons et al., 2017).
The Nature of the Environment?
We completely agree with Hammond (1998) that, instead of trying to understand cognitive and behavioral functioning in the ‘real-world’, the charge of researchers is always to specify and describe the particular context of behavior in which they are interested. Ultimately, the real challenge for researchers is to develop a theory of how specific environmental contexts are related to various forms of cognitive and behavioral functioning. But what constitutes a psychologist’s theory of the environment? Researchers in psychological science are typically concerned with the nature of the organism; the nature of the environment and its relation to cognitive and behavioral functioning has received considerably less attention from a theoretical point of view (Barker, 1966; Heft, 2013). Interestingly, there have been several scholars who have dedicated themselves to precisely this question, and whose theories of cognition and behavior included a clear perspective on the nature of the environment.
According to Tolman and Brunswik (1935) , the nature of the environment, as it appears to the organism, is full of uncertainties. The organism perceives the environment as an array of proximal ‘cues’ and ‘signs’ (i.e., information sources), which are the ‘local representatives’ of various distal objects and events in the organism’s environment. To function more or less efficiently, the organism needs to accumulate, combine, and substitute the information it derives from the available ‘cues’ and ‘signs,’ so that it can adequately adjust its means to achieve its behavioral goals (e.g., finding food or shelter). However, since the environment is inherently probabilistic and only partly predictable, the organism continually needs to adjust its assumptions about the state of the environment based on the available information sources. Another example is given by Barker (1968) , whose concept of ‘behavior settings’ (see also Heft, 2001 ) is key in describing how the environment shapes the frequency and occurrence of human cognition and behavior. Important to behavior settings is that they are the product of the collective actions of a group of individuals. Their geographical location can be specified (e.g., the supermarket, the cinema, etc.), and they have clear temporal and physical boundaries (e.g., opening hours, a door to enter and exit the building). Behavior settings are ‘independent’ of an individual’s subjective experience, yet what goes on inside any behavior setting is characterized by a high degree of interdependency and equivalence of actions between individuals (e.g., most people who are inside a supermarket are shopping for groceries and people in cinemas are watching movies). Another ‘classic’ example of a theory of the environment can be found in J. J. Gibson’s book The Ecological Approach to Visual Perception (1979/2014). According to Gibson, there exists a strong mutuality and reciprocity between the organism and its environment. 
He introduced the concept of ‘affordances’ to explain how the inherent ‘meaning’ of things (i.e., functional significance to the individual) can be directly perceived by an individual perceiver and how this ‘information’ shapes the possibilities for potential actions and experiences. For example, a sufficiently firm and smooth surface may be walk-on-able, run-on-able, or dance-on-able, whereas a rough surface cluttered with obstacles does not afford such actions ( Heft, 2001 ). In short, affordances are properties of an organism-environment system. They are perceiver-relative functional qualities of an object, event or place in the environment and they are dependent on the particular features of the environment and their relationships with the functional capabilities of a particular individual (for more in-depth discussions, see e.g., Heft, 2001 ; Stoffregen, 2003 ).
In order to describe and specify the environment and its relation to cognitive and behavioral functioning, we may draw on these scholars to guide us in a more specific direction. While we do not specifically recommend any of these perspectives, we think they are illuminating because these scholars motivate us to ask questions such as: What is the specific functional context of the cognitive and behavioral processes one is interested in? What are the relevant variables and conditions in this context given one’s focus of inquiry and level of analysis? What do we know or assume to know about the range and distribution of these variables and conditions? And how can these variables and conditions be represented in experimental designs to study specific patterns of cognitive and behavioral functioning? In order to answer some of these questions, several researchers have emphasized the importance of first observing how people behave in everyday situations prior to experimentation. For example, Kingstone et al. (2008) advocated for an approach called Cognitive Ethology, which proposes that researchers should first observe how people behave in everyday situations before moving into the laboratory. In a similar vein, Adolph (2019) proposes that researchers should start with a rich description of the behaviors they are interested in, in order to first identify the “essential invariants” of these behaviors (p. 187).
The Field of Social Attention: Away From the Real-World and Toward Specificity About Context
To exemplify how some of the ideas outlined above may be useful to researchers, we will apply these ideas to a research topic of our interest: social attention. The field of social attention, as briefly discussed previously, is primarily focused on how attention is influenced by socially relevant objects, events, and situations, most notably, interactions with other social agents. In recent decades, it has been argued extensively that the experimental arrangements used by researchers in this field need more ‘ecological validity’ in order to adequately study the relevant characteristics of social attention in the ‘real-world’ ( Risko et al., 2012 , 2016 ; Schilbach et al., 2013 ; Caruana et al., 2017 ; Macdonald and Tatler, 2018 ; Shamay-Tsoory and Mendelsohn, 2019 ). In the light of these concerns, several researchers have advocated to study “real-world social attention” ( Risko et al., 2016 , p. 1) and “real-world social interaction” ( Macdonald and Tatler, 2018 , p. 1; see also Shamay-Tsoory and Mendelsohn, 2019 ). One example of this is given by Macdonald and Tatler (2018) . In this study, Macdonald and Tatler (2018) investigated how social roles given to participants influenced their social gaze behavior during a collaborative task: baking a cake together. Participants were either not given explicit social roles, or they were given a ‘Chef’ or ‘Gatherer’ role. Macdonald and Tatler (2018) showed that, regardless of whether social roles were assigned or not, participants did not gaze at their cake-baking partners very often while carrying out the task. After comparing their results with other so-called ‘real-world interaction studies’ (e.g., Laidlaw et al., 2011 ; Wu et al., 2013 ), the authors stated that: “we are not able to generalize about the specific amount of partner gaze during any given real-world interaction” ( Macdonald and Tatler, 2018 , p. 2171). 
We think that this statement clearly illustrates how the use of ‘real-world’ and ‘real life’ labels may lead to misleading and potentially counterproductive conclusions, as it seems to imply that ‘real-world interactions’ encompass a clearly defined category of behaviors. However, as argued previously, these so-called ‘real-world interactions’ are not a clearly defined category of behaviors. Instead, statements about generalizability need to be considered within a more constrained and carefully defined context (cf. Brunswik, 1956; Simons et al., 2017). This would make it clearer what researchers are talking about instead of subsuming studies under the big umbrella of the ‘real-world.’ For example, if the goal is to study how the cognitive and behavioral processes of social attention are influenced by different contexts and situations, researchers need to specify social gaze behavior as a function of these different contexts and situations.
Thus, instead of studying ‘real-world’ social attention in the context of ‘real-world’ social interactions, researchers should first try to describe and understand cake-baking attention ( Macdonald and Tatler, 2018 ), sharing-a-meal attention ( Wu et al., 2013 ), waiting-room attention ( Laidlaw et al., 2011 ), walking-on-campus attention ( Foulsham et al., 2011 ), Lego-block-building attention ( Macdonald and Tatler, 2013 ), playing-word-games attention ( Ho et al., 2015 ), interviewee-attention ( Freeth et al., 2013 ), and garage-sale attention ( Rubo and Gamer, 2018 ). By doing so, we may begin to understand the context-generic and context-specific aspects of attentional processes, allowing for a more sophisticated theory of social attention. These examples not only show the wide variety of behavioral tasks and contexts that are possible to study in relation to social attention, they also show that uncritical references to ‘ecological validity’ a.k.a. ‘real-worldliness’ are not very helpful to specify the relevant characteristics of particular behavioral contexts.
There are also good examples where researchers have been more explicit about the specific characteristics of social situations that they are interested in. Researchers in the field of social attention have, for example, tried to unravel the different functions of gaze behavior. One important function of gaze behavior is to acquire visual information from the world; within a social context, however, gaze may also signal important information to others, which may be used to initiate and facilitate social interaction (see e.g., Gobel et al., 2015; Risko et al., 2016). In a series of experiments, researchers have systematically varied whether, and the degree to which, social interaction between two people was possible, and measured how gaze was modulated as a function of the social context (Laidlaw et al., 2011; Gobel et al., 2015; Gregory and Antolin, 2019; Holleman et al., 2020). In other studies, researchers have been explicit about the task-demands and social contexts that elicit specific patterns of gaze behavior, for example, in the context of face-to-face interactions and conversational exchanges (Ho et al., 2015; Hessels et al., 2019). We think that, if researchers were more explicit in their descriptions of task-demands and social contexts in relation to gaze, this could provide a solid basis for a more sophisticated theory of social attention, although such work remains challenging (for a recent review, see Hessels, in press).
We have argued that the ‘real-world approach’ and its call for ecological validity has several problems. The concept of ecological validity itself is seldom defined and interpretations differ among researchers. We believe that references to ecological validity and the ‘real-world’ can become superfluous if researchers would clearly specify and describe the particular contexts of behavior in which they are interested. This will be a more constructive way to uncover the context-specific and context-generic principles of cognition and behavior. As a final note, we hope that editors and reviewers will safeguard journals from publishing papers where terms such as ‘ecological validity’ and the ‘real-world’ are used without specification.
Author Contributions
GH and RH drafted the manuscript. RH, IH, and CK edited and revised the manuscript.
Conflict of Interest
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Funding. GH and RH were supported by the Consortium on Individual Development (CID). CID is funded through the Gravitation programme of the Dutch Ministry of Education, Culture, and Science and the NWO (grant no. 024.001.003 awarded to author CK).
- Aanstoos C. M. (1991). Experimental psychology and the challenge of real life. Am. Psychol. 1:77 10.1037/0003-066x.46.1.77 [ DOI ] [ Google Scholar ]
- Adami C. (2002). What is complexity? Bioessays 24 1085–1094. [ DOI ] [ PubMed ] [ Google Scholar ]
- Adolph K. E. (2019). “Ecological validity: mistaking the lab for real life,” in My Biggest Research Mistake: Adventures and Misadventures in Psychological Research , Ed. Sternberg R. (New York, NY: Sage; ), 187–190. 10.4135/9781071802601.n58 [ DOI ] [ Google Scholar ]
- Alderman N., Burgess P. W., Knight C., Henman C. (2003). Ecological validity of a simplified version of the multiple errands shopping test. J. Int. Neuropsychol. Soc. 9 31–44. 10.1017/s1355617703910046 [ DOI ] [ PubMed ] [ Google Scholar ]
- Anderson C. A., Lindsay J. J., Bushman B. J. (1999). Research in the psychological laboratory: truth or triviality? Curr. Direct. Psychol. Sci. 8 3–9. 10.1111/1467-8721.00002 [ DOI ] [ Google Scholar ]
- Araujo D., Davids K., Passos P. (2007). Ecological validity, representative design, and correspondence between experimental task constraints and behavioral setting: comment on Rogers. Kadar, and Costall (2005). Ecol. Psychol. 19 69–78. 10.1080/10407410709336951 [ DOI ] [ Google Scholar ]
- Ashcraft M., Radvansky G. (2009). Cognition , 5th Edn Upper Saddle River, NJ: Pearson Education, Inc. [ Google Scholar ]
- Aspland H., Gardner F. (2003). Observational measures of parent-child interaction: an introductory review. Child Adolesc. Mental Health 8 136–143. 10.1111/1475-3588.00061 [ DOI ] [ PubMed ] [ Google Scholar ]
- Banaji M. R., Crowder R. G. (1989). The bankruptcy of everyday memory. Am. Psychol. 44:1185 10.1037/0003-066x.44.9.1185 [ DOI ] [ Google Scholar ]
- Barker R. G. (1966). “On the nature of the environment,” in The Psychology of Egon Brunswik , ed. Hammond K. R. (New York: Holt, Rinehart and Winston; ). [ Google Scholar ]
- Barker R. G. (1968). Ecological Psychology: Concepts and Methods for Studying the Environment of Human Behavior. Stanford, CA: Stanford University Press. [ Google Scholar ]
- Berkowitz L., Donnerstein E. (1982). External validity is more than skin deep: Some answers to criticisms of laboratory experiments. Am. Psychol. 37:245 10.1037/0003-066x.37.3.245 [ DOI ] [ Google Scholar ]
- Birmingham E., Ristic J., Kingstone A. (2012). “Investigating social attention: a case for increasing stimulus complexity in the laboratory,” in Cognitive Neuroscience, Development, and Psychopathology: Typical and Atypical Developmental Trajectories of Attention , eds Burack J. A., Enns J. T., Fox N. A. (Oxford University Press: ), 251–276. 10.1093/acprof:oso/9780195315455.003.0010 [ DOI ] [ Google Scholar ]
- Blanco-Elorrieta E., Pylkkänen L. (2018). Ecological validity in bilingualism research and the bilingual advantage. Trends Cogn. Sci. 22 1117–1126. 10.1016/j.tics.2018.10.001 [ DOI ] [ PubMed ] [ Google Scholar ]
- Bronfenbrenner U. (1977). Toward an experimental ecology of human development. Am. Psychol. 32:513 10.1037/0003-066x.32.7.513 [ DOI ] [ Google Scholar ]
- Brunswik E. (1943). Organismic achievement and environmental probability. Psychol. Rev. 50:255 10.1037/h0060889 [ DOI ] [ Google Scholar ]
- Brunswik E. (1949). Remarks on functionalism in perception. J. Pers. 18 56–65. 10.1111/j.1467-6494.1949.tb01233.x [ DOI ] [ Google Scholar ]
- Brunswik E. (1952). The Conceptual Framework of Psychology. Chicago: University of Chicago Press. [ Google Scholar ]
- Brunswik E. (1955). Representative design and probabilistic theory in a functional psychology. Psychol. Rev. 62:193. 10.1037/h0047470 [ DOI ] [ PubMed ] [ Google Scholar ]
- Brunswik E. (1956). Perception and the Representative Design of Psychological Experiments. Berkeley: University of California Press. [ Google Scholar ]
- Campbell D. T. (1957). Factors relevant to the validity of experiments in social settings. Psychol. Bull. 54:297. 10.1037/h0040950 [ DOI ] [ PubMed ] [ Google Scholar ]
- Caro T., Stankowich T. (2015). Concordance on zebra stripes: a comment on Larison et al.(2015). R. Soc. Open Sci. 2:150323. 10.1098/rsos.150323 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Caruana N., McArthur G., Woolgar A., Brock J. (2017). Simulating social interactions for the experimental investigation of joint attention. Neurosci. Biobehav. Rev. 74 115–125. 10.1016/j.neubiorev.2016.12.022 [ DOI ] [ PubMed ] [ Google Scholar ]
- Cohen G., Conway M. A. (2007). Memory in the Real World. Abingdon: Psychology Press. [ Google Scholar ]
- Danziger K. (1994). Constructing the Subject: Historical Origins of Psychological Research. Cambridge: Cambridge University Press. [ Google Scholar ]
- Davids K. (1988). Ecological validity in understanding sport performance: some problems of definition. Quest 40 126–136. 10.1080/00336297.1988.10483894 [ DOI ] [ Google Scholar ]
- Dhami M. K., Hertwig R., Hoffrage U. (2004). The role of representative design in an ecological approach to cognition. Psychol. Bull. 130:959. 10.1037/0033-2909.130.6.959 [ DOI ] [ PubMed ] [ Google Scholar ]
- Dunlosky J., Bottiroli S., Hartwig M. (2009). “Sins committed in the name of ecological validity: A call for representative design in education science,” in Handbook of Metacognition in Education , eds Hacker D. J., Dunlosky J., Graesser A. C. (Abingdon: Routledge; ), 442–452. [ Google Scholar ]
- Eaton W. O., Clore G. L. (1975). Interracial imitation at a summer camp. J. Pers. Soc. Psychol. 32:1099 10.1037/0022-3514.32.6.1099 [ DOI ] [ Google Scholar ]
- Edmonds B. (1995). “What is complexity?-the philosophy of complexity per se with application to some examples in evolution,” in The Evolution of Complexity , Ed. Bonner J. T. (Dordrecht: Kluwer; ). [ Google Scholar ]
- Foulsham T., Walker E., Kingstone A. (2011). The where, what and when of gaze allocation in the lab and the natural environment. Vis. Res. 51 1920–1931. 10.1016/j.visres.2011.07.002 [ DOI ] [ PubMed ] [ Google Scholar ]
- Fredericksen R., Bex P. J., Verstraten F. A. (1997). How big is a Gabor patch, and why should we care? JOSA A 14 1–12. [ DOI ] [ PubMed ] [ Google Scholar ]
- Freeth M., Foulsham T., Kingstone A. (2013). What affects social attention? Social presence, eye contact and autistic traits. PLoS One 8:e53286. 10.1371/journal.pone.0053286 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Frischen A., Bayliss A. P., Tipper S. P. (2007). Gaze cueing of attention: visual attention, social cognition, and individual differences. Psychol. Bull. 133:694. 10.1037/0033-2909.133.4.694 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Gell-Mann M. (1995). What is complexity? Remarks on simplicity and complexity by the Nobel Prize-winning author of The Quark and the Jaguar. Complexity 1 16–19. 10.1002/cplx.6130010105 [ DOI ] [ Google Scholar ]
- Gibson J. J. (1950). The Perception of the Visual World. Cambridge: Houghton Mifflin Company. [ Google Scholar ]
- Gibson J. J. (1970). On the relation between hallucination and perception. Leonardo 3 425–427. [ Google Scholar ]
- Gibson J. J. (2014). The Ecological Approach to Visual Perception: Classic Edition. New York, NY: Psychology Press. (Original date of publication 1979). [ Google Scholar ]
- Gillis J., Schneider C. (1966). “The historical preconditions of representative design,” in The Psychology of Egon Brunswik , ed. Hammond K. R. (New York, NY: Holt, Rinehart & Winston, Inc; ), 204–236. [ Google Scholar ]
- Gobel M. S., Kim H. S., Richardson D. C. (2015). The dual function of social gaze. Cognition 136 359–364. 10.1016/j.cognition.2014.11.040 [ DOI ] [ PubMed ] [ Google Scholar ]
- Goodwin C. J. (2015). A History of Modern Psychology , 5 Edn Hoboken, NJ: John Wiley & Sons. [ Google Scholar ]
- Greenwald A. G. (1976). Within-subjects designs: to use or not to use? Psychol. Bull. 83:314 10.1037/0033-2909.83.2.314 [ DOI ] [ Google Scholar ]
- Gregory N. J., Antolin J. V. (2019). Does social presence or the potential for interaction reduce social gaze in online social scenarios? Introducing the “live lab” paradigm. Q. J. Exp. Psychol. 72 779–791. 10.1177/1747021818772812 [ DOI ] [ PubMed ] [ Google Scholar ]
- Hammond K. R. (1998). Ecological Validity: Then and Now. Available online at: http://www.brunswik.org/notes/essay2.html (accessed April 1, 2020). [ Google Scholar ]
- Hammond K. R., Stewart T. R. (2001). The Essential Brunswik: Beginnings, Explications, Applications. New York, NY: Oxford University Press. [ Google Scholar ]
- Hatfield G. (2002). Psychology, philosophy, and cognitive science: reflections on the history and philosophy of experimental psychology. Mind Lang. 17 207–232. 10.1111/1468-0017.00196 [ DOI ] [ Google Scholar ]
- Heft H. (2001). Ecological Psychology in Context: James Gibson, Roger Barker, and the Legacy of William James’s Radical Empiricism. Hove: Psychology Press. [ Google Scholar ]
- Heft H. (2013). An ecological approach to psychology. Rev. Gen. Psychol. 17 162–167. 10.1037/a0032928 [ DOI ] [ Google Scholar ]
- Heinrichs R. W. (1990). Current and emergent applications of neuropsychological assessment: problems of validity and utility. Prof. Psychol. 21:171 10.1037/0735-7028.21.3.171 [ DOI ] [ Google Scholar ]
- Hessels R. S. (in press). How does gaze to faces support face-to-face interaction? A review and perspective. Psychonom. Bull. Rev. 10.31219/osf.io/8zta5 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Hessels R. S., Holleman G. A., Kingstone A., Hooge I. T. C., Kemner C. (2019). Gaze allocation in face-to-face communication is affected primarily by task structure and social context, not stimulus-driven factors. Cognition 184 28–43. 10.1016/j.cognition.2018.12.005 [ DOI ] [ PubMed ] [ Google Scholar ]
- Hessels R. S., van Doorn A. J., Benjamins J. S., Holleman G. A., Hooge I. T. C. (2020). Task-related gaze control in human crowd navigation. Attent. Percept. Psychophys. 10.3758/s13414-019-01952-9 [Online ahead of print] [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Higginson C. I., Arnett P. A., Voss W. D. (2000). The ecological validity of clinical tests of memory and attention in multiple sclerosis. Arch. Clin. Neuropsychol. 15 185–204. 10.1016/s0887-6177(99)00004-9 [ DOI ] [ PubMed ] [ Google Scholar ]
- Hilton W. (1920). Applied Psychology: Driving Power of Thought. The Society of Applied Psychology Available online at: http://www.gutenberg.org/files/33076/33076-h/33076-h.htm (accessed April 1, 2020). [ Google Scholar ]
- Ho S., Foulsham T., Kingstone A. (2015). Speaking and listening with the eyes: gaze signaling during dyadic interactions. PLoS One 10:e0136905. 10.1371/journal.pone.0136905 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Hoc J.-M. (2001). Towards ecological validity of research in cognitive ergonomics. Theor. Issues Ergon. Sci. 2 278–288. 10.1371/journal.pone.0184488 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Hogarth R. M. (2005). The challenge of representative design in psychology and economics. J. Econ. Methodol. 12 253–263. 10.1177/0269216311399663 [ DOI ] [ PubMed ] [ Google Scholar ]
- Holleman G. A., Hessels R. S., Kemner C., Hooge I. T. C. (2020). Implying social interaction and its influence on gaze behavior to the eyes. PLoS One 15:e0229203. 10.1371/journal.pone.0229203 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Holleman G. A., Hooge I. T. C., Kemner C., Hessels R. S. (in press). The reality of ‘real-life’ neuroscience: a commentary on Shamay-Tsoory & Mendelsohn. Perspect. Psychol. Sci. [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Jack R. E., Schyns P. G. (2017). Toward a social psychophysics of face communication. Annu. Rev. Psychol. 68 269–297. 10.1146/annurev-psych-010416-044242 [ DOI ] [ PubMed ] [ Google Scholar ]
- Jenkins J. J. (1974). Remember that old theory of memory? Well, forget it. Am. Psychol. 29:785 10.1037/h0037399 [ DOI ] [ Google Scholar ]
- Johnston P., Molyneux R., Young A. W. (2014). The N170 observed ‘in the wild’: robust event-related potentials to faces in cluttered dynamic visual scenes. Soc. Cogn. Affect. Neurosci. 10 938–944. 10.1093/scan/nsu136 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Kingstone A., Smilek D., Eastwood J. D. (2008). Cognitive ethology: a new approach for studying human cognition. Br. J. Psychol. 99 317–340. 10.1348/000712607x251243 [ DOI ] [ PubMed ] [ Google Scholar ]
- Koehler J. J. (1996). The base rate fallacy reconsidered: descriptive, normative, and methodological challenges. Behav. Brain Sci. 19 1–17. 10.1017/s0140525x00041157 [ DOI ] [ Google Scholar ]
- Krakauer J. W., Ghazanfar A. A., Gomez-Marin A., MacIver M. A., Poeppel D. (2017). Neuroscience needs behavior: correcting a reductionist bias. Neuron 93 480–490. 10.1016/j.neuron.2016.12.041 [ DOI ] [ PubMed ] [ Google Scholar ]
- Kruglanski A. W. (1975). The two meanings of external invalidity. Hum. Relat. 28 653–659. 10.1177/001872677502800704 [ DOI ] [ Google Scholar ]
- Laidlaw K. E., Foulsham T., Kuhn G., Kingstone A. (2011). Potential social interactions are important to social attention. Proc. Natl. Acad. Sci. U.S.A. 108 5548–5553. 10.1073/pnas.1017022108 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Langton S. R., Watt R. J., Bruce V. (2000). Do the eyes have it? Cues to the direction of social attention. Trends Cogn. Sci. 4 50–59. 10.1016/s1364-6613(99)01436-9 [ DOI ] [ PubMed ] [ Google Scholar ]
- Lappi O. (2015). Eye tracking in the wild: the good, the bad and the ugly. J. Eye Mov. Res. 8:1. 10.1016/j.dcn.2019.100710 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Larison B., Harrigan R. J., Thomassen H. A., Rubenstein D. I., Chan-Golston A. M., Li E., et al. (2015). How the zebra got its stripes: a problem with too many solutions. R. Soc. Open Science 2:140452. 10.1098/rsos.140452 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Lewkowicz D. J. (2001). The concept of ecological validity: what are its limitations and is it bad to be invalid? Infancy 2 437–450. 10.1207/s15327078in0204_03 [ DOI ] [ PubMed ] [ Google Scholar ]
- Macdonald R. G., Tatler B. W. (2013). Do as eye say: gaze cueing and language in a real-world social interaction. J. Vis. 13 1–12. 10.1167/13.4.6 [ DOI ] [ PubMed ] [ Google Scholar ]
- Macdonald R. G., Tatler B. W. (2018). Gaze in a real-world social interaction: a dual eye-tracking study. Q. J. Exp. Psychol. 71 2162–2173. 10.1177/1747021817739221 [ DOI ] [ PubMed ] [ Google Scholar ]
- Mills W. (1899). The nature of animal intelligence and the methods of investigating it. Psychol. Rev. 6:262 10.1037/h0074808 [ DOI ] [ Google Scholar ]
- Mook D. G. (1983). In defense of external invalidity. Am. Psychol. 38:379 10.1037/0003-066x.38.4.379 [ DOI ] [ Google Scholar ]
- Neisser U. (1976). Cognition and Reality: Principles and Implications Of Cognitive Psychology. San Fransisco, CA: W. H. Freeman and Company. [ Google Scholar ]
- Neisser U. (1991). A case of misplaced nostalgia. Am. Psychol. 46:34–36. 10.1037/0003-066x.46.1.34 [ DOI ] [ Google Scholar ]
- Osborne-Crowley K. (2020). Social cognition in the real world: reconnecting the study of social cognition with social reality. Rev. Gen. Psychol. 1–15. 10.4324/9781315648156-1 [ DOI ] [ Google Scholar ]
- Parsons T. D. (2015). Virtual reality for enhanced ecological validity and experimental control in the clinical, affective and social neurosciences. Front. Hum. Neurosci. 9:660. 10.3389/fnhum.2015.00660 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Parsons T. D., Carlew A. R., Magtoto J., Stonecipher K. (2017). The potential of function-led virtual environments for ecologically valid measures of executive function in experimental and clinical neuropsychology. Neuropsychol. Rehabil. 27 777–807. 10.1080/09602011.2015.1109524 [ DOI ] [ PubMed ] [ Google Scholar ]
- Peelen M. V., Kastner S. (2014). Attention in the real world: toward understanding its neural basis. Trends Cogn. Sci. 18 242–250. 10.1016/j.tics.2014.02.004 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Puce A., Bertenthal B. I. (2015). The Many Faces of Social Attention: Behavioral and Neural Measures , eds Puce A., Bertenthal B. I. (Switzerland: Springer; ). [ Google Scholar ]
- Richter J. N., Hochner B., Kuba M. J. (2016). Pull or push? Octopuses solve a puzzle problem. PLoS One 11:e0152048. 10.1371/journal.pone.0152048 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Risko E. F., Laidlaw K., Freeth M., Foulsham T., Kingstone A. (2012). Social attention with real versus reel stimuli: toward an empirical approach to concerns about ecological validity. Front. Hum. Neurosci. 6:143. 10.3389/fnhum.2012.00143 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Risko E. F., Richardson D. C., Kingstone A. (2016). Breaking the fourth wall of cognitive science: real-world social attention and the dual function of gaze. Curr. Direct. Psychol. Sci. 25 70–74. 10.1177/0963721415617806 [ DOI ] [ Google Scholar ]
- Rogers S. D., Kadar E. E., Costall A. (2005). Gaze patterns in the visual control of straight-road driving and braking as a function of speed and expertise. Ecol. Psychol. 17 19–38. 10.1207/s15326969eco1701_2 [ DOI ] [ Google Scholar ]
- Rubo M., Gamer M. (2018). “Virtual reality as a proxy for real-life social attention?,” Paper presented at the Proceedings of the 2018 ACM Symposium on Eye Tracking Research & Applications. New York, NY. [ Google Scholar ]
- Schilbach L. (2015). Eye to eye, face to face and brain to brain: novel approaches to study the behavioral dynamics and neural mechanisms of social interactions. Curr. Opin. Behav. Sci. 3 130–135. 10.1016/j.cobeha.2015.03.006 [ DOI ] [ Google Scholar ]
- Schilbach L., Timmermans B., Reddy V., Costall A., Bente G., Schlicht T., et al. (2013). Toward a second-person neuroscience. Behav. Brain Sci. 36 393–414. 10.1017/s0140525x12000660 [ DOI ] [ PubMed ] [ Google Scholar ]
- Schmuckler M. A. (2001). What is ecological validity? A dimensional analysis. Infancy 2 419–436. 10.1207/s15327078in0204_02 [ DOI ] [ PubMed ] [ Google Scholar ]
- Shallice T., Burgess P. W. (1991). Deficits in strategy application following frontal lobe damage in man. Brain 114 727–741. 10.1093/brain/114.2.727 [ DOI ] [ PubMed ] [ Google Scholar ]
- Shamay-Tsoory S. G., Mendelsohn A. (2019). Real-life neuroscience: an ecological approach to brain and behavior research. Perspect. Psychol. Sci. 14 841–859. 10.1177/1745691619856350 [ DOI ] [ PubMed ] [ Google Scholar ]
- Silverstein C. H., Stang D. J. (1976). Seating position and interaction in triads: a field study. Sociometry 39 166–170. [ Google Scholar ]
- Simons D. J., Levin D. T. (1998). Failure to detect changes to people during a real-world interaction. Psychonom. Bull. Rev. 5 644–649. 10.3758/bf03208840 [ DOI ] [ Google Scholar ]
- Simons D. J., Shoda Y., Lindsay D. S. (2017). Constraints on generality (COG): a proposed addition to all empirical papers. Perspect. Psychol. Sci. 12 1123–1128. 10.1177/1745691617708630 [ DOI ] [ PubMed ] [ Google Scholar ]
- Smilek D., Birmingham E., Cameron D., Bischof W., Kingstone A. (2006). Cognitive ethology and exploring attention in real-world scenes. Brain Res. 1080 101–119. 10.1016/j.brainres.2005.12.090 [ DOI ] [ PubMed ] [ Google Scholar ]
- Smith P. W., Feinberg R. A., Burns D. J. (1998). An examination of classical conditioning principles in an ecologically valid advertising context. J. Market. Theor. Pract. 6 63–72. 10.1080/10696679.1998.11501789 [ DOI ] [ Google Scholar ]
- Sonkusare S., Breakspear M., Guo C. (2019). Naturalistic stimuli in neuroscience: critically acclaimed. Trends Cogn. Sci. 23 699–714. 10.1016/j.tics.2019.05.004 [ DOI ] [ PubMed ] [ Google Scholar ]
- Stoffregen T. A. (2003). Affordances as properties of the animal-environment system. Ecol. Psychol. 15 115–134. 10.1016/j.humov.2019.01.002 [ DOI ] [ PubMed ] [ Google Scholar ]
- Thorndike E. (1899). A reply to “The nature of animal intelligence and the methods of investigating it”. Psychol. Rev. 6 412–420. 10.1037/h0073289 [ DOI ] [ Google Scholar ]
- Thorndike E. (2017). Animal Intelligence: Experimental Studies. Abingdon: Routledge. [ Google Scholar ]
- Todorović A. (2016) What’s in a Gabor Patch? Vol. 2016 Available online at: http://neuroanatody.com/2016/05/whats-in-a-gabor-patch/ (accessed April 1, 2020). [ Google Scholar ]
- Tolman E. C., Brunswik E. (1935). The organism and the causal texture of the environment. Psychol. Rev. 42:43 10.1037/h0062156 [ DOI ] [ Google Scholar ]
- Valtakari N. V., Hooge I. T. C., Benjamins J. S., Keizer A. (2019). An eye-tracking approach to Autonomous sensory meridian response (ASMR): the physiology and nature of tingles in relation to the pupil. PLoS One 14:e226692. 10.1371/journal.pone.0226692 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
- Wilson B. A. (1993). Ecological validity of neuropsychological assessment: do neuropsychological indexes predict performance in everyday activities? Appl. Prevent. Psychol. 2 209–215. 10.1016/s0962-1849(05)80091-5 [ DOI ] [ Google Scholar ]
- Winograd E. (1988). “Continuities between ecological and laboratory approaches to memory,” in Emory Symposia in Cognition, 2. Remembering Reconsidered: Ecological and Traditional Approaches to the Study of Memory eds Neisser U., Winograd E. (Cambridge: Cambridge University Press; ), 11–20. 10.1017/cbo9780511664014.003 [ DOI ] [ Google Scholar ]
- Wu D. W.-L., Bischof W. F., Kingstone A. (2013). Looking while eating: the importance of social context to social attention. Sci. Rep. 3:2356. 10.1038/srep02356 [ DOI ] [ PMC free article ] [ PubMed ] [ Google Scholar ]
External Validity. At the same time, the way that experiments are conducted sometimes leads to a different kind of criticism. Specifically, the need to manipulate the independent variable and control extraneous variables means that experiments are often conducted under conditions that seem artificial or unlike "real life" (Stanovich, 2010; Bauman, McGraw, Bartels, & Warren, 2014).
The experimental method involves the manipulation of variables to establish cause-and-effect relationships. The key features are controlled methods and the random allocation of participants into control and experimental groups. What is an Experiment? An experiment is an investigation in which a hypothesis is scientifically tested. An ...
Somewhat ironically, understanding consumer behavior through experimental research does not always involve examining the actual behavior of consumers. More often, in fact, it involves manipulating aspects of a stylized, artificial scenario and measuring consumer responses to these scenarios, responses that typically reflect consumers ...
Contrary to these perceptions, the dominant type of research in psychology can be described as 'basic science': it occurs in the laboratory or other artificial settings primarily using ...
Experimental research on behavior is often said to be artificial. To compensate for this problem, researchers do _____. A. field experiments B. open-ended research C. follow-up studies D. free sampling research. Answer: A. field experiments. Any variable that is allowed to vary freely is a(n) _____ variable.
In contrast, researchers have typically conducted their research in experimental research settings, a.k.a. the 'psychologist's laboratory.' Critics have often questioned whether psychology's laboratory experiments permit generalizable results. This is known as the 'real-world or the lab'-dilemma.