
Optional Exam Retakes Reduce Anxiety but May Exacerbate Score Disparities Between Students with Different Social Identities

    Published Online: https://doi.org/10.1187/cbe.21-11-0320

    Abstract

    Use of high-stakes exams in a course has been associated with gender, racial, and socioeconomic inequities. We investigated whether offering students the opportunity to retake an exam makes high-stakes exams more equitable. Following the control value theory of achievement emotions, we hypothesized that exam retakes would increase students’ perceived control over their performance and decrease the value of a single exam attempt, thereby maximizing exam performance. We collected data on exam scores and experiences with retakes from three large introductory biology courses and assessed the effect of optional exam retakes on gender, racial/ethnic, and socioeconomic disparities in exam scores. We found that Black/African American students and those who worked more than 20 h a week were less likely to retake exams. While exam retakes significantly improved student scores, they slightly increased racial/ethnic and socioeconomic disparities in scores partly because of these differences in participation rates. Most students reported that retake opportunities reduced their anxiety on the initial exam attempt. Together our results suggest that optional exam retakes could be a useful tool to improve student performance and reduce anxiety associated with high-stakes exams. However, barriers to participation must be examined and reduced for retakes to reduce disparities in scores.

    INTRODUCTION

    High attrition of students from undergraduate STEM (Science, Technology, Engineering, and Math) majors, especially among women, Black, Latinx, and Native American students, remains a major issue in STEM education in the United States (National Science Foundation and National Center for Science and Engineering Statistics, 2019; Seymour and Hunter, 2019; Asai, 2020). One major reason that students switch out of STEM majors is receiving a poor grade in an introductory STEM course (Seymour and Hewitt, 1997; Astorne-Figari and Speer, 2019; Weston et al., 2019). While many factors, including inequitable teaching practices, noninclusive learning environments, large class sizes, and students’ prior academic preparation, affect student grades in introductory courses (Seymour and Hewitt, 1997; Eagan et al., 2011; Eddy et al., 2014; Freeman et al., 2014; Odom et al., 2021), summative assessments in a course have been shown to play a critical role (Eddy and Hogan, 2014; Cotner and Ballen, 2017; Seymour and Hunter, 2019; Odom et al., 2021).

    High-stakes exams (e.g., midterms and finals) often constitute a large portion of student grades in introductory STEM courses (Franklin and Theall, 1992; Kost et al., 2009), and inequities have been documented in these high-stakes summative assessments (Eddy et al., 2014; Wright et al., 2016). Studies have reported larger gender disparities, with men scoring higher than women, on high-stakes exams than on other forms of assessment, such as lab reports and presentations, where women tend to do better (Kost et al., 2009; Miyake et al., 2010; Ballen et al., 2017, 2018; Matz et al., 2017; Salehi et al., 2019). Similar differences may be present along other dimensions of social identity, such as race/ethnicity and socioeconomic status (Richardson, 2015; Wright et al., 2016; Simmons and Heckler, 2020). Reducing the contribution of high-stakes exams toward the total grade (Cotner and Ballen, 2017) or using more frequent low-stakes assessments (Eddy and Hogan, 2014) has been shown to reduce disparities in student course grades by social identity.

    However, high-stakes exams, especially those with multiple-choice questions, are often much easier for instructors to administer to large numbers of students (Momsen et al., 2010; Wright et al., 2018). Moreover, some instructors argue that high-stakes exams offer practice for standardized tests, such as the MCAT, that students might take later (Marsh et al., 2007). Given the tradition of high-stakes exams in college STEM courses, the relative ease of administering high-stakes exams with restricted responses, and the enduring presence of high-stakes exams even during major disruptive events such as COVID-19 (Clark et al., 2020; Gin et al., 2021; Supriya et al., 2021), widespread change in the use of these summative assessments is unlikely. Thus, an alternative strategy is to identify ways in which instructors can continue using high-stakes exams in their courses while still reducing disparities in exam scores by social identities. In this study, we examine whether optional exam retakes offer one possible solution.

    Theoretical Framework: Control Value Theory of Achievement Emotions

    Control value theory builds on and integrates several theories used to explain achievement emotions in academic settings, including the expectancy-value theory of achievement motivation (Eccles et al., 1983; Wigfield and Eccles, 2000), the attributional theory of achievement emotions (Weiner, 1985), and perceived control theories (Perry, 1991; Patrick et al., 1993). According to control value theory, two kinds of appraisals shape achievement emotions, which are feelings regarding activities or outcomes linked to student success: 1) students’ perceived control over the activities and outcomes, for example, the expectation that studying will lead to good exam performance; and 2) the value students place on achievement activities and outcomes, for example, the importance of good exam grades for their future career (Pekrun, 2006). Perceived control includes both expectancies, for example, the expected outcome of current efforts on an upcoming test (derived from expectancy-value theory), and retrospective causal attributions, for example, the causes attributed to success or failure on a recent test (derived from the attributional theory of achievement emotions) (Pekrun, 2006). Following this framework, students feel anxiety regarding a test when they experience uncertainty about action-control and action-outcome expectancies, that is, uncertainty about whether they can perform an action (also termed “self-efficacy”) and whether their actions will result in a positive outcome or prevent a negative one (Bandura, 1977; Pekrun, 1988; Ringeisen et al., 2016; Roick and Ringeisen, 2017). In addition, students feel test anxiety when they value an exam, either due to its intrinsic value (the importance of the content to them) or its extrinsic value (the importance of their exam performance for other goals such as their career plans) (Pekrun, 2006). Those who have low value expectancies (e.g., they place low value on the exam) might experience negative emotions such as boredom irrespective of control expectancies (i.e., confidence that studying will result in a positive outcome) (Pekrun, 2006). On the other hand, students who have high value and control expectancies, meaning they value their performance on an exam and feel confident that studying will result in a positive outcome, will feel positive achievement emotions such as relief or hope (Pekrun, 2006).

    Control value theory also posits that positive activating achievement emotions, such as pride and enjoyment, caused by high control and value appraisals, are associated with better performance on exams (Pekrun, 2006; Putwain et al., 2021). Conversely, negative deactivating achievement emotions, such as boredom and helplessness, are associated with worse exam performance. Negative activating emotions such as anxiety are theorized to have an ambivalent effect (Pekrun, 2006) but often impact exam performance negatively (Hembree, 1988; Pekrun et al., 2011).

    The control and value appraisals that shape students’ achievement emotions are in turn influenced by a wide range of factors, such as students’ personality traits, the design of the instructional environment, and the social environment experienced by students (Pekrun, 2006). One example of an instructional practice that affects students’ control and value appraisals is the performance goal structure, such as norm-referenced grading (i.e., curving). Competitive performance goal structures could reduce students’ perceived control over their test scores and increase negative achievement emotions, compared with mastery goal structures (e.g., goals focused on achieving proficiency) that could enhance positive achievement emotions (Pekrun, 2006; Lau and Nie, 2008; Furner and Gonzalez-DeHass, 2011). One mechanism through which the social environment can affect students’ control and value appraisals is the presence of stereotypes associated with lower academic achievement. For example, Bieg et al. (2015) found that higher generalized math anxiety over time among girls compared with boys is associated with higher endorsement of math-related gender stereotypes. Greater endorsement of such gender stereotypes about math (i.e., “boys are better at math”) was also found to be associated with higher levels of math anxiety among women elementary teachers (Beilock et al., 2010). More broadly, stereotype threat, defined as “the risk of confirming, as self-characteristic, a negative stereotype about one’s group” (Steele and Aronson, 1995), could increase students’ value appraisal and decrease their control appraisal, resulting in negative achievement emotions such as anxiety.

    Systematic differences in achievement emotions such as test anxiety between students with different social identities might be caused by the social environment and lead to differences in student performance. Several studies have reported demographic differences in average levels of test anxiety, as well as strong negative correlations of test anxiety with self-esteem and self-efficacy (Hembree, 1988; von der Embse et al., 2018). These differences in achievement emotions could in turn lead to disparities in the exam scores received by students with different social identities. A study of undergraduate students at a Canadian university found that 46.3% of women suffered from self-reported test anxiety at some point over the course of their university career, compared with 30% of men (Gerwing et al., 2015). Other studies have shown that test anxiety has a negative effect on exam performance among women in introductory science courses (Ballen et al., 2017; Salehi et al., 2019). Higher test anxiety among women results in larger gender disparities in scores on exams than on nonexam assessments in introductory college science (Ballen et al., 2017; Salehi et al., 2019).

    Drawing upon control value theory, we hypothesized that optional exam retakes might change students’ control and value appraisals and thereby reduce negative emotions such as anxiety, leading to smaller disparities in exam scores by social identities. Having the opportunity to retake an exam may help a student feel more control over their exam performance and may also reduce the value associated with their performance on the first attempt. It might also reduce anxiety and increase sense of control through increased familiarity with the test format and question structure. Both of these mechanisms would make students more likely to experience positive emotions such as relief or hope and less likely to experience negative emotions such as anxiety or hopelessness. These effects on students’ control and value appraisals should, in theory, lead to better student exam scores. Moreover, if students with certain social identities have lower perceived control over, or place higher value on, their performance due to factors such as stereotype threat, increasing students’ control appraisals and decreasing their value appraisals might benefit such students disproportionately.

    Optional Exam Retakes Could Benefit Student Learning Through Testing Effect and Promoting Mastery Orientation

    In addition to increasing positive achievement emotions and reducing negative achievement emotions as described above, optional exam retakes could benefit student learning through the “testing effect” and by promoting a mastery orientation among students. The testing effect is the positive effect of taking a test on long-term retention: students who are tested on material soon after reading it perform better on a later test than students who simply reread the material without being tested (Roediger and Butler, 2011). There is a large body of evidence for test-enhanced learning in various contexts (Roediger and Karpicke, 2006; Roediger and Butler, 2011; Rowland, 2014; Adesope et al., 2017), including some evidence from undergraduate biology classrooms (Dobson, 2008; Stanger-Hall et al., 2011; McDaniel et al., 2012; Orr and Foster, 2013). Some studies show that this effect is stronger when students receive feedback on their initial test before retesting, and when the initial test is “retrieval-based,” that is, requires students to generate the answer instead of simply recognizing it (Rawson and Dunlosky, 2012; Rowland, 2014; Brame and Biel, 2015). It is important to note, however, that most of the evidence for test-enhanced learning comes from studies where the initial test carried no or low stakes (Brame and Biel, 2015). Here, we examined whether the testing effect extends to scenarios where the initial test is high stakes by assessing the impact of retakes on student exam scores.

    Another benefit of offering optional exam retakes, especially when students are allowed to retake until they have demonstrated their learning, is that it can promote a mastery orientation. This can be helpful for long-term student learning, especially for courses where student understanding of one part of course content is essential to their ability to learn subsequent course content (Juhler et al., 1998). Prior studies where students were offered opportunities to retake exams have shown that students score better on retakes compared with first attempts (Fichten and Adler, 1977; Cates, 1982; Davidson et al., 1984; Friedman, 1987; Juhler et al., 1998; Abraham, 2000; Badawy et al., 2016; Herman et al., 2020) and students who participate in a greater number of retakes do better on a cumulative final (Friedman, 1987; Abraham, 2000; Walck-Shannon et al., 2019). Moreover, studies have reported that students liked the use of exam retakes and reported lower anxiety about the exams (Fichten and Adler, 1977; Cates, 1982; Davidson et al., 1984; Friedman, 1987; Abraham, 2000). This prompted our interest in examining student motivation to participate in exam retakes and their experience with taking exams.

    In this study, we examined the effect of optional exam retakes on student exam scores and asked whether optional exam retakes could reduce disparities in exam scores received by students with different social identities. Any advantages of an optional intervention such as this are contingent upon equitable participation. Therefore, we also assessed whether there were any demographic differences in student participation in optional exam retakes. Finally, we used surveys to understand student participation in and experiences with optional exam retakes, including whether having an opportunity to retake an exam helped students stay calmer on the first exam attempt.

    Research Questions and Predictions

    1. To what extent are there gender, socioeconomic, and racial/ethnic differences in participation in optional exam retakes? If there are differences, are these associated with differences in the reasons chosen for participating or not participating in exam retakes?

    2. What is the effect of retakes on student exam scores in the course?

    3. What is the effect of retakes on gender, socioeconomic, and racial/ethnic disparities in exam scores? To what extent is this associated with differences in participation in optional exam retakes?

    4. Are there demographic differences in student experiences with retakes, especially with respect to the effect of retakes on student anxiety while taking the initial exam?

    We predicted that the likelihood of retaking an exam would be associated with students’ scores on the initial exam attempt, as reported in prior studies (Friedman, 1987; Juhler et al., 1998), but that whether a student retakes an exam may also be affected by factors that reduce the time available to them, such as the number of hours they work a job during the semester. We expected to see a positive impact of retakes on student exam scores and a reduction in gender, socioeconomic, and racial/ethnic disparities. Finally, we predicted that optional exam retakes would lower student anxiety while taking the initial exam, and we did not expect any demographic differences in student experiences with retakes.

    Researcher Positionality

    We recognize that researchers’ identities and positions shape the research they do and introduce various implicit biases. An important way to counter such biases is to be explicit about researcher identities and positions. Our research team consists of discipline-based education researchers (K.S., S.B., C.W., and K.V.) and instructors for the courses included in this study (C.W., J.E., D.T., C.P., and C.B.). Our team’s social identities include women (K.S, J.E., K.V., and S.B.), men (C.W., D.T., C.P., and C.B.), South Asian (K.S.), Hispanic (K.V.), white (S.B., C.W., J.E., D.T., C.P., and C.B.), first generation to college (D.T. and J.E.), and continuing generation (K.S., C.B., C.P., K.V., C.W., and S.B.). Moreover, some members of the team received Pell grants during their undergraduate degree and/or worked a job for at least 10 h/wk during college (C.W. and J.E.). Two members of the team (K.S. and C.B.) earned their undergraduate degree outside of the United States and are first-generation immigrants to the United States.

    MATERIALS AND METHODS

    This study was conducted at a large research-intensive public university in the Southwestern United States across three courses that are part of the introductory biology sequence for biology majors, between Spring 2019 and Spring 2020. About one-third of undergraduate students at this university receive federal Pell grants, a little more than 20% are first-generation college students, and about 80% are between the ages of 18 and 22. First-year students are expected to live on campus. Students were offered an opportunity to retake exams in all three courses, with some small differences as described below and summarized in Table 1. Each course had three major high-stakes timed exams, although the exams’ contribution to the final grade differed across courses. Questions on the retakes were isomorphic to the questions on the original exam. Students received their score on their first attempt and were given access to the questions with correct answers to review on their own and/or during a review session before the retake took place. Retakes happened within 2 wk of the original exam. Students were only eligible to participate in a retake if they took the original exam. For students who retook exams, only the highest score they received counted toward their final grade, regardless of whether it was the first attempt or the retake.
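    Under this grading rule, a student’s contributing score is simply the maximum of their attempts, with non-retakers keeping their first-attempt score. A small sketch in R of this rule, using hypothetical score columns and made-up numbers:

```r
# Simulated example: NA marks a student who did not retake exam 1
scores <- data.frame(
  exam1_first  = c(62, 88, 75),
  exam1_retake = c(79, NA, 70)
)

# The highest attempt counts toward the final grade; na.rm = TRUE lets
# non-retakers keep their first-attempt score, and a lower retake
# (third student) cannot hurt a student
scores$exam1_final <- pmax(scores$exam1_first, scores$exam1_retake, na.rm = TRUE)
scores$exam1_final  # 79 88 75
```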

    TABLE 1. Summary of differences among the three courses included in this study

                                            Course 1                Course 2              Course 3
    When was this course taught?            Fall 2019               Spring 2020           Spring 2019
    Number of retake opportunities          1 per exam              1 per exam            2 per exam
    When were retakes held?                 Outside of class time   In class or online    Outside of class time
    % of course grade from exams            39%                     48%                   75%

    In all three courses, exams tended to focus on higher-order questions on Bloom’s taxonomy (Crowe et al., 2008), and retakes contained isomorphic but not identical questions. Average student performance in previous iterations of these courses showed that such questions tend to be challenging for students. Therefore, the impetus for implementing exam retakes was to incentivize students to prepare more for exams while promoting higher-order learning.

    Course 1: In this course, exams constituted 39% of the course grade. Students were offered one retake opportunity for each exam and the retakes were scheduled outside of official class time. While classes for this course were either on Mondays and Wednesdays or Tuesdays and Thursdays, the retakes were scheduled for Fridays. Students could pick from time slots throughout the day on Fridays to take the retake, but had to book the time slot in advance. This course was taught by C.W., J.E., D.T., and C.B.

    Course 2: In this course, exams constituted 48% of the course grade. Students were offered one retake opportunity for each exam and the retakes were scheduled during official class time. However, this course occurred during the Spring 2020 semester when classes transitioned to emergency remote learning due to the COVID-19 pandemic. Thus, only the first exam took place during the in-person period of the course and the other two exams were administered online. These exams were proctored using Respondus LockDown Browser and Monitoring; videos were reviewed by instructors and TAs following the exam, and students were given a 33-h window to complete exam 2 and exam 2 retake each, and a 72-h window for exam 3 and exam 3 retake each. This course was taught by C.W., J.E., and C.B.

    Course 3: In this course, exams constituted 75% of the course grade. Students were offered two retake opportunities for each exam and the retakes were scheduled outside of class. The first retake took place 2 wk after the first attempt for exams 1 and 2, and 1 wk after the first attempt for exam 3. The second retake was offered 1 wk after the first retake. Students received their score on the first retake before the second retake occurred. Retake scheduling and administration was outside of official class time similar to that for Course 1. See Table 1 for a summary of the differences among the courses. This course was taught by C.W. and C.P.

    Data Collection

    All protocols for this study were approved by ASU’s Institutional Review Board (IRB; protocols no. 10528 and no. 1634).

    We collected gradebooks from the introductory biology instructors who consented to participate in our study. For Courses 1 and 2, student consent for participation in the study was requested via an online survey administered using the Qualtrics platform. We filtered all course gradebooks to remove data from students who did not consent to participate in the study. For Course 3, we used archival data from an older iteration of the course so we did not request consent from students.

    We requested student demographic data from the registrar’s office for all students who consented to participate in the study in Courses 1 and 2, and for all students in Course 3. These institutional data did not allow students to pick a gender outside of the man/woman binary. We acknowledge that this might be an inaccurate representation of the gender of students in our study who do not conform to the gender binary. Much of institutional demographic data comes from college applications such as the Common App, which still does not provide an option beyond the binary, although it recently added a multiselect pronoun question (Jaschik, 2021). We also used institutional data for race/ethnicity, which asked students two separate questions: one asking whether they identify as Hispanic/Latinx and another asking them to pick the race(s) with which they identify. It is important to note that race is a social construct that was invented to classify people into a social hierarchy (Roberts, 2011), and due to systemic racism, racial identity continues to have a significant impact on people’s lives (Alexander, 2012; Ward, 2016). We used gender, race/ethnicity, and proxies of socioeconomic status in this study to assess systemic inequities, and we ascribe any differences in outcomes we report to systems of oppression such as sexism, racism, and classism, and not to any social identity per se.

    For Courses 1 and 2, we asked a few additional demographic questions such as the number of hours per week that students worked a job along with the survey that included the consent form for the study.

    We collected grades and demographic data from 792 students in Course 1, 635 students in Course 2, and 439 students in Course 3. The full demographic breakdown of students from whom we collected grades and demographic data is included in Table 2 and survey responses are included in Supplemental Table S1. As mentioned above, we were only able to include women and men in our analyses. Students who picked more than one race were categorized as multiracial for our analyses. Any students who identified as Hispanic/Latinx in the first question were categorized as Hispanic/Latinx for the rest of our analyses. We removed Native American/American Indian students and Native Hawaiian/Pacific Islander students due to small sample size. We also removed students whose race/ethnicity was unspecified in the registrar data and those who chose “decline to state” to the survey question about work hours per week in Courses 1 and 2. After removing these students, we had data from 749 students in Course 1, 614 students in Course 2, and 429 students in Course 3 that we used for analyses.

    TABLE 2. Demographics of students included in the analyses of exam scores

                                      Course 1                  Course 2                  Course 3
                                      Total      Analyzed       Total      Analyzed       Total      Analyzed
                                      N = 792    N = 749        N = 635    N = 565        N = 439    N = 429
    Gender
     Women                            540 (68%)  512 (68%)      438 (69%)  387 (68%)      302 (69%)  295 (69%)
     Men                              252 (32%)  237 (32%)      197 (31%)  178 (32%)      137 (31%)  134 (31%)
    Race/Ethnicity
     American Indian/Alaska Native    9 (1.1%)                  5 (0.8%)                  2 (0.5%)
     Asian American                   108 (14%)  105 (14%)      100 (16%)  92 (16%)       80 (18%)   80 (19%)
     Black/African American           39 (4.9%)  39 (5.2%)      34 (5.4%)  33 (5.8%)      22 (5.0%)  22 (5.1%)
     Hispanic/Latinx                  193 (24%)  188 (25%)      139 (22%)  121 (21%)      89 (20%)   89 (21%)
     International                    26 (3.3%)  26 (3.5%)      24 (3.8%)  23 (4.1%)      8 (1.8%)
     Multiracial                      45 (5.7%)  44 (5.9%)      36 (5.7%)  32 (5.7%)      28 (6.4%)  28 (6.5%)
     Native Hawaiian                  2 (0.3%)                  1 (0.2%)
     White                            353 (45%)  347 (46%)      281 (44%)  264 (47%)      210 (48%)  210 (49%)
     Unspecified                      17 (2.1%)                 15 (2.4%)
    Pell-eligible                     264 (33%)  250 (33%)      204 (32%)  176 (31%)      138 (32%)
    Number of hours worked at a job
     No                               444 (56%)  424 (57%)      393 (62%)  381 (67%)
     ≤20 h                            242 (31%)  236 (32%)      111 (17%)  105 (19%)
     >20 h                            90 (11%)   89 (12%)       80 (13%)   79 (14%)
     Decline to state                 16 (2.0%)                 51 (8%)

    In addition to the exam and demographic data, we sent short surveys to all students in Courses 1 and 2 after each exam retake opportunity to understand the reasons that students had for choosing to retake or not retake an exam. Because there is minimal published literature on student participation in optional exam retakes, we developed the survey based on our own experiences as students and instructors in undergraduate biology classrooms. We asked students an open-ended question about their decision to participate in the exam retake before asking a closed-ended question in which they could select multiple reasons for choosing to retake or not retake an exam. Next, we asked students whether they thought the retakes helped their learning and reduced their anxiety about exams in the course. Finally, we included some questions about student experiences with exam retakes, such as whether they liked retakes and whether they found it too time-consuming to prepare for and retake exams (see supplemental materials for a copy of all survey questions). To check for cognitive validity of survey items, we conducted six think-aloud interviews with undergraduate students and iteratively revised survey items until no further changes were needed (Beatty and Willis, 2007).

    The surveys were distributed to all the students enrolled in Courses 1 and 2 via an email from the instructors and an announcement on the learning management system. Students were offered a small amount of extra credit for completing the survey and were explicitly told that their instructors would not see their responses to the survey. In Course 1, out of 1090 students, 457, 592, and 335 students filled out our survey after exam 1, 2, and 3, respectively, resulting in a response rate of 42%, 54%, and 31%, respectively. In Course 2, out of 846 students, 501, 431, and 428 students filled out our survey after exam 1, 2, and 3 respectively, resulting in a response rate of 59%, 51%, and 51%, respectively. The full set of analyzed survey questions is included in the supplement.

    Data Analysis.

    Because courses differed in important ways from each other, such as the number and timing of retakes, we analyzed data for each course separately.

    Analysis of Exam Data.

    We used logistic regressions to assess which variables influence the likelihood of a student retaking an exam. The outcome for these models was a two-column matrix indicating the number of exams a student retook and the number of exams they did not retake, and the predictor variables included total exam score on first attempts, gender, race/ethnicity, and Pell grant eligibility (a proxy for socioeconomic status). In addition, for Courses 1 and 2, we included the number of hours per week that students worked a job during the semester as a predictor. Because Course 3 data were archival in nature, we did not have information on this variable for students in that course.
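    In R, such a model can be specified as a binomial GLM whose outcome is a two-column (successes, failures) matrix built with cbind(). This is a minimal sketch with simulated data; all variable names are hypothetical, not the authors’ actual columns, and the race/ethnicity and work-hours predictors would enter the formula the same way:

```r
set.seed(42)
n <- 300

# Simulated student-level data (illustration only)
d <- data.frame(
  first_attempt_total = rnorm(n, mean = 70, sd = 12),
  gender              = factor(sample(c("woman", "man"), n, replace = TRUE)),
  pell_eligible       = factor(sample(c("no", "yes"), n, replace = TRUE)),
  retaken             = sample(0:3, n, replace = TRUE)  # of 3 exams, how many retaken
)
d$not_retaken <- 3 - d$retaken

# Two-column outcome: exams retaken vs. exams not retaken per student
fit <- glm(cbind(retaken, not_retaken) ~ first_attempt_total + gender + pell_eligible,
           family = binomial, data = d)
summary(fit)
```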

    We used paired t tests to assess whether exam retakes significantly improved student scores overall. Next, to examine the effect of optional exam retakes on students with different social identities, we compared three regression models for each course.

    Model 1: In this model, total exam score received on first attempts was the outcome variable and gender, race/ethnicity, Pell-eligibility, and number of hours worked per week (only for Courses 1 and 2) were the predictors. This model allowed us to assess demographic disparities in exam scores between students with different social identities prior to exam retakes. Knowing that they had an opportunity to retake exams might have affected student preparation and behavior for the first exam, so these disparities might not be the same as what we would observe in a course without retakes. However, they still give us useful information on the magnitude of score disparities in these courses.

    Model 2: In this model, total exam score received after taking retakes into account was the outcome variable and total exam score on first attempts along with demographic variables were the predictors. This model shows us the effect of exam retakes on disparities in the exam score received by students with different social identities. If exam retakes benefit all students equally, we would expect that none of the demographic variables would have a significant effect in this model.

    Model 3: This model was the same as Model 2 but with the addition of the total number of retakes taken by a student added as a predictor. Comparing Model 2 with Model 3 allows us to understand the extent to which differences in participation in exam retakes, if any, shape the effects of exam retakes on students.

    The reference groups for all of our models were women, white students, students who were not eligible for federal Pell grants, and students who did not work a job during the semester.
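    Under these descriptions, the paired t test and the three models might be specified roughly as follows. This is a sketch with simulated data and hypothetical column names; the race/ethnicity and work-hours terms are omitted for brevity:

```r
set.seed(1)
n <- 300

# Simulated data (illustration only)
d <- data.frame(
  gender        = factor(sample(c("woman", "man"), n, replace = TRUE)),
  pell_eligible = factor(sample(c("no", "yes"), n, replace = TRUE)),
  n_retakes     = sample(0:3, n, replace = TRUE)
)
d$first_attempt_total <- rnorm(n, 70, 12)
# Retakes can only raise a student's contributing total
d$final_total <- d$first_attempt_total + d$n_retakes * runif(n, 0, 3)

# Did retakes significantly improve scores overall?
t.test(d$final_total, d$first_attempt_total, paired = TRUE)

# Model 1: demographic disparities in first-attempt scores
m1 <- lm(first_attempt_total ~ gender + pell_eligible, data = d)

# Model 2: disparities in final scores, controlling for first-attempt scores
m2 <- lm(final_total ~ first_attempt_total + gender + pell_eligible, data = d)

# Model 3: Model 2 plus the number of retakes a student took
m3 <- update(m2, . ~ . + n_retakes)
```

R’s default treatment contrasts use the first level of each factor as the reference group, so the reference categories described in the text would be set with factor(..., levels = ...) or relevel().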

    We checked adequacy of all regression model fits using the check_model function in the R package performance (Lüdecke et al., 2020). We examined the plots for fitted values against residuals for linear regressions to check for linearity and fitted values against the square root of standardized residuals for both linear and logistic regressions to check for homogeneity of variance. We found that the lines were relatively flat and horizontal for all models for Courses 1 and 2, indicating that these assumptions were met. There was a slight curvature in the lines for models for Course 3, suggesting that the assumption for homogeneity of variance might not be met, so we encourage caution in interpreting our results from Course 3. There were no outliers in any of the models and the residuals were normally distributed. We checked that the variance inflation factors were low implying that the predictors were not collinear.
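    These diagnostics map onto documented functions in the performance package; a brief sketch, using a built-in dataset as a stand-in for the course models:

```r
library(performance)  # assumes the performance package is installed

# Stand-in linear model for illustration
fit <- lm(mpg ~ wt + hp, data = mtcars)

check_model(fit)               # panel of diagnostic plots (linearity, homogeneity, etc.)
check_collinearity(fit)        # variance inflation factors
check_outliers(fit)            # influential observations
check_normality(fit)           # normality of residuals
check_heteroscedasticity(fit)  # homogeneity of variance
```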

    Analysis of Survey Data.

    Within each course, we pooled data from all three surveys. To assess whether there were demographic differences in students’ reasons for retaking or not retaking an exam, we coded each reason provided as an option on our survey as a binary variable. We also converted survey questions about student experiences with retakes that were originally on a Likert scale into binary responses. We then used these survey responses as outcomes and demographic variables as predictors. Because a student might have responded to more than one survey, we used a generalized linear mixed model approach for these analyses and treated the unique random ID assigned to each student as a random effect. Intraclass correlation coefficient values were greater than 0.12 for all of these models (with one exception, where the value was 0.077), indicating appreciable within-student clustering of responses, which justified the use of random effects in the models (Theobald, 2018). We used the R package lme4 (Bates et al., 2015) for analysis and the package performance for calculating intraclass correlation coefficients and checking model fits visually (Lüdecke et al., 2020). Some of these models with the survey data did not meet all the assumptions of generalized linear mixed models. As such, we view the survey data and accompanying analyses as exploratory.
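For a random-intercept logistic model like the ones described here, the intraclass correlation is commonly computed on the latent scale, where the level-1 residual variance of the logistic distribution is fixed at π²/3 ≈ 3.29. A minimal sketch of that computation (the random-intercept variance below is a hypothetical value, not an estimate from the study):

```python
# Latent-scale ICC for a random-intercept logistic mixed model.
# ICC = var_u / (var_u + pi^2 / 3), where var_u is the random-intercept
# variance and pi^2/3 is the logistic residual variance.
import math

def icc_logistic(var_u: float) -> float:
    """Latent-scale ICC for a random-intercept logistic model."""
    return var_u / (var_u + math.pi ** 2 / 3)

print(round(icc_logistic(math.pi ** 2 / 3), 2))  # equal variances -> 0.5
```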

    RESULTS

    Finding 1: There are Demographic Differences in Participation in Optional Exam Retakes

    In all three courses, students who scored higher on first attempts were less likely to retake exams. Participation in retakes was generally higher in Course 2, where retakes were offered in class or online, than in Course 1, where retakes were offered outside of class time. In Course 2, there was a steep drop-off in retake rates between exam 1 and exams 2 and 3, which occurred after the transition to remote learning due to COVID-19. In Course 3, where retakes were offered outside of class time but exams constituted 75% of the course grade, participation in retakes ranged from 60% to 75% (Figure 1).

    FIGURE 1. Barplot showing the percentage of students that participated in each exam retake opportunity across the three courses.

    Asian American students, International students, and Pell-eligible students were more likely to retake exams than reference groups in Course 1 (Figure 2). However, students who worked a job and Hispanic/Latinx students were less likely to retake exams. Survey analyses showed that Asian American students and Pell-eligible students who retook exams were more likely to choose not being satisfied with their score in this course as the reason for retaking the exam (Table 3). This was true for Asian American students even after controlling for the score they received on the first attempt (Supplemental Table S2). The most common reason students chose for not retaking an exam was that they were satisfied with their score on the first attempt. If a student was not satisfied with their score, but still chose to not retake an exam, it might be because they experienced some barriers to retaking exams. Among students who did not retake exams, Pell-eligible students were less likely to choose being satisfied with their score in the exam as a reason for not retaking the exam (Table 4). Students who worked 1–20 h/wk were also less likely to choose this option even after controlling for the score they received on the first attempt, suggesting that their lower participation in retakes might be due to barriers to participation (Supplemental Table S3). Having retakes scheduled outside of class time could be a barrier to participation for students who work a job, have a heavy course load, or have other responsibilities.

    FIGURE 2. Logistic regression coefficients for total exam score on first attempts (i.e., “original exam score”) and demographic variables on students’ likelihood of retaking exams. The reference groups for all of our models were: women, White students, students who were not eligible for federal Pell grants, and students who did not work a job during the semester. Error bars represent standard error of the regression coefficients. Blue dots indicate positive effects and red dots indicate negative effects, gray line indicates no significant effect. *, P < 0.05; **, P < 0.01; ***, P < 0.001.

    TABLE 3. Percentage of survey responses in which an option was selected by a student as their reason for retaking the exam, and demographic differences in the likelihood of picking an option. Students were asked to select all options that applied to them. We pooled data from all three surveys. A dash (-) indicates that we did not run linear mixed-effects models for that option because very few students (less than 10%) picked it across all surveys in both courses. We accounted for multiple responses from the same student by using student as a random effect with varying intercepts in our models. We did not control for student exam scores here, but include a similar table in the supplement that presents model results after controlling for total score on first exam attempts.

    | Reason for retaking | Course 1 % (n = 880) | Course 1 summary of demographic differences | Course 2 % (n = 1033) | Course 2 summary of demographic differences |
    |---|---|---|---|---|
    | I was not satisfied with my score | 77.16 | Asian American students and Pell-eligible students more likely to pick | 86.66 | Latinx and students who work 1-20 h/wk more likely to pick |
    | I thought I could improve my score | 79.77 | Men and International students less likely to pick | 73.22 | Men, International students, and Pell-eligible students less likely to pick |
    | First attempt was encouraging | 19.89 | Black students and Pell-eligible students less likely to pick; men more likely to pick | 7.32 | No demographic differences |
    | To help learn the material better | 44.43 | No demographic differences | 48.24 | Men less likely to pick |
    | To practice my test-taking skills | 22.84 | No demographic differences | 20.16 | No demographic differences |
    | Because my friends were retaking | 5.45 | - | 4.41 | - |
    | To impress my instructor | 1.93 | - | 2.51 | - |

    TABLE 4. Percentage of survey responses in which an option was picked by a student as their reason for not retaking the exam, and demographic differences in the likelihood of picking an option. Students were asked to select all options that applied to them. We pooled data from all three surveys. A dash (-) indicates that we did not run linear mixed-effects models for that option because very few students (less than 10%) picked it across all surveys in both courses. n/a indicates options that were not applicable to the course. We accounted for multiple responses from the same student by using student as a random effect with varying intercepts in our models. We did not control for student exam scores here, but include a similar table in the supplement that presents model results after controlling for total score on first exam attempts.

    | Reason for not retaking | Course 1 % (n = 447) | Course 1 notes | Course 2 % (n = 261) | Course 2 notes |
    |---|---|---|---|---|
    | I was satisfied with my score | 75.17 | Pell-eligible students pick less often | 58.54 | No demographic differences |
    | I was discouraged by my score | 4.01 | - | 9.35 | - |
    | I didn’t think I could improve my score | 34.9 | Men less likely to pick | 28.05 | Latinx students less likely to pick |
    | Taking exams makes me anxious | 12.3 | No demographic differences | 13.41 | No demographic differences |
    | Too difficult to come to campus for retake | 13.87 | No demographic differences | n/a | n/a |
    | Signing up process was difficult | 2.88 | - | n/a | n/a |
    | Scheduling conflicts | 26.52 | No demographic differences | n/a | n/a |
    | Too busy | 19.24 | No demographic differences | 33.74 | Pell-eligible students were less likely to pick |
    | Didn't feel like it | 10.74 | Students who work 1-20 h/wk less likely to pick | 10.16 | Pell-eligible students were more likely to pick |
    | Planned to retake, but forgot | 5.82 | - | n/a | n/a |

    In Course 2, where retakes were scheduled during class time or administered online, students who worked a job were still less likely to retake exams (Figure 2). Moreover, Black/African American students were significantly less likely to retake exams in this course. Neither students who worked nor Black/African American students were more likely to pick any of the reasons listed in the survey for not participating in exam retakes (Table 4).

    In Course 3, Asian American students were more likely to retake exams and men were less likely to retake exams.

    Finding 2: On Average, Students Score Higher on the Retakes Compared with First Attempts

    Overall, students scored higher on retakes compared with first attempts (paired t tests, all P < 0.05, except for exam 1, Course 1) (Figure 3). The average difference between total exam score on first attempts and the total exam score students received after taking retakes into account was 17.6 points out of 390 (4.5 percentage points) in Course 1, 26.4 points out of 360 (7.3 percentage points) in Course 2, and 38.1 points out of 300 (12.7 percentage points) in Course 3. Thus, optional exam retakes significantly increased student scores in all three courses.
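The comparison behind these numbers is a paired t test of each student's retake score against their first-attempt score, with the average point gains converted to percentage points using each course's exam total. A sketch on simulated scores (the per-course point totals are the ones reported above; the scores themselves are synthetic):

```python
# Paired t test on simulated first-attempt vs. retake scores, plus the
# point-to-percentage-point conversion used in the text.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(2)
first = rng.normal(70, 10, 200)
retake = first + rng.normal(5, 5, 200)   # simulated improvement on retake

t, p = ttest_rel(retake, first)          # paired comparison, per student

# Average gains reported in the text, converted to percentage points
gains = {"Course 1": (17.6, 390), "Course 2": (26.4, 360), "Course 3": (38.1, 300)}
for course, (pts, total) in gains.items():
    print(course, round(100 * pts / total, 1))
```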

    FIGURE 3. Boxplot of student scores on first attempts and retakes. The black bar in the middle of each box represents the median; the lower and upper hinges of the box represent the first and third quartiles of the data. The upper whisker extends from the upper hinge to the largest value within 1.5 × the interquartile range (the distance between the first and third quartiles); the lower whisker extends from the lower hinge to the smallest value within 1.5 × the interquartile range. Paired t tests, P < 0.05 for all except Course 1, Exam 1.

    Finding 3: Optional Exam Retakes Increased Score Disparities Between Students with Different Social Identities, Likely Due to Differences in Participation in Retakes

    Pell-eligible students received lower total scores on first exam attempts than students who were not eligible for Pell grants in all three courses. Black and Latinx students received lower total scores on first exam attempts than White students, although this was statistically significant for Black students only in Courses 1 and 2 and for Latinx students only in Courses 2 and 3 (Figure 4, A–C). International students received lower scores on first attempts than White students in Course 1. We only had data on the number of hours students worked a job for Courses 1 and 2, and in both courses students who worked, especially those working more than 20 h a week, received lower scores than students who did not work. In Course 2, students who worked less than 20 h a week also received significantly lower scores than students who did not work a job (Figure 4, A and B).

    FIGURE 4. Standardized slope estimates from multiple linear regression models for total exam score on first attempts (a–c) or total exam score after retakes (d–i) for Course 1 (a, d, g), Course 2 (b, e, h), and Course 3 (c, f, i). Larger absolute values of estimates indicate larger impacts, and the ± signs indicate the direction of the relationship (positive vs. negative relationship with the outcome). (a–c, Model 1) Disparities in total exam score on first attempts by gender, race/ethnicity, Pell-eligibility, and number of hours worked. (d–f, Model 2) Effect of total score on first attempts and demographic variables on total score after exam retakes. Pre-score is the total score on the initial exams scaled to a mean of 0 and SD of 1. (g–i, Model 3) Effect of total score on first attempts, participation in retakes, and demographic variables on total score after exam retakes. Error bars represent standard errors of the regression coefficients. Blue dots indicate positive effects and red dots indicate negative effects; the gray line indicates no significant effect. *, P < 0.05; **, P < 0.01; ***, P < 0.001.

    Optional exam retakes did not reduce these demographic disparities in exam scores but instead exacerbated some of them. For example, the regression coefficient estimate for Black/African American students in Model 2 for both Courses 1 and 3 was negative, indicating that the disparity in total exam scores between Black and White students increased after optional exam retakes (Figure 4, D and F). Course 2 results showed that retakes also increased the score disparity between students who worked a job during the semester and students who did not (Figure 4E). Differences in participation in exam retakes explained these results to a large extent, as seen in Model 3, which includes student participation in retakes as a predictor (Figure 4, G–I). Most of the significant differences that we saw in Model 2 (Figure 4, D and F) were no longer significant once we took student participation in retakes into account (Figure 4, G–I). However, even after accounting for student participation in retakes, some differences in scores remained (Figure 4G).
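The Model 2 vs. Model 3 logic can be illustrated with a toy simulation: when one group of students retakes exams less often, a group gap in final scores appears in a model without retake participation and shrinks once the number of retakes is added as a predictor. All numbers below are simulated, not the study's data.

```python
# Toy simulation: a participation gap producing a score gap that is
# explained away when the number of retakes is added as a predictor.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 2000
group = rng.integers(0, 2, n)                    # 1 = group that retakes less often
retakes = np.clip(rng.poisson(2 - group), 0, 3)  # fewer retakes when group == 1
final = 70 + 5 * retakes + rng.normal(0, 5, n)   # each retake adds ~5 points

df = pd.DataFrame({"group": group, "retakes": retakes, "final": final})
m2 = smf.ols("final ~ group", data=df).fit()             # Model 2 analogue
m3 = smf.ols("final ~ group + retakes", data=df).fit()   # Model 3 analogue

# Group gap is clearly negative in m2 and shrinks toward zero in m3
print(round(m2.params["group"], 2), round(m3.params["group"], 2))
```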

    Finding 4: Students Prefer Optional Exam Retakes and Having a Retake Opportunity Reduces Anxiety on the Initial Test Similarly for Students with All Social Identities, but Some Issues Remain

    Overall, exam retakes were very popular, and most students, regardless of whether they retook an exam, perceived that exam retakes helped their learning. Moreover, about 86% of students who retook an exam said that they put a lot of effort into studying for the retake, while only 20–30% of students said that they studied less for the initial exam.

    Most students who retook an exam agreed that retakes reduced their anxiety about taking tests. However, 49–56% of students indicated that they felt anxious while retaking the exams. This was more often true for Asian American students in both Courses 1 and 2 and for students who worked more than 20 h a week in Course 1. While most students agreed that having the opportunity to retake exams helped them stay calmer on the initial exam, Latinx students were less likely to agree with this in Course 1. Moreover, about a quarter of students indicated that they did not retake the exam because of test anxiety. Thus, it seems that exams remain a high-stakes situation for many students, even when optional exam retakes are offered (Table 5).

    TABLE 5. Percentage of survey responses that indicated agreement with a statement, and demographic differences in the likelihood of agreement. We pooled data from all three surveys. A dash (-) indicates that we did not run linear mixed-effects models for that statement because very few or very many students (less than 10% or more than 90%) agreed with it across all surveys in both courses. n/a indicates statements that were not applicable to the course. We accounted for multiple responses from the same student by using student as a random effect with varying intercepts in our models.

    | Statement | Course 1 % | Course 1 notes | Course 2 % | Course 2 notes |
    |---|---|---|---|---|
    | Students who retook exams | (n = 880) | | (n = 1033) | |
    | Retaking this exam helped my learning | 96.7 | - | 94.68 | - |
    | Retaking reduced my anxiety about taking tests | 81.48 | Men more likely to agree | 83.85 | No demographic differences |
    | Put a lot of effort into studying for retake | 86.36 | No demographic differences | 85.86 | International students less likely to agree |
    | Anxious while retaking | 48.98 | Asian American students and students who work >20 h/wk more likely to agree; men less likely to agree | 56.24 | Asian American students more likely to agree; men less likely to agree |
    | Preparing and retaking was too time consuming | 22.16 | No demographic differences | 25.78 | No demographic differences |
    | Finding the time was challenging | 20.22 | Asian American, Latinx, and International students and students who work >20 h/wk more likely to agree | n/a | n/a |
    | Students who did not retake exams | (n = 447) | | (n = 261) | |
    | Retaking this exam would not have helped my learning | 23.04 | Men were more likely to agree | 8.89 | No demographic differences |
    | I did not retake because of test anxiety | 23.49 | No demographic differences | 30.08 | No demographic differences |
    | Appreciate opportunity to retake exams | 98.88 | - | 97.97 | - |
    | All students | (n = 1337) | | (n = 1290) | |
    | Lower anxiety on initial exam | 89.3 | No demographic differences | 91.53 | No demographic differences |
    | Stayed calmer on initial exam | 81.97 | Latinx students less likely to agree | 86.92 | No demographic differences |
    | Studied less for initial exam | 20.19 | International students and students who did not retake exam less likely to agree | 32.12 | Students who did not retake exam less likely to agree |
    | Like exam retakes | 99.33 | - | 98.63 | - |
    | Prefer regular hours | 43.08 | Students who did not retake exam, International students, and students who work 1-20 h/wk more likely to agree | n/a | n/a |

    In Course 1, where retakes were scheduled outside of class, having the retakes during class time was preferred by 43% of students, especially those who did not retake an exam, worked a job, and/or were International students. Asian American, Latinx, International students, and students who work more than 20 h a week were more likely to agree that finding the time to retake the exam was challenging in this course. Even when the retakes were scheduled in class/online in Course 2, a little more than 20% of students agreed that preparing and retaking the exam was too time consuming (Table 5).

    DISCUSSION

    Our results show that optional exam retakes might be a useful tool to improve student performance and learning. Moreover, they might reduce student anxiety on the first exam attempt. However, we observed demographic differences in participation in exam retakes, especially when retakes were offered outside of class time. Specifically, in courses where retakes were outside of class time, Black students and students who worked a job were less likely to retake exams even after controlling for their scores on the first exam attempt. Additionally, retakes seem to exacerbate the disparities in scores received by students with different social identities, likely because of these differences in participation. Our results highlight the importance of thinking broadly about accessibility for any instructional practice; in this instance, that means offering retakes in a way that ensures all students are able to participate.

    Several factors might shape students’ decisions to participate in an optional intervention such as the optional exam retakes we studied. First, students might simply lack the time. Diversity among college students has been increasing over the past couple of decades, and with it the proportion of students who work and who have caregiving responsibilities (Bowen et al., 2009; Goldrick-Rab and Sorensen, 2010; Goldrick-Rab, 2016). The employment rate among 18- to 22-year-old full-time college students increased from 33% in 1970 to 52% in 2000 (Scott-Clayton, 2012) and was 43% in 2018 (National Center for Education Statistics, 2019). In addition, 22% of all undergraduate students are parents, and 42% of parenting students are single mothers (Cruse et al., 2019). According to a recent survey of parenting students, 60% were working a job and another 13% were not working but were looking for jobs (Goldrick-Rab et al., 2020). Many other students have caregiving responsibilities for other family members, such as siblings and grandparents. Overall, many undergraduate students, particularly women, students of color, and socioeconomically disadvantaged students, have significant limits on the time they have available for coursework. Second, students might have scheduling constraints that prevent them from participating in an optional intervention offered outside of class time with limited day and time options. Lastly, students might not be motivated to participate in optional interventions or might be hesitant for a variety of reasons, such as lower interest in the course material, test anxiety, and stereotype threat.

    Given that a large proportion of students need to balance taking courses with other responsibilities such as work and caregiving, and given our findings of discrepancies in who takes optional retakes outside of class, we argue that opportunities to improve grades, such as optional exam retakes, need to be offered during formal class time. This would eliminate any scheduling barriers. Comparing participation rates in retakes across the three courses, we found that a much larger proportion of students retook exams in Course 2, where exams were offered during class time or online, than in Courses 1 and 3, where exams were offered outside of class time. Moreover, as our survey results show, students who worked a job were more likely to agree that they would have preferred exam retakes during class time.

    Although offering exam retakes during class time can help make optional exam retakes more equitable, it is still not enough to overcome the barriers faced by students with limited time, because effectively retaking exams requires preparing for them outside of class. Thus, even when retakes were offered during class time in Course 2, some disparities in participation and benefits from retakes persisted. This might be because this course was taught during the spring semester of 2020, when the public health crisis caused by the COVID-19 pandemic forced widespread shutdowns and a rapid transition to remote learning. While the pandemic adversely affected nearly all college students, it had disparate impacts on students based on their social positions (Aucejo et al., 2020; Gelles et al., 2020; Supriya et al., 2021). For example, in an interview study of engineering students, Gelles et al. (2020) found that men described having more free time in Spring 2020, while women described having to take on more domestic responsibilities. In a survey study, students of color and, to some extent, lower-income students reported a larger reduction in weekly study hours compared with White students and higher-income students after the transition to remote learning in Spring 2020 (Aucejo et al., 2020). Such differential impacts of the pandemic could explain the differences in participation in retakes among students in this course. However, as described earlier, there are systematic differences in the amount of time that students have for coursework even in the absence of a global pandemic.

    Despite these disparities in participation, optional exam retakes benefitted student learning and were very popular with students. Moreover, almost 90% of students said that having the opportunity to retake exams reduced their anxiety on the initial exam. Therefore, finding ways to offer retakes that reduce the test anxiety experienced by students while retaining the benefits to student learning could be valuable. In addition to offering retakes during class time as described earlier, one approach could be to make exam retakes mandatory for students who receive a low score on the first attempt, as is done in some mastery learning approaches (Diegelman-Parente, 2011). However, this runs the risk of exacerbating the test anxiety that many students experience while taking high-stakes exams. We noticed that for exam 1 of Course 2, which took place in person prior to the COVID-19 shutdowns, Black students retook the exam less often than White students even though retakes were offered during class time (Supplemental Figure S2). A potential explanation is that Black students were experiencing stereotype threat in these high-stakes assessment settings due to the fear of confirming negative stereotypes about Black students in STEM (Steele and Aronson, 1995; Aronson et al., 1998). This might cause Black students to avoid high-stakes assessment settings, resulting in the lower participation rates observed in our study.

    While we have not seen studies that examine demographic differences in participation and outcomes for high-stakes optional interventions such as the optional exam retakes studied here, there is some work examining the impact of extra credit assignments, which are also optional in nature. These studies report that students with higher grades are more likely to complete extra credit assignments (Hardy, 2002; Silva and Gross, 2004; Moore, 2005; Harrison et al., 2011). By contrast, in our study, higher-scoring students were less likely to retake exams than lower-scoring students. The higher stakes associated with exams compared with extra credit, and the fact that higher-scoring students had less to gain in exam points from retaking, might explain this result.

    It is also important to note here that while many students reported that having an opportunity to retake the exam reduced their anxiety on the initial exam, about half of the students agreed that they felt anxious while retaking the exam. Further, a quarter of students agreed that they chose to not retake the exam due to test anxiety. There were also some demographic differences in the impact of exam retakes on student anxiety around test taking. Asian American students were more likely to agree that they felt anxious during the retake. This might be due to these students experiencing the pressures of conforming to the “model minority” stereotype (Poon et al., 2016; McGee et al., 2017). Latinx students were less likely to agree that the exam retake opportunity helped them stay calmer on the initial exam. It is possible that Latinx students were also experiencing stereotype threat in the high-stakes assessment settings (Gonzales et al., 2002; Nguyen and Ryan, 2008). These results suggest that the high-stakes settings of these exams are another important barrier to equitable participation in and outcomes of optional exam retakes.

    Two additional evidence-based ways to encourage mastery learning are frequent low-stakes assessments and two-stage exams, which we discuss below.

    Several studies show that frequent low-stakes assessments lead to more equitable grade distributions among students. A recent meta-analysis showed a positive association between frequent low-stakes assessments and students’ overall course performance and likelihood of passing a course (Sotola and Crede, 2020). Such positive associations have also been observed in undergraduate introductory biology courses. For example, Eddy and Hogan (2014) reported that moderate course structure (i.e., one graded assignment per week) reduced grade disparities between Black students and White students and between first-generation and continuing-generation college students in an introductory biology course. Cotner and Ballen (2017) reported smaller gender gaps in introductory biology courses where exams constituted a smaller proportion of the overall course grade. Such frequent low-stakes assessments, especially when they take the form of quizzes, offer one way to enhance student learning and retention of course material through the testing effect. In fact, Hinze and Rapp (2014) reported that low-stakes assessments led to higher long-term retention of science content, whereas high-stakes assessments did not. Thus, opportunities for retaking low-stakes quizzes might be more effective for student learning than high-stakes exams.

    Another way to encourage mastery learning among students might be to use two-stage exams. A two-stage exam is a method of assessment in which students complete an exam individually and then retake the exam with a group of peers (Yuretich et al., 2001). Several studies report benefits of two-stage testing, including significantly higher scores on the collaborative assessment compared with independent assessment scores (Bruno et al., 2017; Levy et al., 2018) and higher scores in semesters or topics with collaborative testing as the second stage compared with an individual retest (Gilley and Clarkston, 2014; Knierim et al., 2015). Moreover, group scores were found to be higher than the scores of any individual within the group, demonstrating collaborative learning in geology and oceanography courses (Bruno et al., 2017). Collaborative testing has also been found to improve content retention within some undergraduate college courses (Gilley and Clarkston, 2014; Knierim et al., 2015; Deng and Luo, 2018), including introductory biology courses (Cooke et al., 2019), although other studies did not find evidence for greater content retention (Leight et al., 2012). Finally, studies show that students find the collaborative testing in two-stage exams more helpful and less stressful than traditional assessments (Yuretich et al., 2001; Leight et al., 2012; Rieger and Heiner, 2014; Levy et al., 2018). Future studies should examine whether two-stage exams benefit students with all social identities equitably and how they compare with optional individual exam retakes.

    Ultimately, there might not be one solution or intervention that works equally well for all students. Short interventions such as values affirmation exercises (Harackiewicz et al., 2014; Jordt et al., 2017), and expressive writing exercises (Ramirez and Beilock, 2011; Harris et al., 2019) might help us move toward more equitable learning and assessment (Casad et al., 2018). However, eventually we might need to make more large-scale changes, such as providing students with multiple ways to demonstrate their learning as suggested by Culturally Relevant Pedagogy (Ladson-Billings, 1995) and Universal Design for Learning (Rose et al., 2006) frameworks to achieve equitable learning and assessment in STEM classrooms.

    Limitations

    This study was done across multiple introductory biology courses at a single institution, so generalizations should be made with caution. Various factors, such as demographics of the institution, social identities of the instructional teams, instructor experience, and ability to create an inclusive classroom, could all influence the impact of an intervention such as optional exam retakes. One of the semesters that we included in this study was the Spring 2020 semester when colleges were forced to transition from in-person to fully online instruction due to the COVID-19 pandemic. The COVID-19 pandemic also affected people’s lives in profound ways and thus patterns of student participation in retakes that semester might also have been affected by the pandemic. Another limitation of our study is that although we asked students about the effect of optional exam retakes on their test anxiety levels, we did not directly measure test anxiety. Because of IRB constraints, we were also unable to include any “control groups,” that is, sections of the same course taught by the same instructor but without optional exam retakes. Other limitations include using a coarse measure of socioeconomic status, that is, eligibility for federal Pell grants. More fine-scale data on socioeconomic status and basic needs insecurities among our student population could have provided additional important insights into patterns of student participation and performance on exam retakes. Due to sample size limitations, we were unable to take an intersectional approach in our analyses and assess how systems of oppression such as racism, sexism, and classism, might interact to shape students’ participation and performance in exam retakes. Future studies using a qualitative intersectional approach would be very insightful.

    CONCLUSIONS

    Taken together, our results show that optional exam retakes can be effective in improving student exam scores and learning and might alleviate student anxiety about high-stakes exams to some extent. However, we found significant demographic differences in participation in exam retakes, underscoring the importance of paying attention to accessibility for any instructional intervention. Our work also illustrates the importance of monitoring participation and examining the impact of instructional interventions across demographic groups during and after courses. Students with more socioeconomic privilege might benefit more from interventions that are optional and require time outside of class. Thus, if an intervention is intended to increase equity in student grades and learning, measures to ensure more equitable participation are required.

    FOOTNOTES

    1 The MCAT is the exam that prospective medical students take for admission to medical schools in the United States.

    ACKNOWLEDGMENTS

    We would like to thank Michael Angilletta and Josh Caulkins for their help in administering the surveys included in this study. We are also grateful to the students who filled out our survey. Members of the Biology Education Research lab at Arizona State University gave us valuable feedback on our study design and analyses, and we specifically thank Rachel Scott for her feedback on the manuscript. We are also very grateful to the two anonymous reviewers whose comments greatly improved the paper. Lastly, we want to thank the undergraduate researchers who participated in our think-aloud interviews to establish cognitive validity of our survey instruments. This work was supported by grant no. GT11046 from the Howard Hughes Medical Institute (www.hhmi.org) and grant no. 1711272 from the National Science Foundation (www.nsf.gov). Any findings and recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

    REFERENCES

  • Abraham, P. (2000). A microscopic look at assessment: Dropping a lowest test score versus allowing a retake test. AURCO Mathematics Journal, 1. Retrieved May 19, 2021, from https://eric.ed.gov/?id=ED466957
  • Adesope, O. O., Trevisan, D. A., & Sundararajan, N. (2017). Rethinking the use of tests: A meta-analysis of practice testing. Review of Educational Research, 87(3), 659–701. https://doi.org/10.3102/0034654316689306
  • Alexander, M. (2012). The New Jim Crow: Mass Incarceration in the Age of Colorblindness. New York, NY: The New Press.
  • Aronson, J., Quinn, D. M., & Spencer, S. J. (1998). Stereotype threat and the academic underperformance of minorities and women. In Swim, J. K., & Stangor, C. (Eds.), Prejudice (pp. 83–103). San Diego, CA: Academic Press. https://doi.org/10.1016/B978-012679130-3/50039-9
  • Asai, D. J. (2020). Race matters. Cell, 181(4), 754–757. https://doi.org/10.1016/j.cell.2020.03.044
  • Astorne-Figari, C., & Speer, J. D. (2019). Are changes of major major changes? The roles of grades, gender, and preferences in college major switching. Economics of Education Review, 70, 75–93. https://doi.org/10.1016/j.econedurev.2019.03.005
  • Aucejo, E. M., French, J., Araya, M. P. U., & Zafar, B. (2020). The impact of COVID-19 on student experiences and expectations: Evidence from a survey. Journal of Public Economics, 191, 104271. https://doi.org/10.1016/j.jpubeco.2020.104271
  • Badawy, A. A., Ibrahim, M., & Benson, S. (2016). Let there be hope: Assessing the implications of exam re-taking on student learning outcomes and grades of engineering students grounded on metacognition awareness framework. 2016 International Conference on Computational Science and Computational Intelligence (CSCI), 270–275. https://doi.org/10.1109/CSCI.2016.0059
  • Ballen, C. J., Aguillon, S. M., Brunelli, R., Drake, A. G., Wassenberg, D., Weiss, S. L., ... & Cotner, S. (2018). Do small classes in higher education reduce performance gaps in STEM? BioScience, 68(8), 593–600. https://doi.org/10.1093/biosci/biy056
  • Ballen, C. J., Salehi, S., & Cotner, S. (2017). Exams disadvantage women in introductory biology. PLoS One, 12(10), e0186419. https://doi.org/10.1371/journal.pone.0186419
  • Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review, 84(2), 191–215. https://doi.org/10.1037/0033-295X.84.2.191
  • Bates, D., Maechler, M., Bolker, B. M., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. https://doi.org/10.18637/jss.v067.i01
  • Beatty, P. C., & Willis, G. B. (2007). Research synthesis: The practice of cognitive interviewing. Public Opinion Quarterly, 71(2), 287–311. https://doi.org/10.1093/poq/nfm006
  • Beilock, S. L., Gunderson, E. A., Ramirez, G., & Levine, S. C. (2010). Female teachers’ math anxiety affects girls’ math achievement. Proceedings of the National Academy of Sciences, 107(5), 1860–1863. https://doi.org/10.1073/pnas.0910967107
  • Bieg, M., Goetz, T., Wolter, I., & Hall, N. C. (2015). Gender stereotype endorsement differentially predicts girls’ and boys’ trait-state discrepancy in math anxiety. Frontiers in Psychology, 6, 1404. https://doi.org/10.3389/fpsyg.2015.01404
  • Bowen, W. G., Chingos, M. M., & McPherson, M. (2009). Crossing the finish line. Princeton, NJ: Princeton University Press. Retrieved September 7, 2021, from https://www.degruyter.com/document/doi/10.1515/9781400831463/html
  • Brame, C. J., & Biel, R. (2015). Test-enhanced learning: The potential for testing to promote greater learning in undergraduate science courses. CBE—Life Sciences Education, 14(2), es4. https://doi.org/10.1187/cbe.14-11-0208
  • Bruno, B., Engels, J., Ito, G., Gillis-Davis, J., Dulai, H., Carter, G., ... & Böttjer-Wilson, D. (2017). Two-stage exams: A powerful tool for reducing the achievement gap in undergraduate oceanography and geology classes. Oceanography, 30(2), 198–208. https://doi.org/10.5670/oceanog.2017.241
  • Casad, B. J., Oyler, D. L., Sullivan, E. T., McClellan, E. M., Tierney, D. N., Anderson, D. A., ... & Flammang, B. J. (2018). Wise psychological interventions to improve gender and racial equality in STEM. Group Processes & Intergroup Relations, 21(5), 767–787. https://doi.org/10.1177/1368430218767034
  • Cates, W. M. (1982). The efficacy of retesting in relation to improved test performance of college undergraduates. The Journal of Educational Research, 75(4), 230–236.
  • Clark, T. M., Callam, C. S., Paul, N. M., Stoltzfus, M. W., & Turner, D. (2020). Testing in the time of COVID-19: A sudden transition to unproctored online exams. Journal of Chemical Education, 97, 3413–3417. https://doi.org/10.1021/acs.jchemed.0c00546
  • Cooke, J. E., Weir, L., & Clarkston, B. (2019). Retention following two-stage collaborative exams depends on timing and student performance. CBE—Life Sciences Education, 18(2), ar12. https://doi.org/10.1187/cbe.17-07-0137
  • Cotner, S., & Ballen, C. J. (2017). Can mixed assessment methods make biology classes more equitable? PLoS One, 12(12), e0189610. https://doi.org/10.1371/journal.pone.0189610
  • Crowe, A., Dirks, C., & Wenderoth, M. P. (2008). Biology in bloom: Implementing Bloom’s taxonomy to enhance student learning in biology. CBE—Life Sciences Education, 7(4), 368–381. https://doi.org/10.1187/cbe.08-05-0024
  • Cruse, L. R., Holtzman, T., Gault, B., Croom, D., & Polk, P. (2019). Parents in college by the numbers. Institute for Women’s Policy Research. Retrieved April 26, 2021, from https://iwpr.org/iwpr-issues/student-parent-success-initiative/parents-in-college-by-the-numbers/
  • Davidson, W. B., House, W. J., & Boyd, T. L. (1984). A test-retest policy for introductory psychology courses. Teaching of Psychology, 11(3), 182–184. https://doi.org/10.1177/009862838401100320
  • Deng, Q., & Luo, X. (2018). PipE2: An innovative pipelining design for collaborative two-stage exams. Proceedings of the 19th Annual SIG Conference on Information Technology Education, 38–43. https://doi.org/10.1145/3241815.3241850
  • Diegelman-Parente, A. (2011). The use of mastery learning with competency-based grading in an organic chemistry course. Journal of College Science Teaching, 40(5), 50–58.
  • Dobson, J. L. (2008). The use of formative online quizzes to enhance class preparation and scores on summative exams. Advances in Physiology Education, 32(4), 297–302. https://doi.org/10.1152/advan.90162.2008
  • Eagan, K., Herrera, F., Sharkness, J., Hurtado, S., & Chang, M. (2011). Crashing the gate: Identifying alternative measures of student learning in introductory science, technology, engineering, and mathematics courses. Presented at the Annual Meeting of the American Educational Research Association, New Orleans, LA.
  • Eccles, J. S., Adler, T. F., Futterman, R., Goff, S. B., Kaczala, C. M., Meece, J. L., & Midgley, C. (1983). Expectancies, values, and academic behaviors. In Spence, J. T. (Ed.), Achievement and Achievement Motivation (pp. 75–146). San Francisco, CA: W.H. Freeman.
  • Eddy, S. L., Brownell, S. E., & Wenderoth, M. P. (2014). Gender gaps in achievement and participation in multiple introductory biology classrooms. CBE—Life Sciences Education, 13(3), 478–492. https://doi.org/10.1187/cbe.13-10-0204
  • Eddy, S. L., & Hogan, K. A. (2014). Getting under the hood: How and for whom does increasing course structure work? CBE—Life Sciences Education, 13(3), 453–468. https://doi.org/10.1187/cbe.14-03-0050
  • Fichten, C., & Adler, L. (1977). Examination retest procedures: Effects on performance, test anxiety and attitudes. Improving College and University Teaching, 25(4), 247–250. https://doi.org/10.1080/00193089.1977.9927496
  • Franklin, J., & Theall, M. (1992). Disciplinary differences: Instructional goals and activities, measures of student performance, and student ratings of instruction. Retrieved February 18, 2021, from https://eric.ed.gov/?id=ED346786
  • Freeman, S., Eddy, S. L., McDonough, M., Smith, M. K., Okoroafor, N., Jordt, H., & Wenderoth, M. P. (2014). Active learning increases student performance in science, engineering, and mathematics. Proceedings of the National Academy of Sciences, 111, 8410–8415. https://doi.org/10.1073/pnas.1319030111
  • Friedman, H. (1987). Repeat examinations in introductory statistics courses. Teaching of Psychology, 14(1), 20–23. https://doi.org/10.1207/s15328023top1401_4
  • Furner, J. M., & Gonzalez-DeHass, A. (2011). How do students’ mastery and performance goals relate to math anxiety? Eurasia Journal of Mathematics, Science and Technology Education, 7(4), 227–242. https://doi.org/10.12973/ejmste/75209
  • Gelles, L. A., Lord, S. M., Hoople, G. D., Chen, D. A., & Mejia, J. A. (2020). Compassionate flexibility and self-discipline: Student adaptation to emergency remote teaching in an integrated engineering energy course during COVID-19. Education Sciences, 10(11), 304. https://doi.org/10.3390/educsci10110304
  • Gerwing, T. G., Rash, J. A., Allen Gerwing, A. M., Bramble, B., & Landine, J. (2015). Perceptions and incidence of test anxiety. Canadian Journal for the Scholarship of Teaching and Learning, 6(3). Retrieved March 2, 2021, from https://eric.ed.gov/?id=EJ1084598
  • Gilley, B. H., & Clarkston, B. (2014). Collaborative testing: Evidence of learning in a controlled in-class study of undergraduate students. Journal of College Science Teaching, 43(3), 83–91.
  • Gin, L. E., Guerrero, F. A., Brownell, S. E., & Cooper, K. M. (2021). COVID-19 and undergraduates with disabilities: Challenges resulting from the rapid transition to online course delivery for students with disabilities in undergraduate STEM at large-enrollment institutions. CBE—Life Sciences Education, 20(3), ar36. https://doi.org/10.1187/cbe.21-02-0028
  • Goldrick-Rab, S. (2016). Paying the price. Chicago, IL: University of Chicago Press. Retrieved September 8, 2021, from https://press.uchicago.edu/ucp/books/book/chicago/P/bo24663096.html
  • Goldrick-Rab, S., Coca, V., Kienzl, G., Welton, C., Dahl, S., & Magnelia, S. (2020). #RealCollege during the pandemic: New evidence on basic needs insecurity and student well-being. Rebuilding the Launchpad: Serving Students During Covid Resource Library, 1–23.
  • Goldrick-Rab, S., & Sorensen, K. (2010). Unmarried parents in college. The Future of Children, 20(2), 179–203.
  • Gonzales, P. M., Blanton, H., & Williams, K. J. (2002). The effects of stereotype threat and double-minority status on the test performance of Latino women. Personality and Social Psychology Bulletin, 28(5), 659–670. https://doi.org/10.1177/0146167202288010
  • Harackiewicz, J. M., Canning, E. A., Tibbetts, Y., Giffen, C. J., Blair, S. S., Rouse, D. I., & Hyde, J. S. (2014). Closing the social class achievement gap for first-generation students in undergraduate biology. Journal of Educational Psychology, 106(2), 375–389. https://doi.org/10.1037/a0034679
  • Hardy, M. (2002). Extra credit: Gifts for the gifted? Teaching of Psychology, 29(3), 233–235.
  • Harris, R. B., Grunspan, D. Z., Pelch, M. A., Fernandes, G., Ramirez, G., & Freeman, S. (2019). Can test anxiety interventions alleviate a gender gap in an undergraduate STEM course? CBE—Life Sciences Education, 18(3), ar35. https://doi.org/10.1187/cbe.18-05-0083
  • Harrison, M. A., Meister, D. G., & Lefevre, A. J. (2011). Which students complete extra-credit work? College Student Journal, 45(3), 550–555.
  • Hembree, R. (1988). Correlates, causes, effects, and treatment of test anxiety. Review of Educational Research, 58(1), 47–77. https://doi.org/10.3102/00346543058001047
  • Herman, G. L., Cai, Z., Bretl, T., Zilles, C., & West, M. (2020). Comparison of grade replacement and weighted averages for second-chance exams. Proceedings of the 2020 ACM Conference on International Computing Education Research, 56–66. https://doi.org/10.1145/3372782.3406260
  • Hinze, S. R., & Rapp, D. N. (2014). Retrieval (sometimes) enhances learning: Performance pressure reduces the benefits of retrieval practice. Applied Cognitive Psychology, 28(4), 597–606. https://doi.org/10.1002/acp.3032
  • Jaschik, S. (2021). Common Application adds questions for transgender students. Inside Higher Ed. Retrieved September 24, 2021, from https://www.insidehighered.com/admissions/article/2021/03/01/common-application-adds-questions-transgender-students
  • Jordt, H., Eddy, S. L., Brazil, R., Lau, I., Mann, C., Brownell, S. E., ... & Freeman, S. (2017). Values affirmation intervention reduces achievement gap between underrepresented minority and White students in introductory biology classes. CBE—Life Sciences Education, 16(3), ar41. https://doi.org/10.1187/cbe.16-12-0351
  • Juhler, S. M., Rech, J. F., From, S. G., & Brogan, M. M. (1998). The effect of optional retesting on college students’ achievement in an individualized algebra course. The Journal of Experimental Education, 66(2), 125–137.
  • Knierim, K., Turner, H., & Davis, R. K. (2015). Two-stage exams improve student learning in an introductory geology course: Logistics, attendance, and grades. Journal of Geoscience Education, 63(2), 157–164. https://doi.org/10.5408/14-051.1
  • Kost, L. E., Pollock, S. J., & Finkelstein, N. D. (2009). Characterizing the gender gap in introductory physics. Physical Review Special Topics - Physics Education Research, 5(1), 010101. https://doi.org/10.1103/PhysRevSTPER.5.010101
  • Ladson-Billings, G. (1995). Toward a theory of culturally relevant pedagogy. American Educational Research Journal, 32(3), 465–491. https://doi.org/10.3102/00028312032003465
  • Lau, S., & Nie, Y. (2008). Interplay between personal goals and classroom goal structures in predicting student outcomes: A multilevel analysis of person-context interactions. Journal of Educational Psychology, 100(1), 15–29. https://doi.org/10.1037/0022-0663.100.1.15
  • Leight, H., Saunders, C., Calkins, R., & Withers, M. (2012). Collaborative testing improves performance but not content retention in a large-enrollment introductory biology class. CBE—Life Sciences Education, 11(4), 392–401. https://doi.org/10.1187/cbe.12-04-0048
  • Levy, D., Svoronos, T., & Klinger, M. (2018). Two-stage examinations: Can examinations be more formative experiences? Active Learning in Higher Education, 24(2), 79–94. https://doi.org/10.1177/1469787418801668
  • Lüdecke, D., Makowski, D., Waggoner, P., & Patil, I. (2020). performance: Assessment of regression models performance (0.4.7) [Computer software]. Retrieved July 15, 2020, from https://easystats.github.io/performance/
  • Marsh, E. J., Roediger, H. L., Bjork, R. A., & Bjork, E. L. (2007). The memorial consequences of multiple-choice testing. Psychonomic Bulletin & Review, 14(2), 194–199. https://doi.org/10.3758/BF03194051
  • Matz, R. L., Koester, B. P., Fiorini, S., Grom, G., Shepard, L., Stangor, C. G., ... & McKay, T. A. (2017). Patterns of gendered performance differences in large introductory courses at five research universities. AERA Open, 3(4), 2332858417743754. https://doi.org/10.1177/2332858417743754
  • McDaniel, M. A., Wildman, K. M., & Anderson, J. L. (2012). Using quizzes to enhance summative-assessment performance in a web-based class: An experimental study. Journal of Applied Research in Memory and Cognition, 1(1), 18–26. https://doi.org/10.1016/j.jarmac.2011.10.001
  • McGee, E. O., Thakore, B. K., & LaBlance, S. S. (2017). The burden of being “model”: Racialized experiences of Asian STEM college students. Journal of Diversity in Higher Education, 10(3), 253–270. https://doi.org/10.1037/dhe0000022
  • Miyake, A., Kost-Smith, L. E., Finkelstein, N. D., Pollock, S. J., Cohen, G. L., & Ito, T. A. (2010). Reducing the gender achievement gap in college science: A classroom study of values affirmation. Science, 330(6008), 1234–1237. https://doi.org/10.1126/science.1195996
  • Momsen, J. L., Long, T. M., Wyse, S. A., & Ebert-May, D. (2010). Just the facts? Introductory undergraduate biology courses focus on low-level cognitive skills. CBE—Life Sciences Education, 9(4), 435–440. https://doi.org/10.1187/cbe.10-01-0001
  • Moore, R. (2005). Who does extra-credit work in introductory science courses? Journal of College Science Teaching, 34(7), 12–15.
  • National Center for Education Statistics (2019). Digest of education statistics. Retrieved May 20, 2021, from https://nces.ed.gov/programs/digest/d19/tables/dt19_318.45.asp
  • National Science Foundation & National Center for Science and Engineering Statistics (2019). Women, minorities, and persons with disabilities in science and engineering: 2019 (Special Report NSF 19-304). Retrieved March 27, 2020, from https://www.nsf.gov/statistics/wmpd
  • Nguyen, H.-H. D., & Ryan, A. M. (2008). Does stereotype threat affect test performance of minorities and women? A meta-analysis of experimental evidence. Journal of Applied Psychology, 93(6), 1314–1334. https://doi.org/10.1037/a0012702
  • Odom, S., Boso, H., Bowling, S., Brownell, S., Cotner, S., Creech, C., ... & Ballen, C. J. (2021). Meta-analysis of gender performance gaps in undergraduate natural science courses. CBE—Life Sciences Education, 20(3), ar40. https://doi.org/10.1187/cbe.20-11-0260
  • Orr, R., & Foster, S. (2013). Increasing student success using online quizzing in introductory (majors) biology. CBE—Life Sciences Education, 12(3), 509–514. https://doi.org/10.1187/cbe.12-10-0183
  • Patrick, B. C., Skinner, E. A., & Connell, J. P. (1993). What motivates children’s behavior and emotion? Joint effects of perceived control and autonomy in the academic domain. Journal of Personality and Social Psychology, 65(4), 781–791. https://doi.org/10.1037/0022-3514.65.4.781
  • Pekrun, R. (1988). Anxiety and motivation in achievement settings: Towards a systems-theoretical approach. International Journal of Educational Research, 12(3), 307–323. https://doi.org/10.1016/0883-0355(88)90008-0
  • Pekrun, R. (2006). The control-value theory of achievement emotions: Assumptions, corollaries, and implications for educational research and practice. Educational Psychology Review, 18(4), 315–341. https://doi.org/10.1007/s10648-006-9029-9
  • Pekrun, R., Goetz, T., Frenzel, A. C., Barchfeld, P., & Perry, R. P. (2011). Measuring emotions in students’ learning and performance: The Achievement Emotions Questionnaire (AEQ). Contemporary Educational Psychology, 36(1), 36–48. https://doi.org/10.1016/j.cedpsych.2010.10.002
  • Perry, R. P. (1991). Perceived control in college students: Implications for instruction in higher education. In Smart, J. (Ed.), Higher education: Handbook of theory and research (pp. 1–56). New York, NY: Agathon.
  • Poon, O., Squire, D., Kodama, C., Byrd, A., Chan, J., Manzano, L., ... & Bishundat, D. (2016). A critical review of the model minority myth in selected literature on Asian Americans and Pacific Islanders in higher education. Review of Educational Research, 86(2), 469–502. https://doi.org/10.3102/0034654315612205
  • Putwain, D. W., Schmitz, E. A., Wood, P., & Pekrun, R. (2021). The role of achievement emotions in primary school mathematics: Control–value antecedents and achievement outcomes. British Journal of Educational Psychology, 91(1), 347–367. https://doi.org/10.1111/bjep.12367
  • Ramirez, G., & Beilock, S. L. (2011). Writing about testing worries boosts exam performance in the classroom. Science, 331(6014), 211–213. https://doi.org/10.1126/science.1199427
  • Rawson, K. A., & Dunlosky, J. (2012). When is practice testing most effective for improving the durability and efficiency of student learning? Educational Psychology Review, 24(3), 419–435. https://doi.org/10.1007/s10648-012-9203-1
  • Richardson, J. T. E. (2015). Coursework versus examinations in end-of-module assessment: A literature review. Assessment & Evaluation in Higher Education, 40(3), 439–455. https://doi.org/10.1080/02602938.2014.919628
  • Rieger, G. W., & Heiner, C. E. (2014). Examinations that support collaborative learning: The students’ perspective. Journal of College Science Teaching, 43(4), 41–47.
  • Ringeisen, T., Raufelder, D., Schnell, K., & Rohrmann, S. (2016). Validating the proposed structure of the relationships among test anxiety and its predictors based on control-value theory: Evidence for gender-specific patterns. Educational Psychology, 36(10), 1826–1844. https://doi.org/10.1080/01443410.2015.1072134
  • Roberts, D. (2011). Fatal invention: How science, politics, and big business re-create race in the twenty-first century. New York, NY: New Press/ORIM.
  • Roediger, H. L., & Butler, A. C. (2011). The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences, 15(1), 20–27. https://doi.org/10.1016/j.tics.2010.09.003
  • Roediger, H. L., & Karpicke, J. D. (2006). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1(3), 181–210. https://doi.org/10.1111/j.1745-6916.2006.00012.x
  • Roick, J., & Ringeisen, T. (2017). Self-efficacy, test anxiety, and academic success: A longitudinal validation. International Journal of Educational Research, 83, 84–93. https://doi.org/10.1016/j.ijer.2016.12.006
  • Rose, D. H., Harbour, W. S., Johnston, C. S., Daley, S. G., & Abarbanell, L. (2006). Universal design for learning in postsecondary education: Reflections on principles and their application. Journal of Postsecondary Education and Disability, 19(2), 135–151.
  • Rowland, C. A. (2014). The effect of testing versus restudy on retention: A meta-analytic review of the testing effect. Psychological Bulletin, 140(6), 1432–1463. https://doi.org/10.1037/a0037559
  • Salehi, S., Cotner, S., Azarin, S. M., Carlson, E. E., Driessen, M., Ferry, V. E., ... & Ballen, C. J. (2019). Gender performance gaps across different assessment methods and the underlying mechanisms: The case of incoming preparation and test anxiety. Frontiers in Education, 4. https://doi.org/10.3389/feduc.2019.00107
  • Scott-Clayton, J. (2012). What explains trends in labor supply among U.S. undergraduates, 1970–2009? (Working Paper 17744). Cambridge, MA: National Bureau of Economic Research. https://doi.org/10.3386/w17744
  • Seymour, E., & Hewitt, N. M. (1997). Talking about leaving: Why undergraduates leave the sciences. Boulder, CO: Westview Press.
  • Seymour, E., & Hunter, A.-B. (2019). Talking about leaving revisited: Persistence, relocation, and loss in undergraduate STEM education. Cham, Switzerland: Springer International Publishing. https://doi.org/10.1007/978-3-030-25304-2
  • Silva, F. J., & Gross, T. F. (2004). The rich get richer: Students’ discounting of hypothetical delayed rewards and real effortful extra credit. Psychonomic Bulletin & Review, 11(6), 1124–1128. https://doi.org/10.3758/BF03196747
  • Simmons, A. B., & Heckler, A. F. (2020). Grades, grade component weighting, and demographic disparities in introductory physics. Physical Review Physics Education Research, 16(2), 020125. https://doi.org/10.1103/PhysRevPhysEducRes.16.020125
  • Sotola, L. K., & Crede, M. (2020). Regarding class quizzes: A meta-analytic synthesis of studies on the relationship between frequent low-stakes testing and class performance. Educational Psychology Review, 33, 407–426. https://doi.org/10.1007/s10648-020-09563-9
  • Stanger-Hall, K. F., Shockley, F. W., & Wilson, R. E. (2011). Teaching students how to study: A workshop on information processing and self-testing helps students learn. CBE—Life Sciences Education, 10(2), 187–198. https://doi.org/10.1187/cbe.10-11-0142
  • Steele, C. M., & Aronson, J. (1995). Stereotype threat and the intellectual test performance of African Americans. Journal of Personality and Social Psychology, 69(5), 797–811.
  • Supriya, K., Mead, C., Anbar, A. D., Caulkins, J. L., Collins, J. P., Cooper, K. M., ... & Brownell, S. E. (2021). Undergraduate biology students received higher grades during COVID-19 but perceived negative effects on learning. Frontiers in Education, 6, 428. https://doi.org/10.3389/feduc.2021.759624
  • Theobald, E. (2018). Students are rarely independent: When, why, and how to use random effects in discipline-based education research. CBE—Life Sciences Education, 17(3), rm2. https://doi.org/10.1187/cbe.17-12-0280
  • von der Embse, N., Jester, D., Roy, D., & Post, J. (2018). Test anxiety effects, predictors, and correlates: A 30-year meta-analytic review. Journal of Affective Disorders, 227, 483–493. https://doi.org/10.1016/j.jad.2017.11.048
  • Walck-Shannon, E. M., Cahill, M. J., McDaniel, M. A., & Frey, R. F. (2019). Participation in voluntary re-quizzing is predictive of increased performance on cumulative assessments in introductory biology. CBE—Life Sciences Education, 18(2), ar15. https://doi.org/10.1187/cbe.18-08-0163
  • Ward, J. (2016). The fire this time: A new generation speaks about race. Riverside, CA: Scribner Book Company.
  • Weiner, B. (1985). An attributional theory of achievement motivation and emotion. Psychological Review, 92(4), 548–573. https://doi.org/10.1037/0033-295X.92.4.548
  • Weston, T. J., Seymour, E., Koch, A. K., & Drake, B. M. (2019). Weed-out classes and their consequences. In Seymour, E., & Hunter, A.-B. (Eds.), Talking about Leaving Revisited: Persistence, Relocation, and Loss in Undergraduate STEM Education (pp. 197–243). Cham, Switzerland: Springer International Publishing.
  • Wigfield, A., & Eccles, J. S. (2000). Expectancy-value theory of achievement motivation. Contemporary Educational Psychology, 25(1), 68–81. https://doi.org/10.1006/ceps.1999.1015
  • Wright, C. D., Eddy, S. L., Wenderoth, M. P., Abshire, E., Blankenbiller, M., & Brownell, S. E. (2016). Cognitive difficulty and format of exams predicts gender and socioeconomic gaps in exam performance of students in introductory biology courses. CBE—Life Sciences Education, 15(2), ar23. https://doi.org/10.1187/cbe.15-12-0246
  • Wright, C., Huang, A., Cooper, K., & Brownell, S. (2018). Exploring differences in decisions about exams among instructors of the same introductory biology course. International Journal for the Scholarship of Teaching and Learning, 12(2). https://doi.org/10.20429/ijsotl.2018.120214
  • Yuretich, R. F., Khan, S. A., Leckie, R. M., & Clement, J. J. (2001). Active-learning methods to improve student performance and scientific interest in a large introductory oceanography course. Journal of Geoscience Education, 49(2), 111–119. https://doi.org/10.5408/1089-9995-49.2.111