Optional Exam Retakes Reduce Anxiety but may Exacerbate Score Disparities Between Students with Different Social Identities
Abstract
Use of high-stakes exams in a course has been associated with gender, racial, and socioeconomic inequities. We investigated whether offering students the opportunity to retake an exam makes high-stakes exams more equitable. Following the control value theory of achievement emotions, we hypothesized that exam retakes would increase students’ perceived control over their performance and decrease the value of a single exam attempt, thereby maximizing exam performance. We collected data on exam scores and experiences with retakes from three large introductory biology courses and assessed the effect of optional exam retakes on gender, racial/ethnic, and socioeconomic disparities in exam scores. We found that Black/African American students and those who worked more than 20 h a week were less likely to retake exams. While exam retakes significantly improved student scores, they slightly increased racial/ethnic and socioeconomic disparities in scores partly because of these differences in participation rates. Most students reported that retake opportunities reduced their anxiety on the initial exam attempt. Together our results suggest that optional exam retakes could be a useful tool to improve student performance and reduce anxiety associated with high-stakes exams. However, barriers to participation must be examined and reduced for retakes to reduce disparities in scores.
INTRODUCTION
High attrition of students from undergraduate STEM (Science, Engineering, Technology, and Math) majors, especially among women, Black, Latinx, and Native American students, remains a major issue in STEM education in the United States (National Science Foundation and National Center for Science and Engineering Statistics, 2019; Seymour and Hunter, 2019; Asai, 2020). One major reason that students switch out of STEM majors is receiving a poor grade in an introductory STEM course (Seymour and Hewitt, 1997; Astorne-Figari and Speer, 2019; Weston et al., 2019). While many factors, including inequitable teaching practices, noninclusive learning environments, large class sizes, and students’ prior academic preparation, affect student grades in introductory courses (Seymour and Hewitt, 1997; Eagan et al., 2011; Eddy et al., 2014; Freeman et al., 2014; Odom et al., 2021), summative assessments in a course have been shown to play a critical role (Eddy and Hogan, 2014; Cotner and Ballen, 2017; Seymour and Hunter, 2019; Odom et al., 2021).
High-stakes exams (e.g., midterms and finals) often constitute a large portion of student grades in introductory STEM courses (Franklin and Theall, 1992; Kost et al., 2009), and inequities have been documented in these high-stakes summative assessments (Eddy et al., 2014; Wright et al., 2016). Studies have reported larger gender disparities, with men scoring higher than women, on high-stakes exams than on other forms of assessment, such as lab reports and presentations, where women tend to do better (Kost et al., 2009; Miyake et al., 2010; Ballen et al., 2017, 2018; Matz et al., 2017; Salehi et al., 2019). Similar differences may be present along other dimensions of social identity such as race/ethnicity and socioeconomic status (Richardson, 2015; Wright et al., 2016; Simmons and Heckler, 2020). Reducing the contribution of high-stakes exams toward the total grade (Cotner and Ballen, 2017) or using more frequent low-stakes assessments (Eddy and Hogan, 2014) has been shown to reduce disparities in student course grades by social identities.
However, high-stakes exams, especially those with multiple-choice questions, are often much easier for instructors to administer to large numbers of students (Momsen et al., 2010; Wright et al., 2018). Moreover, some instructors argue that high-stakes exams offer practice for standardized tests such as the MCAT that students might take later on (Marsh et al., 2007). Given the tradition of high-stakes exams in college STEM courses, the relative ease of administering high-stakes exams with restricted responses, and the enduring presence of high-stakes exams even during major disruptive events such as COVID-19 (Clark et al., 2020; Gin et al., 2021; Supriya et al., 2021), widespread change in the use of these summative assessments is unlikely. Thus, an alternative strategy is to identify ways in which instructors can continue using high-stakes exams in their courses while still reducing disparities in exam scores by social identities. In this study, we examine whether optional exam retakes offer one possible solution.
Theoretical Framework: Control Value Theory of Achievement Emotions
Control value theory builds on and integrates several theories used to explain achievement emotions in academic settings including expectancy-value theory of achievement motivation (Eccles et al., 1983; Wigfield and Eccles, 2000), attributional theory of achievement emotions (Weiner, 1985), and perceived control theories (Perry, 1991; Patrick et al., 1993). According to the control value theory, two kinds of appraisals shape achievement emotions, which are feelings regarding activities or outcomes linked to student success: 1) Students’ perceived control over the activities and outcomes, for example, expectations that studying will lead to good exam performance, and 2) The value placed on achievement activities and outcomes by students, for example, the importance of good grades on an exam on future career (Pekrun, 2006). Perceived control includes both expectancies, for example, expected outcome of current efforts on an upcoming test (derived from the expectancy-value theory) and retrospective causal attributions, for example, the causes attributed to success or failure on a recent test (derived from the attributional theory of achievement emotions) (Pekrun, 2006). Following this framework, students feel anxiety regarding a test when they experience uncertainty about action-control and action-outcome expectancies, which are uncertainties regarding whether they can perform an action (also termed as “self-efficacy”) and whether their actions will result in a positive outcome or prevent a negative outcome (Bandura, 1977; Pekrun, 1988; Ringeisen et al., 2016; Roick and Ringeisen, 2017). In addition, students feel test anxiety when they value an exam, either due to the intrinsic value (importance of the content to them) or extrinsic value (the importance of their exam performance for other goals such as their career plans) (Pekrun, 2006). 
Those who have low value expectancies (e.g., they place low value on the exam) might experience negative emotions such as boredom irrespective of control expectancies (i.e., confidence that studying will result in a positive outcome) (Pekrun, 2006). On the other hand, students who have high value and control expectancies, meaning they value their performance on an exam and feel confident that studying will result in a positive outcome, will feel positive achievement emotions such as relief or hope (Pekrun, 2006).
Control value theory also posits that positive activating achievement emotions such as pride and enjoyment, which are caused by high control and value appraisals, are associated with better performance on exams (Pekrun, 2006; Putwain et al., 2021). Additionally, negative deactivating achievement emotions such as boredom and helplessness are associated with worse exam performance. Negative activating emotions such as anxiety are theorized to have an ambivalent effect (Pekrun, 2006), but they can often impair exam performance (Hembree, 1988; Pekrun et al., 2011).
The control and value appraisals that shape students’ achievement emotions are in turn influenced by a wide range of factors such as students’ personality traits, the design of the instructional environment, and the social environment experienced by students (Pekrun, 2006). One example of an instructional practice that would affect students’ control and value appraisals is performance goal structures such as norm-referenced grading (i.e., curving). Competitive performance goal structures could reduce students’ perceived control over their test score and increase negative achievement emotions compared with mastery goal structures (e.g., goals focused on achieving proficiency) that could enhance positive achievement emotions (Pekrun, 2006; Lau and Nie, 2008; Furner and Gonzalez-DeHass, 2011). One mechanism through which the social environment can affect students’ control and value appraisals is the presence of stereotypes associated with lower academic achievement. For example, Bieg et al. (2015) found that higher generalized math anxiety over time among girls compared with boys is associated with higher math-related gender stereotype endorsement. Greater endorsement of such gender stereotypes about math (i.e., “boys are better at math”) was found to be associated with higher levels of math anxiety among women elementary teachers (Beilock et al., 2010). More broadly, stereotype threat, which is defined as “the risk of confirming, as self-characteristic, a negative stereotype about one’s group” (Steele and Aronson, 1995), could increase students’ value appraisal and decrease their control appraisal resulting in negative achievement emotions such as anxiety.
Systematic differences in achievement emotions such as test anxiety between students with different social identities might be caused by the social environment and lead to differences in student performance. Several studies have reported demographic differences in average levels of test anxiety and strong negative correlations between self-esteem and self-efficacy and test anxiety (Hembree, 1988; von der Embse et al., 2018). These differences in achievement emotions in turn could lead to disparities in exam scores received by students with different social identities. A study of undergraduate students at a Canadian university found that 46.3% of women suffered from self-reported test anxiety at some point over the course of their university career compared with 30% of men (Gerwing et al., 2015). Other studies have shown that test anxiety has a negative effect on exam performance among women in introductory science courses (Ballen et al., 2017; Salehi et al., 2019). Higher test anxiety among women results in larger gender disparities in student scores on exams compared with nonexam assessments in introductory college science (Ballen et al., 2017; Salehi et al., 2019).
Drawing upon the control value theory, we hypothesized that optional exam retakes might change students’ control and value appraisals and thereby reduce negative emotions such as anxiety, leading to smaller disparities in exam scores by social identities. Having the opportunity to retake an exam may help a student feel more control over their exam performance and may also reduce the value associated with their performance on the first attempt. It might also reduce anxiety and increase sense of control because of increased familiarity with the test format and question structure. Both of these would lead to greater likelihood of a student experiencing positive emotions such as relief or hope and decrease the likelihood of a student experiencing negative emotions such as anxiety or hopelessness. These effects on students’ control and value appraisals should in theory lead to better student exam scores. Moreover, if students with certain social identities have lower perceived control or place higher value over their performance due to factors such as stereotype threat, increasing students’ control appraisals and decreasing value appraisals might benefit such students disproportionately.
Optional Exam Retakes Could Benefit Student Learning Through Testing Effect and Promoting Mastery Orientation
In addition to increasing positive achievement emotions and reducing negative achievement emotions as described in the section above, optional exam retakes could benefit student learning through the “testing effect” and by promoting a mastery orientation among students. The testing effect is defined as the positive effect of taking a test on long-term retention, as indicated by better performance on a test at a later timepoint among students who were tested on given material soon after reading versus students who simply reread the material and were not tested soon after (Roediger and Butler, 2011). There is a large amount of evidence on test-enhanced learning in various contexts (Roediger and Karpicke, 2006; Roediger and Butler, 2011; Rowland, 2014; Adesope et al., 2017). There is also some evidence for test-enhanced learning from undergraduate biology classrooms (Dobson, 2008; Stanger-Hall et al., 2011; McDaniel et al., 2012; Orr and Foster, 2013). Some studies show that this effect is stronger when students receive feedback on their initial test before retesting, and when the initial test is “retrieval-based,” that is, requires students to generate the answer instead of simply recognizing the answer (Rawson and Dunlosky, 2012; Rowland, 2014; Brame and Biel, 2015). It is important to note, however, that most of the evidence for test-enhanced learning comes from studies where the initial test was associated with no or low stakes (Brame and Biel, 2015). Here, we examined whether the testing effect extends to scenarios where the initial test is high stakes by assessing the impact of retakes on student exam scores.
Another benefit of offering optional exam retakes, especially when students are allowed to retake until they have demonstrated their learning, is that it can promote a mastery orientation. This can be helpful for long-term student learning, especially for courses where student understanding of one part of course content is essential to their ability to learn subsequent course content (Juhler et al., 1998). Prior studies where students were offered opportunities to retake exams have shown that students score better on retakes compared with first attempts (Fichten and Adler, 1977; Cates, 1982; Davidson et al., 1984; Friedman, 1987; Juhler et al., 1998; Abraham, 2000; Badawy et al., 2016; Herman et al., 2020) and students who participate in a greater number of retakes do better on a cumulative final (Friedman, 1987; Abraham, 2000; Walck-Shannon et al., 2019). Moreover, studies have reported that students liked the use of exam retakes and reported lower anxiety about the exams (Fichten and Adler, 1977; Cates, 1982; Davidson et al., 1984; Friedman, 1987; Abraham, 2000). This prompted our interest in examining student motivation to participate in exam retakes and their experience with taking exams.
In this study, we examined the effect of optional exam retakes on student exam scores and asked whether optional exam retakes could reduce disparities in exam scores received by students with different social identities. Any advantages of an optional intervention such as this are contingent upon equitable participation. Therefore, we also assessed whether there were any demographic differences in student participation in optional exam retakes. Finally, we used surveys to understand student participation in and experiences with optional exam retakes, including whether having an opportunity to retake an exam helped students stay calmer on the first exam attempt.
Research Questions and Predictions
1. To what extent are there gender, socioeconomic, and racial/ethnic differences in participation in optional exam retakes? If there are differences, are these associated with differences in the reasons chosen for participating or not participating in exam retakes?
2. What is the effect of retakes on student exam scores in the course?
3. What is the effect of retakes on gender, socioeconomic, and racial/ethnic disparities in exam scores? To what extent is this associated with differences in participation in optional exam retakes?
4. Are there demographic differences in student experiences with retakes, especially with respect to the effect of retakes on student anxiety while taking the initial exam?
We predicted that likelihood of retaking an exam would be associated with students’ scores on the initial exam attempt as reported in prior studies (Friedman, 1987; Juhler et al., 1998), but whether a student would retake an exam may also be affected by factors that reduce the amount of time available to students, such as the number of hours they work a job during the semester. We expected to see a positive impact of retakes on student exam scores and a reduction in gender, socioeconomic and racial/ethnic disparities. Finally, we predicted that optional exam retakes would lower student anxiety while taking the initial exam and did not expect any demographic differences in student experiences with retakes.
Researcher Positionality
We recognize that researchers’ identities and positions shape the research they do and introduce various implicit biases. An important way to counter such biases is to be explicit about researcher identities and positions. Our research team consists of discipline-based education researchers (K.S., S.B., C.W., and K.V.) and instructors for the courses included in this study (C.W., J.E., D.T., C.P., and C.B.). Our team’s social identities include women (K.S., J.E., K.V., and S.B.), men (C.W., D.T., C.P., and C.B.), South Asian (K.S.), Hispanic (K.V.), white (S.B., C.W., J.E., D.T., C.P., and C.B.), first generation to college (D.T. and J.E.), and continuing generation (K.S., C.B., C.P., K.V., C.W., and S.B.). Moreover, some members of the team received Pell grants during their undergraduate degree and/or worked a job for at least 10 h/wk during college (C.W. and J.E.). Two members of the team (K.S. and C.B.) earned their undergraduate degree outside of the United States and are first-generation immigrants to the United States.
MATERIALS AND METHODS
This study was conducted at a large research-intensive public university in the Southwestern United States over three courses that are part of the introductory biology sequence for biology majors, taught between Spring 2019 and Spring 2020. About one-third of undergraduate students at this university receive federal Pell grants, a little more than 20% are first-generation college students, and about 80% are between the ages of 18 and 22. First-year students are expected to live on campus. Students were offered an opportunity to retake exams in all three courses with some small differences as described below and summarized in Table 1. Each course had three major high-stakes timed exams, although exam contribution to the final grade differed across these courses. Questions on the retakes were isomorphic to the questions on the original exam. Students received their score on their first attempt and were given access to the questions with correct answers to review on their own and/or during a review session before the retake took place. Retakes happened within 2 wk of the original exam. Students were only eligible to participate in the retake exam if they took the original exam. For students who retook exams, only the highest score they received contributed toward their final grade, regardless of whether it was the first attempt or the retake.
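The grading rule described above (only the highest score on an exam counts, and a retake only counts if the original exam was taken) can be sketched in a few lines. This is an illustrative Python sketch rather than the instructors' gradebook logic; the function name, dictionary layout, and scores are all hypothetical.

```python
def final_exam_scores(first_attempts, retakes):
    """Return the score that counts for each exam: the best of all attempts.

    first_attempts: dict mapping exam name -> first-attempt score.
    retakes: dict mapping exam name -> list of retake scores (may be absent).
    Retake scores for exams with no first attempt are ignored, mirroring the
    eligibility rule that a student must have taken the original exam.
    """
    final = {}
    for exam, first_score in first_attempts.items():
        attempts = [first_score] + list(retakes.get(exam, []))
        final[exam] = max(attempts)
    return final


# Hypothetical student: improves on exam 1, skips the exam 2 retake,
# and scores lower on the exam 3 retake (so the first attempt counts).
first = {"exam1": 62.0, "exam2": 85.0, "exam3": 70.0}
retake = {"exam1": [74.0], "exam3": [66.0]}
scores = final_exam_scores(first, retake)
```

Because the best attempt is kept, a retake can never lower a student's recorded score under this rule.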
| | Course 1 | Course 2 | Course 3 |
|---|---|---|---|
| When was this course taught? | Fall 2019 | Spring 2020 | Spring 2019 |
| Number of retake opportunities | 1 per exam | 1 per exam | 2 per exam |
| When were retakes held? | Outside of class time | In class or online | Outside of class time |
| % of course grade contributed by exams | 39% | 48% | 75% |
In all three courses, exams tended to focus on higher order Bloom’s questions (Crowe et al., 2008) and retakes contained isomorphic but not identical questions. Average student performance from previous iterations of this course showed that these questions tend to be challenging for students. Therefore, the impetus for implementing exam retakes was to help incentivize students to prepare more for exams while promoting higher order learning.
Course 1: In this course, exams constituted 39% of the course grade. Students were offered one retake opportunity for each exam and the retakes were scheduled outside of official class time. While classes for this course were either on Mondays and Wednesdays or Tuesdays and Thursdays, the retakes were scheduled for Fridays. Students could pick from time slots throughout the day on Fridays to take the retake, but had to book the time slot in advance. This course was taught by C.W., J.E., D.T., and C.B.
Course 2: In this course, exams constituted 48% of the course grade. Students were offered one retake opportunity for each exam and the retakes were scheduled during official class time. However, this course occurred during the Spring 2020 semester when classes transitioned to emergency remote learning due to the COVID-19 pandemic. Thus, only the first exam took place during the in-person period of the course and the other two exams were administered online. These exams were proctored using Respondus LockDown Browser and Monitoring, and videos were reviewed by instructors and TAs following the exam. Students were given a 33-h window each for exam 2 and its retake, and a 72-h window each for exam 3 and its retake. This course was taught by C.W., J.E., and C.B.
Course 3: In this course, exams constituted 75% of the course grade. Students were offered two retake opportunities for each exam and the retakes were scheduled outside of class. The first retake took place 2 wk after the first attempt for exams 1 and 2, and 1 wk after the first attempt for exam 3. The second retake was offered 1 wk after the first retake. Students received their score on the first retake before the second retake occurred. Retakes were scheduled and administered outside of official class time, as in Course 1. See Table 1 for a summary of the differences among the courses. This course was taught by C.W. and C.P.
Data Collection
All protocols for this study were approved by ASU’s Institutional Review Board (IRB) protocol no. 10528 and no. 1634.
We collected gradebooks from the introductory biology instructors who consented to participate in our study. For Courses 1 and 2, student consent for participation in the study was requested via an online survey administered using the Qualtrics platform. We filtered all course gradebooks to remove data from students who did not consent to participate in the study. For Course 3, we used archival data from an older iteration of the course so we did not request consent from students.
We requested student demographic data for all students who consented to participate in the study in Courses 1 and 2, and all students in Course 3 from the registrar’s office. These institutional data did not allow students to pick a gender outside of the man and woman binary. We acknowledge that this might be an inaccurate representation of the gender of students in our study who might not conform to the gender binary. Much of the institutional demographic data come from college applications such as the Common App, which still does not provide an option beyond the binary, although it recently added a multiselect pronoun question (Jaschik, 2021). We also used institutional data for race/ethnicity, which asked students two separate questions, one asking whether they identify as Hispanic/Latinx and another asking them to pick the race(s) with which they identify. It is important to note that race is a social construct that was invented to classify people into a social hierarchy (Roberts, 2011), and due to systemic racism, racial identity continues to have a significant impact on people’s lives (Alexander, 2012; Ward, 2016). We used gender, race/ethnicity, and proxies of socioeconomic status in this study to assess systemic inequities and ascribe any differences in outcomes we report to systems of oppression such as sexism, racism, and classism, and not any social identity per se.
For Courses 1 and 2, we asked a few additional demographic questions such as the number of hours per week that students worked a job along with the survey that included the consent form for the study.
We collected grades and demographic data from 792 students in Course 1, 635 students in Course 2, and 439 students in Course 3. The full demographic breakdown of students from whom we collected grades and demographic data is included in Table 2 and survey responses are included in Supplemental Table S1. As mentioned above, we were only able to include women and men in our analyses. Students who picked more than one race were categorized as multiracial for our analyses. Any students who identified as Hispanic/Latinx in the first question were categorized as Hispanic/Latinx for the rest of our analyses. We removed Native American/American Indian students and Native Hawaiian/Pacific Islander students due to small sample size. We also removed students whose race/ethnicity was unspecified in the registrar data and those who chose “decline to state” to the survey question about work hours per week in Courses 1 and 2. After removing these students, we had data from 749 students in Course 1, 565 students in Course 2, and 429 students in Course 3 that we used for analyses (Table 2).
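The categorization and exclusion rules in this paragraph can be expressed as a small filter. The sketch below is written in Python for illustration only (the study's analyses were run in R), and the field names, category labels, and example records are hypothetical stand-ins for the registrar and survey data.

```python
# Groups removed before analysis, per the paragraph above (labels are illustrative).
EXCLUDED_GROUPS = {
    "American Indian/Alaska Native",
    "Native Hawaiian/Pacific Islander",
    "Unspecified",
}


def race_ethnicity_category(is_hispanic, races):
    """Collapse the two institutional questions into one analysis category."""
    if is_hispanic:          # the Hispanic/Latinx question takes precedence
        return "Hispanic/Latinx"
    if len(races) > 1:       # more than one race selected
        return "Multiracial"
    if len(races) == 1:
        return races[0]
    return "Unspecified"


def keep_for_analysis(student, course_has_work_hours):
    """Apply the exclusions; work hours were only surveyed in Courses 1 and 2."""
    category = race_ethnicity_category(student["is_hispanic"], student["races"])
    if category in EXCLUDED_GROUPS:
        return False
    if course_has_work_hours and student.get("work_hours") == "Decline to state":
        return False
    return True


# Hypothetical records.
students = [
    {"is_hispanic": True, "races": ["White", "Asian American"], "work_hours": "No"},
    {"is_hispanic": False, "races": ["White", "Black/African American"], "work_hours": ">20 h"},
    {"is_hispanic": False, "races": [], "work_hours": "No"},
    {"is_hispanic": False, "races": ["White"], "work_hours": "Decline to state"},
]
kept = [s for s in students if keep_for_analysis(s, course_has_work_hours=True)]
```

In this toy example the first two students are retained (as Hispanic/Latinx and Multiracial, respectively) and the last two are dropped.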
| | Course 1, number in total sample | Course 1, number in sample analyzed | Course 2, number in total sample | Course 2, number in sample analyzed | Course 3, number in total sample | Course 3, number in sample analyzed |
|---|---|---|---|---|---|---|
| | N = 792 | N = 749 | N = 635 | N = 565 | N = 439 | N = 429 |
| Gender | | | | | | |
| Women | 540 (68%) | 512 (68%) | 438 (69%) | 387 (68%) | 302 (69%) | 295 (69%) |
| Men | 252 (32%) | 237 (32%) | 197 (31%) | 178 (32%) | 137 (31%) | 134 (31%) |
| Race/Ethnicity | | | | | | |
| American Indian/Alaska Native | 9 (1.1%) | | 5 (0.8%) | | 2 (0.5%) | |
| Asian American | 108 (14%) | 105 (14%) | 100 (16%) | 92 (16%) | 80 (18%) | 80 (19%) |
| Black/African American | 39 (4.9%) | 39 (5.2%) | 34 (5.4%) | 33 (5.8%) | 22 (5.0%) | 22 (5.1%) |
| Hispanic/Latinx | 193 (24%) | 188 (25%) | 139 (22%) | 121 (21%) | 89 (20%) | 89 (21%) |
| International | 26 (3.3%) | 26 (3.5%) | 24 (3.8%) | 23 (4.1%) | 8 (1.8%) | |
| Multiracial | 45 (5.7%) | 44 (5.9%) | 36 (5.7%) | 32 (5.7%) | 28 (6.4%) | 28 (6.5%) |
| Native Hawaiian | 2 (0.3%) | | 1 (0.2%) | | | |
| White | 353 (45%) | 347 (46%) | 281 (44%) | 264 (47%) | 210 (48%) | 210 (49%) |
| Unspecified | 17 (2.1%) | | 15 (2.4%) | | | |
| Pell-eligible | 264 (33%) | 250 (33%) | 204 (32%) | 176 (31%) | | 138 (32%) |
| Number of hours worked at a job | | | | | | |
| No | 444 (56%) | 424 (57%) | 393 (62%) | 381 (67%) | | |
| ≤20 h | 242 (31%) | 236 (32%) | 111 (17%) | 105 (19%) | | |
| >20 h | 90 (11%) | 89 (12%) | 80 (13%) | 79 (14%) | | |
| Decline to state | 16 (2.0%) | | 51 (8%) | | | |
In addition to the exam and demographic data, we sent short surveys to all students in Courses 1 and 2 after each exam retake opportunity to understand the reasons that students had for choosing to retake or not retake an exam. Because there is minimal published literature on student participation in optional exam retakes, we developed the survey based on our own experiences as students and instructors in undergraduate biology classrooms. We asked students an open-ended question about their decision to participate in the exam retake before asking a closed-ended question where they could select multiple options for choosing to retake or not retake an exam. Next, we asked students whether they think the retakes helped their learning and reduced their anxiety about exams in the course. Finally, we included some questions about student experiences with exam retakes such as whether they like retakes and whether they found it too time consuming to prepare for and retake exams (see supplemental materials for a copy of all survey questions). To check for cognitive validity of survey items, we conducted six think-aloud interviews with undergraduate students and iteratively revised survey items until no further changes were needed (Beatty and Willis, 2007).
The surveys were distributed to all the students enrolled in Courses 1 and 2 via an email from the instructors and an announcement on the learning management system. Students were offered a small amount of extra credit for completing the survey and were explicitly told that their instructors would not see their responses to the survey. In Course 1, out of 1090 students, 457, 592, and 335 students filled out our survey after exams 1, 2, and 3, corresponding to response rates of 42%, 54%, and 31%, respectively. In Course 2, out of 846 students, 501, 431, and 428 students filled out our survey after exams 1, 2, and 3, corresponding to response rates of 59%, 51%, and 51%, respectively. The full set of analyzed survey questions is included in the supplement.
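The response rates reported above follow directly from the respondent counts and enrollments. A short Python check, with a hypothetical helper name, reproduces them:

```python
def response_rate(respondents, enrolled):
    """Percentage of enrolled students who responded, rounded to a whole percent."""
    return round(100 * respondents / enrolled)


# Counts taken from the paragraph above (surveys after exams 1, 2, and 3).
course1_rates = [response_rate(n, 1090) for n in (457, 592, 335)]
course2_rates = [response_rate(n, 846) for n in (501, 431, 428)]
```

With these inputs, `course1_rates` is `[42, 54, 31]` and `course2_rates` is `[59, 51, 51]`, matching the reported percentages.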
Data Analysis
Because courses differed in important ways from each other, such as the number and timing of retakes, we analyzed data for each course separately.
Analysis of Exam Data
We used logistic regressions to assess which variables influence the likelihood of a student retaking an exam. The outcome for these models was a two-column matrix indicating the number of exams a student retook and the number of exams they did not retake, and the predictor variables included total exam score on first attempts, gender, race/ethnicity, and Pell grant eligibility (a proxy for socioeconomic status). In addition, for Courses 1 and 2, we included the number of hours per week that students worked a job during the semester as a predictor. Because Course 3 data were archival in nature, we did not have information on this variable for students in that course.
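To make the two-column outcome concrete, the sketch below builds the (retaken, not retaken) pair for one student and evaluates the binomial log-likelihood that a logistic regression maximizes. It is a Python illustration with hypothetical names and data, not the authors' code. In R, an outcome of this shape is conventionally passed to `glm` as `cbind(successes, failures)` with a binomial family, which is presumably what the description above refers to, although the source does not show the call.

```python
import math


def retake_outcome(retook_flags):
    """Return (number of exams retaken, number not retaken) for one student."""
    retaken = sum(1 for flag in retook_flags if flag)
    return retaken, len(retook_flags) - retaken


def binomial_log_likelihood(outcomes, linear_predictors):
    """Log-likelihood of a binomial logit model, up to an additive constant.

    outcomes: list of (retaken, not_retaken) pairs, one per student.
    linear_predictors: list of log-odds of retaking an exam, one per student
    (in a fitted model these would be intercept + coefficients * predictors).
    """
    total = 0.0
    for (retaken, not_retaken), eta in zip(outcomes, linear_predictors):
        p = 1.0 / (1.0 + math.exp(-eta))  # inverse logit
        total += retaken * math.log(p) + not_retaken * math.log(1.0 - p)
    return total


# Hypothetical student who retook exams 1 and 3 but not exam 2.
outcome = retake_outcome([True, False, True])
# With log-odds of 0 (probability 0.5), each of the 3 exams contributes log(0.5).
ll = binomial_log_likelihood([outcome], [0.0])
```

Treating each student's three exams as binomial trials in this way assumes a common retake probability across that student's exams, which is a modeling simplification worth keeping in mind.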
We used paired t tests to assess whether exam retakes significantly improved student scores overall. Next, to examine the effect of optional exam retakes on students with different social identities, we compared three regression models for each course.
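Before turning to the three models, the paired comparison can be illustrated briefly. The Python sketch below computes the paired t statistic for total exam scores before and after retakes are taken into account; the function name and scores are hypothetical, and the p value (which requires the t distribution) is left out to keep the example within the standard library.

```python
import math
import statistics


def paired_t_statistic(before, after):
    """Return (t, degrees of freedom) for a paired t test of after vs. before."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical totals for four students. Because only the best attempt counts,
# the score after retakes is never lower than the first-attempt score.
first_attempt_totals = [60.0, 70.0, 80.0, 90.0]
after_retake_totals = [65.0, 70.0, 86.0, 91.0]
t_stat, df = paired_t_statistic(first_attempt_totals, after_retake_totals)
```

The test is paired because each student contributes both scores, so the analysis is carried out on within-student differences rather than on two independent groups.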
Model 1: In this model, total exam score received on first attempts was the outcome variable and gender, race/ethnicity, Pell-eligibility, and number of hours worked per week (only for Courses 1 and 2) were the predictors. This model allowed us to assess demographic disparities in exam scores between students with different social identities prior to exam retakes. Knowing that they had an opportunity to retake exams might have affected student preparation and behavior for the first exam, so these disparities might not be the same as what we would observe in a course without retakes. However, they still give us useful information on the magnitude of score disparities in these courses.
Model 2: In this model, total exam score received after taking retakes into account was the outcome variable and total exam score on first attempts along with demographic variables were the predictors. This model shows us the effect of exam retakes on disparities in the exam score received by students with different social identities. If exam retakes benefit all students equally, we would expect that none of the demographic variables would have a significant effect in this model.
Model 3: This model was the same as Model 2 but with the addition of the total number of retakes taken by a student added as a predictor. Comparing Model 2 with Model 3 allows us to understand the extent to which differences in participation in exam retakes, if any, shape the effects of exam retakes on students.
The reference groups for all of our models were: women, white students, students who were not eligible for federal Pell grants, and students who did not work a job during the semester.
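For readers who prefer equations, one way to write the three models is given below, assuming they are ordinary linear regressions with dummy-coded categorical predictors. The notation is ours rather than the authors': $S^{\text{first}}_i$ and $S^{\text{final}}_i$ are student $i$'s total exam score on first attempts and after retakes, $R_i$ is the number of retakes taken, and each indicator is measured against the reference groups listed above. The work-hours terms apply only to Courses 1 and 2.

```latex
\begin{align}
\text{Model 1:}\quad S^{\text{first}}_i &= \beta_0 + \beta_1\,\text{Man}_i
  + \sum_{r}\beta_{2r}\,\text{Race}_{ri} + \beta_3\,\text{Pell}_i
  + \sum_{h}\beta_{4h}\,\text{Work}_{hi} + \varepsilon_i \\
\text{Model 2:}\quad S^{\text{final}}_i &= \gamma_0 + \gamma_S\,S^{\text{first}}_i
  + \gamma_1\,\text{Man}_i + \sum_{r}\gamma_{2r}\,\text{Race}_{ri}
  + \gamma_3\,\text{Pell}_i + \sum_{h}\gamma_{4h}\,\text{Work}_{hi} + \varepsilon_i \\
\text{Model 3:}\quad S^{\text{final}}_i &= \delta_0 + \delta_S\,S^{\text{first}}_i
  + \delta_R\,R_i + \delta_1\,\text{Man}_i + \sum_{r}\delta_{2r}\,\text{Race}_{ri}
  + \delta_3\,\text{Pell}_i + \sum_{h}\delta_{4h}\,\text{Work}_{hi} + \varepsilon_i
\end{align}
```

Under this reading, a nonzero demographic coefficient in Model 2 (for example, $\gamma_3$) indicates that retakes changed scores differently for that group after conditioning on first-attempt score, and the change in that coefficient between Model 2 and Model 3 indicates how much of the difference is accounted for by the number of retakes taken.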
We checked adequacy of all regression model fits using the check_model function in the R package performance (Lüdecke et al., 2020). We examined the plots for fitted values against residuals for linear regressions to check for linearity and fitted values against the square root of standardized residuals for both linear and logistic regressions to check for homogeneity of variance. We found that the lines were relatively flat and horizontal for all models for Courses 1 and 2, indicating that these assumptions were met. There was a slight curvature in the lines for models for Course 3, suggesting that the assumption for homogeneity of variance might not be met, so we encourage caution in interpreting our results from Course 3. There were no outliers in any of the models and the residuals were normally distributed. We checked that the variance inflation factors were low implying that the predictors were not collinear.
Analysis of Survey Data.
Within each course, we pooled data from all three surveys together. To assess whether there were demographic differences among students for the reasons for retaking or not retaking an exam, we coded each reason provided as an option on our survey as a binary variable. We also converted survey questions about student experiences with retakes that were originally on a Likert scale into a binary response. We then used these survey responses as the outcome and demographic variables as predictors. Because a student might have responded to more than one survey, we used a generalized linear mixed model approach for these analyses and treated the unique random ID assigned to each student as a random effect. Intraclass correlation coefficient values were greater than 0.12 for all these models (with one exception where it was 0.077), indicating that there was appreciable within-student clustering of responses, which justified the use of random effects in the models (Theobald, 2018). We used the R package lme4 (Bates et al., 2015) for analysis and the package performance for calculation of intraclass correlation coefficients and checking model fits visually (Lüdecke et al., 2020). Some of these models with the survey data did not meet all the assumptions of generalized linear mixed models. As such, we view the survey data and accompanying analyses as exploratory.
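For readers unfamiliar with the intraclass correlation coefficient in this setting, one common latent-variable definition for a logistic mixed model with a random intercept is ICC = var_u / (var_u + pi^2 / 3), where var_u is the between-student (random intercept) variance and pi^2 / 3 is the variance of the standard logistic distribution. The Python sketch below assumes this definition; whether the values reported here were computed in exactly this way is an assumption, and the function names are hypothetical.

```python
import math

# Variance of the standard logistic distribution (latent-scale residual).
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3


def icc_logistic(random_intercept_variance):
    """Latent-scale ICC for a random-intercept logistic model."""
    return random_intercept_variance / (
        random_intercept_variance + LOGISTIC_RESIDUAL_VARIANCE
    )


def variance_for_icc(icc):
    """Random-intercept variance that corresponds to a given ICC."""
    return icc * LOGISTIC_RESIDUAL_VARIANCE / (1 - icc)


# Between-student variance implied by an ICC of 0.12 under this definition.
threshold_variance = variance_for_icc(0.12)
```

Under this definition, an ICC of 0.12 corresponds to a between-student variance of roughly 0.45 on the logit scale.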
RESULTS
Finding 1: There are Demographic Differences in Participation in Optional Exam Retakes
In all three courses, students who scored higher on first attempts were less likely to retake exams compared with reference groups. Generally, student participation in retakes was higher in Course 2 where retakes were offered in class or online compared with Course 1 where retakes were offered outside of class time. There was a steep drop-off between retake rates for exam 1 compared with exams 2 and 3 in Course 2 which happened after the transition to remote learning due to COVID-19. In Course 3, where retakes were offered outside of class time but constituted 75% of course grade, participation in retakes ranged from 60–75% (Figure 1).
Asian American students, International students, and Pell-eligible students were more likely to retake exams than reference groups in Course 1 (Figure 2). However, students who worked a job and Hispanic/Latinx students were less likely to retake exams. Survey analyses showed that Asian American students and Pell-eligible students who retook exams were more likely to choose not being satisfied with their score in this course as the reason for retaking the exam (Table 3). This was true for Asian American students even after controlling for the score they received on the first attempt (Supplemental Table S2). The most common reason students chose for not retaking an exam was that they were satisfied with their score on the first attempt. If a student was not satisfied with their score, but still chose to not retake an exam, it might be because they experienced some barriers to retaking exams. Among students who did not retake exams, Pell-eligible students were less likely to choose being satisfied with their score in the exam as a reason for not retaking the exam (Table 4). Students who worked 1–20 h/wk were also less likely to choose this option even after controlling for the score they received on the first attempt, suggesting that their lower participation in retakes might be due to barriers to participation (Supplemental Table S3). Having retakes scheduled outside of class time could be a barrier to participation for students who work a job, have a heavy course load, or have other responsibilities.
Reason for retaking the exam | Course 1, % (n = 880) | Summary of demographic differences | Course 2, % (n = 1033) | Summary of demographic differences |
---|---|---|---|---|
I was not satisfied with my score | 77.16 | Asian American students and Pell-eligible students more likely to pick | 86.66 | Latinx and students who work 1-20 h/wk more likely to pick |
I thought I could improve my score | 79.77 | Men and International students less likely to pick | 73.22 | Men, International students, and Pell-eligible students less likely to pick |
First attempt was encouraging | 19.89 | Black students and Pell-eligible students less likely to pick; men more likely to pick | 7.32 | No demographic differences |
To help learn the material better | 44.43 | No demographic differences | 48.24 | Men less likely to pick |
To practice my test-taking skills | 22.84 | No demographic differences | 20.16 | No demographic differences |
Because my friends were retaking | 5.45 | – | 4.41 | – |
To impress my instructor | 1.93 | – | 2.51 | – |
Reason for not retaking the exam | Course 1, % (n = 447) | Notes | Course 2, % (n = 261) | Notes |
---|---|---|---|---|
I was satisfied with my score | 75.17 | Pell-eligible students less likely to pick | 58.54 | No demographic differences |
I was discouraged by my score | 4.01 | – | 9.35 | – |
I didn’t think I could improve my score | 34.9 | Men less likely to pick | 28.05 | Latinx students less likely to pick |
Taking exams makes me anxious | 12.3 | No demographic differences | 13.41 | No demographic differences |
Too difficult to come to campus for retake | 13.87 | No demographic differences | n/a | |
Signing up process was difficult | 2.88 | – | n/a | |
Scheduling conflicts | 26.52 | No demographic differences | n/a | |
Too busy | 19.24 | No demographic differences | 33.74 | Pell-eligible students were less likely to pick |
Didn't feel like it | 10.74 | Students who work 1-20 h/wk less likely to pick | 10.16 | Pell-eligible students were more likely to pick |
Planned to retake, but forgot | 5.82 | – | n/a | |
In Course 2, where retakes were scheduled during class time or administered online, students who worked were still less likely to retake exams (Figure 2). Moreover, Black/African American students were significantly less likely to retake exams in this course. Neither students who worked nor Black/African American students were more likely to pick any of the reasons listed in the survey for not participating in exam retakes (Table 4).
In Course 3, Asian American students were more likely to retake exams and men were less likely to retake exams.
Finding 2: On Average, Students Score Higher on the Retakes Compared with First Attempts
Overall, students scored higher on retakes compared with the first attempts (paired t tests, all P < 0.05, except for exam 1, Course 1) (Figure 3). The average difference between total exam score on first attempts and total exam score students received in the course after taking retakes into account was 17.6 points out of 390 (4.5 percentage points) in Course 1, 26.4 points out of 360 (7.3 percentage points) in Course 2, and 38.1 points out of 300 (12.7 percentage points) in Course 3. Thus, optional exam retakes increased student scores significantly in all three courses.
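The percentage-point figures follow from dividing each average gain by the total exam points available in that course. A quick check in Python (the variable names are illustrative; the numbers are those reported above):

```python
# (average gain in points, total exam points available) for each course
gains = {
    "Course 1": (17.6, 390),
    "Course 2": (26.4, 360),
    "Course 3": (38.1, 300),
}

# Average gain expressed as a percentage of the total exam points.
pct_gain = {
    course: round(100 * gain / total, 1)
    for course, (gain, total) in gains.items()
}
```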
Finding 3: Optional Exam Retakes Increased Score Disparities Between Students with Different Social Identities, Likely Due to Differences in Participation in Retakes
Pell-eligible students received lower total scores on first exam attempts compared with students who were not eligible for Pell grants in all three courses. Black and Latinx students received lower total scores on first exam attempts compared with White students, although this was statistically significant for Black students only in Courses 1 and 2, and for Latinx students only in Courses 2 and 3 (Figure 4, A–C). International students received lower scores on first attempts than white students in Course 1. We had data on the number of hours students worked at a job only for Courses 1 and 2, and in both courses students who worked, especially those working more than 20 h a week, received lower scores than students who did not work. In Course 2, students who worked less than 20 h a week also received a significantly lower score than students who did not work a job (Figure 4, A and B).
Optional exam retakes did not reduce these demographic disparities in exam scores, but instead exacerbated some of the inequities in student exam scores. For example, the regression coefficient estimate for Black/African American students in Model 2 for both Courses 1 and 3 was negative, indicating that the disparity in total exam scores between Black and white students increased after optional exam retakes (Figure 4, D and F). Course 2 results showed that retakes also increased the score disparity between students who worked a job during the semester and students who did not (Figure 4E). Differences in participation in exam retakes explained these results to a large extent as seen in Model 3, which includes student participation in retakes as a predictor (Figure 4, G–I). Most of the significant differences that we saw in Model 2 (Figure 4, D and F) were no longer significant once we took student participation in retakes into account (Figure 4, G–I). However, even after accounting for student participation in retakes, there were some differences in scores that remained (Figure 4G).
Finding 4: Students Prefer Optional Exam Retakes and Having a Retake Opportunity Reduces Anxiety on the Initial Test Similarly for Students with All Social Identities, but Some Issues Remain
Overall, exam retakes were very popular with students and most students, regardless of whether they retook an exam, perceived that exam retakes help their learning. Moreover, about 86% of students who retook the exam said that they put a lot of effort into studying for the retake, while only 20–30% of students said that they studied less for the initial exam.
Most students who retook an exam agreed that retakes reduced their anxiety about taking tests. However, 49–56% of students indicated that they felt anxious while retaking the exams. This was more often true for Asian American students in both Course 1 and Course 2, and for students who worked more than 20 h a week in Course 1. While most students agreed that having the opportunity to retake exams helped them stay calmer on the initial exam, Latinx students were less likely to agree with this in Course 1. Moreover, about a quarter of students indicated that they did not retake the exam because of test anxiety. Thus, it seems that exams remain a high-stakes situation for many students, even when optional exam retakes are offered (Table 5).
Survey item | Course 1 | Notes | Course 2 | Notes |
---|---|---|---|---|
Students who retook exams | % (n = 880) | | % (n = 1033) | |
Retaking this exam helped my learning | 96.7 | – | 94.68 | – |
Retaking reduced my anxiety about taking tests | 81.48 | Men more likely to agree | 83.85 | No demographic differences |
Put a lot of effort into studying for retake | 86.36 | No demographic differences | 85.86 | International students less likely to agree |
Anxious while retaking | 48.98 | Asian American students and students who work >20 h/wk more likely to agree; men less likely to agree | 56.24 | Asian American students more likely to agree; men less likely to agree |
Preparing and retaking was too time consuming | 22.16 | No demographic differences | 25.78 | No demographic differences |
Finding the time was challenging | 20.22 | Asian American, Latinx, and International students and students who work >20 h/wk more likely to agree | n/a | n/a |
Students who did not retake exams | % (n = 447) | | % (n = 261) | |
Retaking this exam would not have helped my learning | 23.04 | Men were more likely to agree | 8.89 | No demographic differences |
I did not retake because of test anxiety | 23.49 | No demographic differences | 30.08 | No demographic differences |
Appreciate opportunity to retake exams | 98.88 | – | 97.97 | – |
All students | % (n = 1337) | | % (n = 1290) | |
Lower anxiety on initial exam | 89.3 | No demographic differences | 91.53 | No demographic differences |
Stayed calmer on initial exam | 81.97 | Latinx students less likely to agree | 86.92 | No demographic differences |
Studied less for initial exam | 20.19 | International students and students who did not retake exam less likely to agree | 32.12 | Students who did not retake exam less likely to agree |
Like exam retakes | 99.33 | – | 98.63 | – |
Prefer regular hours | 43.08 | Students who did not retake exam, International students, and students who work 1-20 h/wk more likely to agree | n/a | n/a |
In Course 1, where retakes were scheduled outside of class, 43% of students preferred having the retakes during class time, especially those who did not retake an exam, worked a job, and/or were International students. Asian American, Latinx, and International students and students who worked more than 20 h a week were more likely to agree that finding the time to retake the exam was challenging in this course. Even when the retakes were scheduled in class/online in Course 2, about 26% of students who retook exams agreed that preparing and retaking the exam was too time consuming (Table 5).
DISCUSSION
Our results show that optional exam retakes might be a useful tool to improve student performance and learning. Moreover, they might reduce student anxiety on the first exam attempt. However, we observed demographic differences in participation in exam retakes, especially when retakes were offered outside of class time. Specifically, in courses where retakes were outside of class time, Black students and students who worked a job were less likely to retake exams even after controlling for their score on the first exam attempt. Additionally, retakes seem to exacerbate the disparities in scores received by students with different social identities. This was likely explained by differences in participation in exam retakes. Our results highlight the importance of thinking about accessibility in a broad sense for any instructional practice such as, in this instance, offering retakes in a way that ensures all students are able to participate.
There are several factors that might shape students’ decisions to participate in an optional intervention such as the optional exam retakes we studied. First, students might not have time on their hands to participate in an optional intervention. Diversity among college students has been increasing over the past couple of decades. With that, the proportion of students who work and those who have caregiving responsibilities has also increased (Bowen et al., 2009; Goldrick-Rab and Sorensen, 2010; Goldrick-Rab, 2016). Employment rate among 18- to 22-year-old full-time college students increased from 33% in 1970 to 52% in 2000 (Scott-Clayton, 2012) and was at 43% in 2018 (National Center for Education Statistics, 2019). In addition, 22% of all undergraduate students are parents and 42% of parenting students are single mothers (Cruse et al., 2019). According to a recent survey of parenting students, 60% of them were working a job and another 13% were not working but were looking for jobs (Goldrick-Rab et al., 2020). Many other students also might have caregiving responsibilities for other family members such as siblings and grandparents. Overall, many undergraduate students, particularly women, students of color, and socioeconomically disadvantaged students, have significant limits on the amount of time and availability at a given time for coursework. Second, students might have scheduling constraints that might prevent them from participating in an optional intervention that is offered outside of class time with limited day and time options. Lastly, students might not be motivated to participate in optional interventions or might be hesitant for a variety of reasons such as lower interest in the course material, test anxiety, and stereotype threat.
Given that a large proportion of students must balance coursework with other responsibilities such as work and caregiving, and given our findings of discrepancies in who takes optional retakes outside of class, we argue that opportunities to improve grades such as optional exam retakes need to be offered during formal class time. This would eliminate any scheduling barriers. Comparing the participation rates in retakes across the three courses, we found that a much larger proportion of students retook exams in Course 2, where retakes were offered during class time or online, compared with Courses 1 and 3, where retakes were offered outside of class time. Moreover, as our survey results show, students who worked a job were more likely to agree that they would have preferred exam retakes during class time.
Although offering exam retakes during class time can help make optional exam retakes more equitable, it still is not enough to overcome the barriers students with limited time face because effectively retaking exams involves preparing for exams outside of class. Thus, even when retakes were offered during class time in Course 2, some disparities in participation and benefits from retakes persisted. This might be due to the fact that this course was taught during the spring semester of 2020, when the public health crisis caused by the COVID-19 pandemic forced widespread shutdowns and a rapid transition to remote learning. While the pandemic adversely affected nearly all college students, it had disparate impacts on students based on their social positions (Aucejo et al., 2020; Gelles et al., 2020; Supriya et al., 2021). For example, in an interview study of engineering students, Gelles et al. (2020) found that men described having more free time in Spring 2020 while women described having to take on more domestic responsibilities. In a survey study, students of color and, to some extent, lower-income students reported a larger reduction in weekly study hours compared with White students and higher-income students after the transition to remote learning in Spring 2020 (Aucejo et al., 2020). Such differential impacts of the pandemic could explain the differences in participation in retakes among students in this course. However, as described earlier, there are systematic differences in the amount of time that students have for coursework even in the absence of a global pandemic.
Despite these disparities in participation, optional exam retakes benefitted student learning and were very popular with students. Moreover, almost 90% of students said that having the opportunity to retake exams reduced their anxiety on the initial exam. Therefore, finding ways to offer retakes that reduce test anxiety experienced by students and retain the benefits to student learning could be beneficial. In addition to offering retakes during class time as described earlier, one approach could be to make exam retakes mandatory for students who received a low score on the first attempt as done in some mastery learning approaches (Diegelman-Parente, 2011). However, this runs the risk of exacerbating the test anxiety that many students experience while taking high-stakes exams. We noticed that for exam 1 of Course 2 which took place in person prior to the COVID-19 shutdowns, Black students retook exams less often than White students even though retakes were offered during class time (Supplemental Figure S2). A potential explanation for this is that Black students were experiencing stereotype threat in these high-stakes assessment settings due to the fear of confirming negative stereotypes about Black students in STEM (Steele and Aronson, 1995; Aronson et al., 1998). This might cause Black students to avoid high-stakes assessment settings, resulting in the lower participation rates observed in our study.
While we have not seen studies that examine demographic differences in participation and outcomes for high-stakes optional interventions such as the optional exam retakes studied here, there is some work examining the impact of extra credit assignments that are also optional in nature. These studies report that students with higher grades are more likely to complete extra credit assignments (Hardy, 2002; Silva and Gross, 2004; Moore, 2005; Harrison et al., 2011). By contrast, in our study, higher scoring students were less likely to retake exams than lower scoring students. The higher stakes associated with exams compared with extra credit, and the fact that higher scoring students had less to gain in terms of exam points from retaking, might explain this result.
It is also important to note here that while many students reported that having an opportunity to retake the exam reduced their anxiety on the initial exam, about half of the students agreed that they felt anxious while retaking the exam. Further, a quarter of students agreed that they chose to not retake the exam due to test anxiety. There were also some demographic differences in the impact of exam retakes on student anxiety around test taking. Asian American students were more likely to agree that they felt anxious during the retake. This might be due to these students experiencing the pressures of conforming to the “model minority” stereotype (Poon et al., 2016; McGee et al., 2017). Latinx students were less likely to agree that the exam retake opportunity helped them stay calmer on the initial exam. It is possible that Latinx students were also experiencing stereotype threat in the high-stakes assessment settings (Gonzales et al., 2002; Nguyen and Ryan, 2008). These results suggest that the high-stakes settings of these exams are another important barrier to equitable participation in and outcomes of optional exam retakes.
Two additional evidence-based ways to encourage mastery learning are frequent low-stakes assessments and two-stage exams, which we discuss below.
Several studies show that frequent low-stakes assessments lead to more equitable grade distributions among students. A recent meta-analysis showed a positive association between frequent low-stakes assessments and students’ overall course performance and likelihood of passing a course (Sotola and Crede, 2020). Such positive associations have also been observed in undergraduate introductory biology courses. For example, Eddy and Hogan (2014) reported that moderate course structure (i.e., one graded assignment per week) reduced grade disparities between Black students and White students and between first-generation to college and continuing-generation students in an introductory biology course. Cotner and Ballen (2017) reported smaller gender gaps in courses where exams constituted a smaller proportion of the overall course grade in introductory biology courses. Such frequent low-stakes assessments, especially when they take the form of quizzes, offer one way to enhance student learning and retention of course material through the testing effect. In fact, Hinze and Rapp (2014) reported that low-stakes assessments lead to higher long-term retention of science content, but high-stakes assessment did not lead to higher long-term retention. Thus, opportunities for retaking low-stakes quizzes might be more effective for student learning than high-stakes exams.
Another way to encourage mastery learning among students might be to use two-stage exams. A two-stage exam is a method of assessment in which students complete an exam individually and then retake the exam with a group of peers (Yuretich et al., 2001). Several studies report benefits of two-stage testing, including significantly higher scores on the collaborative assessment compared with independent assessment scores (Bruno et al., 2017; Levy et al., 2018) and higher scores in semesters or topics with collaborative testing as the second stage compared with individual retest (Gilley and Clarkston, 2014; Knierim et al., 2015). Moreover, group scores were found to be higher than the scores of any individual within the group, demonstrating collaborative learning in geology and oceanography courses (Bruno et al., 2017). Collaborative testing has also been found to improve content retention within some undergraduate college courses (Gilley and Clarkston, 2014; Knierim et al., 2015; Deng and Luo, 2018) including introductory biology courses (Cooke et al., 2019), although other studies did not find evidence for greater content retention (Leight et al., 2012). Finally, studies show that students find the collaborative testing in two-stage exams more helpful and less stressful than traditional assessments (Yuretich et al., 2001; Leight et al., 2012; Rieger and Heiner, 2014; Levy et al., 2018). Future studies should examine whether two-stage exams benefit students with all social identities equitably and how that compares with optional individual exam retakes.
Ultimately, there might not be one solution or intervention that works equally well for all students. Short interventions such as values affirmation exercises (Harackiewicz et al., 2014; Jordt et al., 2017), and expressive writing exercises (Ramirez and Beilock, 2011; Harris et al., 2019) might help us move toward more equitable learning and assessment (Casad et al., 2018). However, eventually we might need to make more large-scale changes, such as providing students with multiple ways to demonstrate their learning as suggested by Culturally Relevant Pedagogy (Ladson-Billings, 1995) and Universal Design for Learning (Rose et al., 2006) frameworks to achieve equitable learning and assessment in STEM classrooms.
Limitations
This study was done across multiple introductory biology courses at a single institution, so generalizations should be made with caution. Various factors, such as demographics of the institution, social identities of the instructional teams, instructor experience, and ability to create an inclusive classroom, could all influence the impact of an intervention such as optional exam retakes. One of the semesters that we included in this study was the Spring 2020 semester when colleges were forced to transition from in-person to fully online instruction due to the COVID-19 pandemic. The COVID-19 pandemic also affected people’s lives in profound ways and thus patterns of student participation in retakes that semester might also have been affected by the pandemic. Another limitation of our study is that although we asked students about the effect of optional exam retakes on their test anxiety levels, we did not directly measure test anxiety. Because of IRB constraints, we were also unable to include any “control groups,” that is, sections of the same course taught by the same instructor but without optional exam retakes. Other limitations include using a coarse measure of socioeconomic status, that is, eligibility for federal Pell grants. More fine-scale data on socioeconomic status and basic needs insecurities among our student population could have provided additional important insights into patterns of student participation and performance on exam retakes. Due to sample size limitations, we were unable to take an intersectional approach in our analyses and assess how systems of oppression such as racism, sexism, and classism, might interact to shape students’ participation and performance in exam retakes. Future studies using a qualitative intersectional approach would be very insightful.
CONCLUSIONS
Taken together, our results show that optional exam retakes could be effective in improving student exam scores, improving learning, and might alleviate student anxiety about high-stakes exams to some extent. However, we found significant demographic differences in participation in exam retakes underscoring the importance of paying attention to accessibility for any instructional intervention. Our work also illustrates the importance of monitoring participation and examining the impact of instructional interventions across demographic groups during and after courses. Students with more socioeconomic privilege might benefit more from interventions that are optional and require time outside of class. Thus, if an intervention is intended to increase equity in student grades and learning, measures to ensure more equitable participation are required.
FOOTNOTES
1 MCAT is the exam that prospective medical students take for admissions into medical schools in the United States.
ACKNOWLEDGMENTS
We would like to thank Michael Angilletta and Josh Caulkins for their help in administering the surveys included in this study. We are also grateful to the students who filled out our survey. Members of the Biology Education Research lab at Arizona State University gave us valuable feedback on our study design and analyses, and we specifically thank Rachel Scott for her feedback on the manuscript. We are also very grateful to the two anonymous reviewers whose comments greatly improved the paper. Lastly, we want to thank the undergraduate researchers who participated in our think-aloud interviews to establish cognitive validity of our survey instruments. This work was supported by grant no.