The Graph Rubric: Development of a Teaching, Learning, and Research Tool
Abstract
As undergraduate biology curricula increasingly aim to provide students with access to courses and experiences that engage them in the practices of science, tools are needed for instruction, evaluation, and research around student learning. One of the important skills for undergraduate biology students to master is the selection and creation of appropriate graphs to summarize data they acquire through investigations in their course work and research experiences. Graphing is a complex skill, and there are few discipline-informed tools available for instructors, students, and researchers to use. Here, we describe the development of a graph rubric informed by literature from the learning sciences, statistics, and visual representations, and by feedback on and use of the rubric by a variety of users. The result is an evidence-based, analytic rubric that consists of categories essential for graph choice and construction: graph mechanics, graph communication, and graph choice. Each category of the rubric can be evaluated at three levels of achievement. Our analysis demonstrates the potential for the rubric to provide formative feedback to students and allow instructors to gauge and guide learning and instruction. We further discuss and identify potentially interesting research targets for science education researchers.
INTRODUCTION
Reforms to biology education from K–12 through undergraduate levels call for students to take part in the practices of science, including inquiry and quantitative data analysis, interpretation, and decision making (American Association for the Advancement of Science [AAAS], 2011; College Board, 2011; National Research Council, 2011; Common Core State Standards Initiative, 2012; Next Generation Science Standards Lead States [NGSS], 2013). Furthermore, there are calls for all undergraduate students to participate in research (AAAS, 2011; President’s Council of Advisors on Science and Technology, 2012; Howard Hughes Medical Institute, 2013), an experience that will ultimately engage them in data analysis and communication of their findings. As part of these reforms, students will need to develop quantitative literacy skills, such as graphing, to enable them to solve problems and ask questions using quantitative evidence and methods (American Association of Colleges and Universities, 2010). Therefore, graphical literacy and competence are essential for undergraduate biology students and are important life skills for non–science majors as well.
Graphing skills can be broadly separated into graph interpretation and graph construction. The interpretation of graphs requires cognitive engagement in statistical, experimental, and proportional reasoning in addition to visuospatial skills (Shah et al., 1999; Garfield, 2003; Garfield et al., 2007; Bengtsson and Ottosson, 2006). The graph must be decoded to allow for the extraction of information to make inferences (Friel and Bright, 1996; Shah et al., 1999). Graph construction is a more complex, generative task involving the integration of knowledge, skills, and reasoning from many content areas and incorporating broader thinking about experiments and/or inquiry. The graph constructor needs to draw on knowledge of graphical representations (i.e., representational competence), spatial and proportional reasoning, and statistical and quantitative skills (Tufte, 1983; Mathewson, 1999; Garfield, 2003; Garfield et al., 2007; Bengtsson and Ottosson, 2006). For example, choosing an appropriate graph type to represent data requires an understanding of the variable types to be plotted (e.g., categorical vs. continuous), a consideration of the purpose for graphing the data (e.g., the research question), knowledge of different ways to statistically summarize data, and a basic knowledge of graph types that exist for the type of data to be plotted (diSessa and Sherin, 2000; diSessa, 2004; Grawemeyer and Cox, 2004; Novick, 2004). Further, the nature of the variables and the approaches and measurements used to acquire them play a role in graph construction. Another important feature of a well-constructed graph is its visual appearance. Aesthetic and spatial aspects of graphs impact visual processing and interpretation and need to be considered to ensure clear communication of the data (Tufte, 1983; reviewed in Montello et al., 2014). 
A well-constructed graph will not only be aesthetically pleasing but will also leverage Gestalt principles (e.g., proximity and continuity; Kellman, 2000; reviewed by Hegarty, 2011), which facilitate the global and local visual analysis that is a natural feature of human visual processing (Franconeri et al., 2012). Finally, the construction of high-quality and meaningful graphs also requires reflective processes that ensure that the form of the data plotted (Konold et al., 2015) and graph chosen are aligned with its purpose for the creator and readers of the graph (Angra and Gardner, 2016, 2017). This reflective piece extends the representational competence needed for graph choice and construction to metarepresentational competence (diSessa and Sherin, 2000; diSessa, 2004), which is implicitly practiced by experts (Angra and Gardner, 2017).
Students (Bray-Speth et al., 2010; McFarland, 2010; Gormally et al., 2012) and even experts (Roth and Bowen, 2001; Rougier et al., 2014; Weissgerber et al., 2015) struggle when choosing appropriate graphs to display their data. Indeed, reviews of primary literature articles published in science, technology, engineering, and mathematics journals have documented the overuse and/or the inappropriate use of certain graph types and data representations (Cooper et al., 2001, 2003; Puhan et al., 2006). Further, several journals have featured articles with guidelines for scientists on graph construction in an effort to improve the clarity of data communication (PLoS Biology, PLoS Computational Biology, BioMed Central). Related to the graph type is the form in which the data are plotted. There is currently a backlash against the overreliance on and overinterpretation of descriptive and inferential statistics in the scientific community (e.g., Klaus, 2015, 2016; Saxon, 2015; Weissgerber et al., 2015). While experts have room for improvement in this area, graph creation is far easier for them, given their knowledge of the system under study, data analysis and statistics, and the research question they are addressing (Konold et al., 2015). However, students may lack a sufficient understanding of data (Dasgupta et al., 2014) and statistical techniques, or the proper context given details of the study system or audience (Lovett and Chang, 2007), leading to differences in graph construction decision making between novices and experts (Konold and Lehrer, 2008; Konold et al., 2015; Angra and Gardner, 2016, 2017).
Despite the ubiquity and importance of graphs in science, instructors do not regularly use time in class to engage and enculturate students into the norms and behaviors of experts in graph construction and interpretation (Bowen and Roth, 1998). Further, instructors tend to use oversimplified graphs and fail to deconstruct and analyze figures with students (Bowen and Roth, 1998). This lack of graphical enrichment limits students’ experiences of dealing with the “messiness” that comes with data from biological experiments and the statistical and quantitative techniques used to summarize, analyze, and interpret those data. Commonly used software with graphing features can exacerbate the problems by facilitating quick decision making without thoughtful reflection on the multidisciplinary concepts that are part of data representation. Previous work within the statistics and science education communities, including our own, has revealed some of the basic areas in which undergraduate students have difficulty (Cobb et al., 2003; Lehrer and Schauble, 2004; Novick, 2004; McFarland, 2010; Angra and Gardner, 2016, 2017).
Resources exist to help both students and practitioners increase their competence with graph format selection and construction. These include instructional books (Bertin, 1983; Tufte, 1983; Kosslyn, 1994; Few, 2004) and Web-based interactive tools and modules such as TinkerPlots, BeSocratic (graphical thinking), and CODAP (Common Online Data Analysis Platform; Concord Consortium). As mentioned previously, current recommendations from research journals aim generally to promote the creation of better data visualizations, including graphs (Cooper et al., 2001, 2002; Puhan et al., 2006; Rougier et al., 2014; Slutsky, 2014; Saxon, 2015; Weissgerber et al., 2015; Klaus, 2016; Nuzzo, 2016). However, these resources rarely focus on the complex reasoning behind graph choice and construction, nor are they grounded in the concepts and measures of a particular discipline. It is therefore difficult to choose an appropriate graph for data (e.g., bar graph for summarized categorical data) without evaluating the advantages and disadvantages of using a particular graph within the context of a given scientific discipline or audience.
The multifaceted and complex nature of graphing makes it difficult for instructors to diagnose student difficulties and for students to master the skill of graphing. There have been scattered efforts to identify and address student difficulties with graphing. For example, Vitale and colleagues (2015) have developed an automated digital tool to evaluate line graphs created by middle and high school students in chemistry and physics classrooms. Their tool can provide quick feedback to researchers and instructors about difficulties that students have based on the slope and trajectories of the lines graphed. However, the tool is limited by graph types and the scientific concepts they model, which are distinct from graphing data from experiments, predictions, or explanations. For example, the data structure in data summary graphs (e.g., bars, points, box and whisker) is an abstraction and distinct from the identity of the data and system in which they were generated. Scaffolded instruction at the undergraduate level has been somewhat successful in increasing the graph interpretation and construction competence of students during part of (Bray-Speth et al., 2010; McFarland, 2010) or an entire (Harsh and Schmitt-Harsh, 2016) semester. However, continued guided and reflective practice over a longer period of time has been recommended (Roth et al., 1999; diSessa, 2004). Therefore, there is a need for additional tools to aid in graphing instruction and research that have broad applicability.
Designing tools for instruction can be done easily through the rubric format, which is commonly used in diverse settings and by a variety of users. Rubrics are commonly used in the classroom by both instructors and students (Jonsson and Svingby, 2007; Panadero and Jonsson, 2013; Brookhart and Chen, 2015), but they can also be used for program evaluation (e.g., PULSE rubrics; Aguirre et al., 2013; Brancaccio-Taras et al., 2016) and in research (Dasgupta et al., 2014; Ashley et al., 2017). Rubrics allow for transparency of expectations and a level of objectivity in evaluation, and they require minimal time on the part of the user (Allen and Tanner, 2006; Dawson, 2017). Regardless of the specific type of rubric (i.e., analytic or holistic), the purpose and context of its use, or the user, rubrics typically have the following design features: an articulation of the categories under which something will be evaluated, a definition of the quality of different levels of achievement, and a scoring strategy (Popham, 1997; Mertler, 2001; Allen and Tanner, 2006; Allen and Knight, 2009; Jonsson and Svingby, 2007; Panadero and Jonsson, 2013; Brookhart and Chen, 2015; Dawson, 2017).
We developed an analytic graph rubric with three levels of achievement for each of the graphing subcategories. The objectives for the design of the rubric were to create a tool that would 1) facilitate the teaching and evaluation of data summary graphs, 2) provide undergraduate students with formative and summative feedback on their graphs, and 3) allow education researchers to evaluate graphing artifacts to assess experimental and quantitative skills. In this article, we describe the rubric development process, the sources of validity and reliability evidence we gathered, and the insights we gained related to the scope of use and potential of the rubric to improve student competence with graph construction.
METHODS
All work with human subjects, as appropriate, was performed under approved protocols (IRB#1210012775 and #1803020378). During the process of designing the graph rubric, we gathered validity and reliability evidence so that we, and others, could use the graph rubric in teaching and research. Validity in our study is “the relationship between the content of the test and the construct it is intended to measure” (American Educational Research Association, American Psychological Association, and National Council on Measurement in Education [AERA, APA, and NCME], 2014, p. 14). In the context of our work, we sought validity evidence to ensure that the graph rubric has appropriate categories, descriptions, and guidelines that can be used to measure and assess student understanding and application of concepts and skills relevant to graph choice and construction. To this end, our design process involved establishing construct validity, which refers to the claim that the content and features of the instrument (i.e., the graph rubric) are well supported with evidence (Benson, 1998; AERA, APA, and NCME, 2014, p. 11). In support of our overall claim of construct validity for the graph rubric as a tool to evaluate graphs, we gathered evidence for content and face validity. Establishing content validity involves gathering data in support of the claim that the instrument includes all relevant features of the subject under examination (Benson, 1998). In our case, we consulted diverse sources to ensure that the graph rubric encompasses appropriate criteria or content used to evaluate graphs (Table 1). We also approached diverse users to gather evidence of face validity, which is the ability to conclude that an instrument (i.e., the graph rubric) is appropriate and effective in achieving its aims (Holden, 2010). 
While the rubric is not a test instrument, our design and construct validation process was informed by the instrument design literature and its application in discipline-based education research (Benson, 1998; Corwin et al., 2015) and consisted of three stages: 1) substantive, 2) structural, and 3) external. Although this process generally follows a linear path, there were cycles of revision and repetition of some stages. These design stages, our activities, and the types of validity evidence they contribute to are summarized in Table 1. As part of the evaluation of the construct validity of the rubric, we used interrater reliability (IRR) with a diverse group of users to understand consistency in judgment and scoring of graphs using the rubric (Holsti, 1969; Jonsson and Svingby, 2007; see Data Analysis below).
Stage and purpose | Type of validity | Activities and sources of evidence |
---|---|---|
| Content validity: Assurance from diverse sources that the graph rubric encompasses appropriate criteria or content used to evaluate graphs | |
Stage 1. Substantive Stage: Identifying Graphing Elements by Consulting the Literature and Ongoing Research
The substantive stage led to the initial draft of the graph rubric with its categories, subcategories, and definitions. Three sources of information contributed to this stage and supplied content validity evidence for the concepts within the rubric (Table 1). We consulted the graphing and visual representations literature, student-generated graphs and reflections from a classroom study (Angra and Gardner, 2015; Angra, 2016), and graphs and the articulated reasoning constructed by students and professors in a think-aloud clinical graphing interview (Angra and Gardner, 2017).
We began the process of rubric development by consulting books and primary literature that discuss appropriate graphing practices. Because graphs are ubiquitous in many fields, we did not restrict our literature search to biology at this stage. For our literature search on graphing, we consulted Google Scholar and the university’s online library for article recommendations. We searched broadly for articles using keywords including “graph,” “construction,” “choice,” “presentation,” “science,” and “practices.” We then extended our search by consulting the reference sections in the articles. We read each reference, made notes on the authors’ recommendations on proper graph choice and construction practices, and grouped similar recommendations together. As graphs are visual representations of data, we consulted select seminal work in the visual representations literature to identify theory and best practices (e.g., Tufte, 1983; diSessa, 2004).
To supplement the literature review and aid in rubric development, we used data from two ongoing graphing studies (Angra, 2016; Angra and Gardner, 2017). Briefly, the first graphing study took place in a physiology laboratory in which students produced graphs from their experimental data. Specifically, we were interested in the general qualities of the graphs produced (graph type, data plotted, overall appearance, understanding of the take-home message) and student reasoning for graphs they produced (Angra, 2016). The second graphing study was an expert–novice analysis conducted to understand how professors and students constructed and reflected on their graphs in a think-aloud interview setting (Angra and Gardner, 2017).
Stage 2. Structural Stage: Soliciting Feedback to Establish Content and Face Validity
During this stage, we sought content and face validity evidence to convince us that the rubric contents and structure were appropriate and relevant for evaluating graphs in biology (Table 1). We accomplished this by seeking feedback on the rubric from four different groups of people: 1) science education scholars, 2) non–education research biology graduate students who were actively pursuing either a master’s or doctoral degree, 3) undergraduate biology students enrolled in an upper-level physiology laboratory course, and 4) biology instructors. Incorporating feedback from participants at various levels of education and with expertise in various fields allowed us to check the learning goals and usability of the rubric. Feedback from students allowed us to make sure that the language in the rubric was clear and easy to understand.
Science Education Scholars.
Drafts of the rubric were presented to an interdepartmental biology education research group of science education scholars (Table 1) that includes chemistry and biology education graduate students and postdoctoral fellows and instructors from the departments of curriculum and instruction, biology, and chemistry. The reason for sharing the graph rubric with science education scholars was to obtain feedback from people with pedagogical expertise. The objective of the first meeting with this group was to obtain targeted feedback on the first draft of the graph rubric. In the first draft, we used a binary scale (i.e., present/not present) for the mechanics category and three levels of achievement for the other categories. We presented two de-identified student graphs (Graphs 1 and 2 in Appendix C, Supplemental Material) produced by different student groups in a physiology laboratory course along with a brief overview of the students’ experimental designs and variables associated with that particular laboratory context. Each science education scholar was instructed to independently use the graph rubric to evaluate both student graphs, then pair and discuss their ratings with a partner; this was followed by a group discussion guided by A.A. and S.M.G. The guided group discussion began with broad questions to solicit feedback from the participants about rubric use, appropriateness, and descriptions of the rubric categories and subcategories. Percent agreement as an estimate of IRR between the science education scholars and authors was calculated after the meeting to gauge consistency in rubric scoring across the categories (see Results). IRR scores from the first meeting were low and are not reported in this article, but conversations about rubric scoring are provided in the Results section, as they were fruitful for rubric revisions.
After the initial round of feedback, the rubric categories and subcategories were expanded and refined based on comments from the science education scholar group, further literature review, and ongoing graphing research (Table 1). We standardized the levels of achievement to three categories: “present/appropriate,” “needs improvement,” and “unsatisfactory.” In addition, we adjusted the weighting of the scoring of the subcategories across the three main categories of the rubric to reflect the level of cognitive demand; scoring of items in the “mechanics” category is weighted less than scoring of items in the “communication” and “graph choice” categories (Figure 1). Using similar protocols but at a later time, the science education scholars were asked to use the revised rubric to evaluate Graph 3 (Appendix C, Supplemental Material).
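The weighted scoring scheme described above can be illustrated with a minimal Python sketch. The three levels of achievement and the three main categories are taken from the text; the point values, the category weights, and the example ratings are hypothetical, because the actual values appear only in Figure 1. The sketch is intended only to show how weighting "mechanics" items less than "communication" and "graph choice" items affects a total score.

```python
from dataclasses import dataclass

# Levels of achievement named in the text; the point values are ASSUMED.
LEVEL_POINTS = {
    "present/appropriate": 2,
    "needs improvement": 1,
    "unsatisfactory": 0,
}

# HYPOTHETICAL weights: the text states only that "mechanics" items are
# weighted less than "communication" and "graph choice" items.
CATEGORY_WEIGHTS = {
    "mechanics": 1,
    "communication": 2,
    "graph choice": 2,
}


@dataclass(frozen=True)
class Rating:
    """One rater's judgment for a single rubric subcategory."""
    category: str
    subcategory: str
    level: str


def weighted_score(ratings):
    """Sum of level points, each multiplied by its category weight."""
    return sum(CATEGORY_WEIGHTS[r.category] * LEVEL_POINTS[r.level]
               for r in ratings)


def max_score(ratings):
    """Highest weighted score attainable for the rated subcategories."""
    top = max(LEVEL_POINTS.values())
    return sum(CATEGORY_WEIGHTS[r.category] * top for r in ratings)


# Illustrative ratings for one graph (subcategory names are examples only).
example = [
    Rating("mechanics", "descriptive title", "present/appropriate"),
    Rating("mechanics", "x-axis label", "needs improvement"),
    Rating("communication", "aesthetics", "present/appropriate"),
    Rating("graph choice", "graph type", "unsatisfactory"),
]

print(weighted_score(example), "out of", max_score(example))
```

Under these assumed weights, a shortfall in a "graph choice" item costs twice as many points as the same shortfall in a "mechanics" item, which mirrors the stated intent of reflecting cognitive demand.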
Biology Graduate Students.
We obtained feedback from 10 biology graduate students present at a biweekly graduate seminar (Table 1), using the revised version of the graph rubric (Figure 1). Feedback from this group is important because of the role they play as teaching assistants in assisting the main instructor to deliver knowledge and/or provide feedback to students, usually with a specific rubric or answer key. We gave the biology graduate students a copy of the graph rubric and a student-generated graph (Graph 3 in Appendix C, Supplemental Material) with the corresponding research question and hypothesis to review independently; this was followed by a think–pair–share and a general discussion. IRR was calculated after the meeting to gauge consistency of rubric scoring across the graph rubric categories.
Undergraduate Students.
We tested the utility of the graph rubric in an upper-level physiology laboratory classroom with undergraduate students to 1) provide instructor feedback on graphs they constructed as a group and 2) have them use the graph rubric to provide peer feedback. Briefly, students worked in teams to design original experiments, collect data, and display findings in graphs. In conjunction with previously published graph tools (Angra and Gardner, 2016), students used the graph rubric to guide their graph construction and to inform their anonymous graph peer review, which occurred four times during the semester. At the end of the semester, students were prompted to anonymously fill out a survey and provide feedback on the usability of the rubric and the appropriateness of the rubric for the task and to offer suggestions for improving the rubric.
Biology Instructors.
We recruited four research-active biology instructors from diverse biology subdisciplines to gather face and content validity evidence. Instructors were shown a copy of the graph rubric (Figure 1) and were asked for feedback regarding the appropriateness of the rubric categories, its potential usability in the classroom and helpfulness to students, and the scoring features of the rubric.
Stage 3. External Stage: Usage of the Graph Rubric in Different Contexts and by Diverse Users
This stage consisted of using the final rubric (Figure 1) to evaluate graphs from different sources and by users from diverse external stakeholder groups to provide us with additional content and face validity evidence. The sources of evidence were derived from evaluation of 1) student-generated graphs from an upper-level undergraduate physiology class; 2) student-generated graphs from a biology instructor’s class; and 3) graphs from selected chapters from five introductory biology textbooks. To standardize and guide independent users’ scoring of graphs with the rubric, we constructed graph rubric training materials (Appendix B, Supplemental Material). These materials define and explain the features of the rubric and include example scoring of five graphs, each from the three levels of achievement, as shown on the final version of the graph rubric. IRR was calculated for each external user and an expert rater.
Feedback from Undergraduate Biology Majors.
We gathered feedback on an independent graph evaluation task from undergraduate students (n = 7) who had successfully completed an upper-level physiology course. We provided the participants with the graph rubric training materials (Appendix B, Supplemental Material) and five de-identified student-generated graphs to evaluate with the rubric (Appendix D, Supplemental Material). Graphs chosen represented typical graph types and displayed some common undesirable attributes such as plots of all raw data when a descriptive statistic would be appropriate; the use of dark backgrounds and gridlines, which deflect attention from the data displayed; plots of averages without error bars; and misalignment of the graph with the research question and/or hypothesis. Students were encouraged to comment and explain their reasoning for their scoring in each of the graph rubric subcategories.
Feedback from Biology Instructors.
To gather feedback and evaluate the rubric as a teaching tool within the context of undergraduate biology courses, we recruited biology instructors who have students create or interpret graphs as part of their normal classroom instruction. We purposely recruited instructors who teach courses ranging from the introductory levels to advanced undergraduate and graduate levels. The four faculty instructors taught a range of courses: a course-based undergraduate research experience (CURE) introductory biology laboratory; intermediate-level physiology and cell biology courses; and upper-level field ecology, conservation biology, and neurobiology courses. We provided each instructor with the graph rubric and rubric training materials (Appendix B, Supplemental Material) and asked them to select and evaluate between five and 10 student graphs (with accompanying research question and/or hypothesis statements) with the graph rubric (see Appendix E in the Supplemental Material for descriptions). The graphs were returned to the research team for “expert” scoring with the graph rubric for comparison of scoring with each instructor. In addition, each instructor completed a brief survey to provide feedback on the clarity, usability, and appropriateness of the rubric for evaluating student graphs in their courses.
Evaluation of Biology Textbook Graphs.
Because undergraduate students may encounter graphs in their textbooks as part of their course work, we evaluated graphs from five introductory biology textbooks to augment our content validity evidence (see Table 7 later in this article and Appendix H, Supplemental Material). We chose four textbooks (Raven et al., 2008; Sadava et al., 2009; Singh-Cundy and Shin, 2010; Urry et al., 2014) based on the undergraduate curriculum for biology students at a large midwestern university. The fifth textbook (Campbell et al., 2014) was chosen because it integrates the recommendations put forth by Vision and Change to incorporate more quantitative thinking in biology (AAAS, 2011). Our selection criteria and graph analysis followed that of Rybarczyk (2011) and Hoskins et al. (2007). We randomly selected 10 chapters from each textbook and analyzed pages with graphs as stand-alone artifacts using the graph rubric. The definition that we use for a graph is taken from Kosslyn’s (1994, p. 2) work: “a visual display that illustrates one or more relationships among numbers.” We expanded this definition and analyzed graphs that were in a Cartesian coordinate system, framed with x- and y-axes, and found in the main chapter or in the side-panel chapter exercises (see Appendix G in the Supplemental Material for a list of graphs on which evaluation was performed). We excluded interactive graphs, graphs found in videos, and graphs found in the end-of-chapter exercises. Because the graphs in textbooks were rarely directly derived from or presented as related to experiments, we did not include evaluation of the “alignment” subcategory of the rubric.
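The random selection of 10 chapters per textbook could be reproduced along the following lines. This is a sketch only: the text does not report how the randomization was performed, and the textbook labels, chapter counts, and seeds below are hypothetical placeholders rather than values from the study.

```python
import random


def select_chapters(n_chapters, k=10, seed=None):
    """Draw k distinct chapter numbers from 1..n_chapters without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return sorted(rng.sample(range(1, n_chapters + 1), k))


# HYPOTHETICAL chapter counts; substitute the real count for each textbook.
textbooks = {"Textbook A": 56, "Textbook B": 40}

selection = {
    title: select_chapters(n_chapters, k=10, seed=index)
    for index, (title, n_chapters) in enumerate(textbooks.items())
}

for title, chapters in selection.items():
    print(title, chapters)
```

Recording the seed alongside the selected chapters would allow other researchers to regenerate the same sample.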
Data Analysis
We used IRR to quickly identify and refine areas of the rubric during the structural stages of rubric design and to provide us with feedback on the broad use and scope of the rubric during the external stage (Table 1). In this way, the IRR analysis contributed to both content and face validity evidence. We were able to identify areas in which the content and the structure of the rubric were well understood and relevant to users. In addition, IRR provided us with insight into how different raters at various skill levels use the rubric and how they rate graphs that they are most likely to encounter in their own contexts. We first calculated IRR in the form of percent agreement between raters to quantify reliability between expert raters (A.A. and S.M.G.) and each individual population that was asked to use the graph rubric for the structural stage (McHugh, 2012). Because the percent agreement between the two expert raters was high (>90%), percent agreement between other raters (e.g., students or instructors) and either expert rater is used for the values presented here (Stemler, 2004). In qualitative research, an IRR agreement of 80% or higher is considered acceptable (Holsti, 1969). Together, these IRR values inform the limitations and appropriate usage of the rubric and suggest possible avenues of implementation in the classroom.
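Percent agreement, as used above, is the number of rubric subcategories on which two raters assign the same level of achievement, divided by the total number of subcategories rated and multiplied by 100. A minimal Python sketch follows; the function name, the single-letter level codes, and the example ratings are illustrative assumptions and are not data from this study.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters gave the same rating."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must score the same set of items.")
    if not rater_a:
        raise ValueError("At least one rated item is required.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Threshold cited in the text (Holsti, 1969).
ACCEPTABLE_AGREEMENT = 80.0

# ILLUSTRATIVE ratings for ten subcategories of one graph:
# P = present/appropriate, N = needs improvement, U = unsatisfactory.
expert = ["P", "P", "N", "U", "P", "N", "P", "P", "U", "N"]
external = ["P", "P", "N", "N", "P", "N", "P", "U", "U", "N"]

agreement = percent_agreement(expert, external)
print(f"{agreement:.0f}% agreement;",
      "acceptable" if agreement >= ACCEPTABLE_AGREEMENT else "below threshold")
```

Percent agreement is simple to compute and interpret but does not correct for agreement expected by chance, a limitation discussed by McHugh (2012).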
RESULTS
Rubric Content and Structure
A critical feature of analytic rubrics is the clear articulation of areas for evaluation with clear explanations for the evaluative criteria (i.e., categories) that users of the rubric need to complete the task (Dawson, 2017). To construct our rubric for graph construction, we began by seeking appropriate evaluative criteria that are characteristic of graphs during both the construction and interpretation processes. For general criteria regarding data presentation and visualizations, we used five books that include guidance on data visualizations (Tufte, 1983; Kosslyn, 1994; Few, 2004; Evergreen, 2014, 2018), and we also consulted 26 primary literature sources on topics that ranged from graph construction with middle school students to evaluation of graphs constructed by physicians for medical journals (Table 2). Additionally, we contributed findings from our ongoing research toward the graph rubric (Angra, 2016; Angra and Gardner, 2016, 2017; Table 2). We found that books on data presentation and visualizations heavily emphasize the importance of aesthetics and considering the ink–data ratio that exists in each graph representation. The books also emphasize descriptive labels on the x- and y-axes and a title to frame the message that is being conveyed by the graph; a key to show the various colors used in the graph; appropriate axis scaling to show proper intervals conveyed by the data represented in the graph; and thinking about the appropriateness of the graph representation for the data to be displayed. Given the general target audience for the books, the graph criteria were illustrated in multiple non-science examples with pros and cons of each. Our graphing literature review further supported the importance of the categories mentioned above and expanded support for axis units, data displayed, and aligning the graph to its original intended purpose (e.g., question to be answered; Table 2).
Graph rubric elements | Sources that informed descriptionsa |
---|---|
Descriptive title | Kosslyn, 1994; Evergreen, 2018; Puhan et al., 2006; Angra, 2016; Angra and Gardner, 2017 |
Label for the x-axis (e.g., time) | Kosslyn, 1994; Few, 2004; Federico et al., 2012; Elliott et al., 2006; Puhan et al., 2006; Angra, 2016; Angra and Gardner, 2017 |
Label for the y-axis (e.g., heart rate) | Kosslyn, 1994; Few, 2004; Federico et al., 2012; Elliott et al., 2006; Puhan et al., 2006; Angra, 2016; Angra and Gardner, 2017 |
| Leinhardt et al., 1990; Puhan et al., 2006; Angra, 2016 |
| Leinhardt et al., 1990; Puhan et al., 2006; Angra, 2016 |
| Tufte, 1983; Kosslyn, 1994; Few, 2004; Evergreen, 2018; Leinhardt et al., 1990; Cleveland, 1984; Duke et al., 2015; Angra, 2016 |
| Few, 2004; Evergreen, 2018; Angra, 2016; Angra and Gardner, 2017 |
| Tufte, 1983; Kosslyn, 1994; Evergreen, 2014, 2018; Cooper et al., 2003; Puhan et al., 2006; Stengel et al., 2008; Federico et al., 2012; Rougier et al., 2014; Duke et al., 2015; Angra, 2016 |
| Evergreen, 2018; Cooper et al., 2003; Federico et al., 2012; Rougier et al., 2014; Duke et al., 2015 |
| Kosslyn, 1994; Evergreen, 2018; Padilla et al., 1986; Cleveland, 1984; Schriger and Cooper, 2001; Few, 2004; Leonard and Patterson, 2004; Patterson and Leonard, 2005; Drummond and Tom, 2011; Drummond and Vowler, 2011; Franzblau and Chung, 2012; Humphrey et al., 2014; Duke et al., 2015; Saxon, 2015; Weissgerber et al., 2015; Klaus, 2016; Angra, 2016; Angra and Gardner, 2016, 2017 |
| Wild and Pfannkuch, 1999; Friel and Bright, 1996; Konold and Higgins, 2003; Konold et al., 2015; Angra, 2016 |
| Konold and Higgins, 2003; Rougier et al., 2014; Angra, 2016;Angra and Gardner, 2016,2017 |
Our preliminary evaluative criteria consisted of the categories emphasized by the existing literature but were further refined from our ongoing research (Angra and Gardner, 2016, 2017). Think-aloud interviews conducted to understand how graphs are constructed by experts and novices revealed that students titled their graphs with the subject and variables, a detail that is important for the graph reader to see when interpreting the graph (Angra and Gardner, 2017). We also noted that professors verbally articulated experimental details that belong in the key, such as sample size and number of trials. Finally, an important characteristic of data summary graphs in biology and the other sciences, in contrast to conceptual graphs or data exploration graphs, is the alignment of the graph with its intended purpose, such as the research question and hypothesis. Experts do this routinely, while students do not (Angra, 2016; Angra and Gardner, 2016, 2017). Additionally, two articles from our literature search (Konold and Higgins, 2003; Rougier et al., 2014; Table 2) mention making a graph so it has a purpose, but do not explicitly address the alignment of the graph with the research question or hypothesis.
On the basis of the literature and research review, we created a list of 12 graph construction categories with definitions (Table 2). To organize and characterize the evaluative criteria for the 12 categories, we aggregated them into three broader categories: graph mechanics, communication, and graph choice. Graph mechanics includes the title, axis labels, axis units, axis scaling, and a key. Communication consists of aesthetics and take-home message. In our literature search, we noticed a high emphasis on communication (Table 2 and Figure 1), which is why we decided to create this separate category. Finally, graph choice includes tasks like choosing a graph type, thinking about the data displayed, and alignment of the graph with its intended purpose (Table 2 and Figure 1).
An important feature of any rubric is the quality levels or numeric criteria that tell students how they will be graded. We chose to use three quality levels and express them in statements of student performance that are used to distinguish specific graph construction elements as “present/appropriate,” “needs improvement,” or “unsatisfactory” (Dawson, 2017), each with associated point values (Figure 1). Feedback, testing, and use of the rubric by diverse users suggest that three levels of achievement work well for most users (see below).
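The organization described above (three broad categories, 12 subcategories, three quality levels) can be sketched as a small data structure. This is only an illustrative sketch: the names `RUBRIC`, `LEVELS`, and `summarize` are hypothetical, the subcategory labels are shortened from those used in the tables, and the point values from Figure 1 are deliberately left out because they are not reproduced in this text.

```python
from collections import Counter

# Category -> subcategories, following the grouping described in the text.
RUBRIC = {
    "graph mechanics": [
        "descriptive title",
        "label for the x-axis",
        "label for the y-axis",
        "units for the x-axis",
        "units for the y-axis",
        "scale",
        "key",
    ],
    "communication": ["aesthetics", "take-home message"],
    "graph choice": ["graph type", "data displayed", "alignment"],
}

# The three quality levels named in the text (point values omitted).
LEVELS = ("present/appropriate", "needs improvement", "unsatisfactory")


def summarize(ratings):
    """Count, per category, how many subcategories fall at each level.

    `ratings` maps every subcategory name to one of LEVELS.
    """
    expected = {sub for subs in RUBRIC.values() for sub in subs}
    if set(ratings) != expected:
        raise ValueError("ratings must cover exactly the 12 subcategories")
    unknown = {level for level in ratings.values() if level not in LEVELS}
    if unknown:
        raise ValueError(f"unknown levels: {sorted(unknown)}")
    return {
        category: Counter(ratings[sub] for sub in subs)
        for category, subs in RUBRIC.items()
    }


# Hypothetical ratings for a single graph (not data from the study).
example = {sub: "present/appropriate" for subs in RUBRIC.values() for sub in subs}
example["key"] = "needs improvement"
example["aesthetics"] = "unsatisfactory"
summary = summarize(example)
```

A per-category tally of this kind is one simple way to turn a scored graph into formative feedback, for example by pointing a student to the category with the most "needs improvement" ratings.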
Rubric Testing and Implementation
Here, we share findings from conversations and IRR analyses with science education research scholars (graduate students, postdoctoral fellows, and faculty), non-education research biology graduate students, undergraduate biology students enrolled in an upper-level physiology laboratory course, and biology instructors. These data contribute to the evidence in support of the content and face validity of the graph rubric (Table 1).
As the first step in the structural stage of the rubric design, we used the first draft of the graph rubric to gather feedback from science education scholars. Three important outcomes resulted from the first round of the structural stage of the rubric design. First, all participants approved of the categories, subcategories, and descriptions within the rubric, but suggested changing the levels of achievement for the subcategories within graph mechanics so that their individual criteria are weighted less than those in the graph choice and communication categories. Participants felt that the cognitive difficulty of the mechanics category was lower than that of the other categories and that it should be weighted accordingly.
Second, during the conversations, a science education research scholar from the College of Education suggested that we might want to consider not just evaluating one graph, but a set of graphs with the graph rubric to determine a more accurate take-home message. Although we agree that looking at multiple but related graphs like those found in science articles first before formulating a take-home message is helpful, it was our purpose to produce a graph rubric that evaluates one graph at a time, because this is a common graph construction practice in the classroom. Further, the spirit of this subcategory of the rubric was to capture whether the graph was constructed in a way in which one could discern whether or not there were trends in the data and not necessarily the specific type of conclusion, which requires direct knowledge of the discipline or experiments.
Third, a graduate student in chemistry suggested that a subcategory on figure legends be considered for the graph rubric. Although we agreed that figure legends provide helpful information when encountered in a science paper, they are not universal; for example, they are not found in oral presentations when the graph is a stand-alone item. Furthermore, different sets of skills are required when writing a figure legend, and these fall outside of the scope of our current work (Angra, 2016; Angra and Gardner, 2017).
These types of conversations were vital during this first round of the structural stage of the rubric design process (Table 1), providing us with feedback on the structural components and content within the rubric. Revisions to the rubric from this round included refining and clarifying the subcategory definitions and adjusting the point values assigned to parts of the rubric to reflect the cognitive difficulty of the items (i.e., graph mechanics scoring was decreased in weight). We also separated the communication category into two subcategories, “aesthetics” and “take-home message,” and added the “alignment” subcategory within the graph choice category (Figure 1).
The revised rubric was presented later to the science education scholars for further feedback. Science education scholars were asked to evaluate a student-generated graph (Graph 3, Appendix C, Supplemental Material) and then engaged in a discussion. Percent agreement with the ratings of an expert rater was calculated for the attendees for each category within the graph rubric. The overall percent agreement with attendees was 82%, which is considered excellent (Holsti, 1969). There was greater than 80% agreement on all subcategories of graph mechanics, except for the scale, which scored 33% before the general discussion (Table 3). Lower percent agreement was also observed for the graph type subcategory of graph choice (67%) and for the take-home message subcategory of communication (50%), which raters tended to underscore. On the basis of this analysis, we realized that we needed to increase the clarity of our definitions for elements within the rubric (Table 4), and that led us to develop training materials for new users (see Appendix B in the Supplemental Material).
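As a minimal sketch of a percent-agreement calculation of this kind, the snippet below counts how many raters gave the same rating as the expert for a single subcategory. The function name `percent_agreement` and the ratings are hypothetical, and the assumption that agreement means an exact match with the expert's level is ours, not a procedure stated in the text.

```python
def percent_agreement(rater_scores, expert_score):
    """Percentage of raters whose rating equals the expert's rating."""
    if not rater_scores:
        raise ValueError("need at least one rater")
    matches = sum(1 for score in rater_scores if score == expert_score)
    return 100 * matches / len(rater_scores)


# Hypothetical ratings from six raters for one subcategory.
expert = "needs improvement"
raters = [
    "needs improvement",
    "needs improvement",
    "present/appropriate",
    "unsatisfactory",
    "present/appropriate",
    "present/appropriate",
]

# Two of six raters match the expert.
agreement = round(percent_agreement(raters, expert))
print(agreement)  # 33
```

With six raters, only a few values are possible under this definition (for example, 2, 3, 4, and 5 matches round to 33%, 50%, 67%, and 83%), so a single rater changing a score moves the agreement by about 17 percentage points.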
| Graph rubric category | Subcategory | IRR (% agreement): science education scholars (n = 6) | IRR (% agreement): biology graduate students (n = 10) |
|---|---|---|---|
| Graph mechanics | Descriptive title | 83 | 50 |
| | Label for the x-axis | 100 | 50 |
| | Label for the y-axis | 83 | 100 |
| | Units for the x-axis | 100 | 90 |
| | Units for the y-axis | 100 | 90 |
| | Scale | 33 | 70 |
| | Key | 100 | 20 |
| Communication | Ease of understanding: aesthetics | 100 | 60 |
| | Ease of understanding: take-home message | 50 | 80 |
| Graph choice | Graph type | 67 | 80 |
| | Data displayed | 83 | 70 |
| | Alignment | 83 | 100 |
| Average task IRR | | 82 ± 22 | 72 ± 24 |
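The summary row can be checked against the subcategory values. The sketch below assumes that the "±" term is the sample standard deviation across the 12 subcategories; this section does not state that definition, so it is an inference from the arithmetic rather than a documented method. The helper name `mean_and_sd` is hypothetical.

```python
from statistics import mean, stdev

# Percent agreement per subcategory, copied from the table above.
scholars = [83, 100, 83, 100, 100, 33, 100, 100, 50, 67, 83, 83]
grad_students = [50, 50, 100, 90, 90, 70, 20, 60, 80, 80, 70, 100]


def mean_and_sd(values):
    """Return the rounded mean and rounded sample standard deviation."""
    return round(mean(values)), round(stdev(values))


print(mean_and_sd(scholars))       # (82, 22)
print(mean_and_sd(grad_students))  # (72, 24)
```

Under that assumption, both columns reproduce the reported averages, which also shows how much the spread is driven by one or two low-agreement subcategories (scale for the scholars, key for the graduate students).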
| Feedback source |
|---|
| Science education scholars: feedback from use of the rubric |
| Graduate students: feedback from use of the rubric |
| Biology instructors: feedback on the graph rubric categories, usability, and utility |
| Undergraduate students: feedback from use of the rubric in the classroom, Spring 2015 |
| Undergraduate students: feedback on the graph rubric categories, usability, and utility |
Science education research scholars provided us with valuable pedagogical feedback, but we also wanted to solicit feedback from users who grade student assignments. We showed 10 biology graduate students the same graph as the science education scholar group (Graph 3 in Appendix C, Supplemental Material) and asked them to score it independently, after which there was discussion and feedback. The graduate students reported that the rubric was clear and easy to use (Table 4). Although the graduate students used the rubric easily, they had more subcategories with low percent agreement with the expert rater than the science education scholars did (Table 3). The lowest level of agreement (20%) was in the graph mechanics subcategory “key.” This resulted from the graduate students deviating from the definition on the rubric and underscoring the subcategory, because they said the key was vague, and some graduate students did not like where it was placed in relation to the data on the graph. Key placement is an element that was not explicitly articulated in the rubric but would fall into the subcategory of aesthetics.
Next, we sought informal written feedback from students enrolled in the Spring 2015 semester (Table 1) of a physiology laboratory course. These students used the rubric multiple times over the semester to inform their graph construction, critique peer graphs, and interpret feedback from the instructors. Graphing and presenting data were important components of the course, and students readily used the graph rubric and found it to be a valuable resource. Comments from students are displayed in Table 4.
Finally, we showed the graph rubric to biology instructors and asked them for written feedback regarding the appropriateness of the rubric categories, usability in the classroom, helpfulness to students, and scoring. All instructors agreed with the content and structure of the rubric, including the division and distribution of elements within categories and weighting of the scoring, and felt it could be useful in their classrooms (Table 4).
Application of the Graph Rubric to Diverse Contexts in Undergraduate Biology Instruction
We wanted to explore the broad utility of the rubric in two ways: by having a diverse set of rubric users from a variety of classroom contexts evaluate student-generated graphs, and by characterizing the features of graphs found in textbooks, which are one of the ways undergraduate students are exposed to graphs and which provide a potentially strong model of graphs for students. Here, we report results from 1) undergraduate student evaluation of student-generated graphs from a classroom, 2) instructor use of the graph rubric to score graphs produced by their students, and 3) analysis of textbook graphs. These data contributed to further content and face validity evidence for the graph rubric (Table 1).
Undergraduate Student Evaluation of Graphs.
We gave undergraduate students who had some previous experience with the graph rubric from their physiology course a variety of student-generated graphs to score, ranging from a less familiar graph type like a box-and-whisker plot (Graph 1, Appendix D, Supplemental Material) to more familiar graph types like line graphs (Graphs 2 and 4, Appendix D, Supplemental Material), a scatter plot (Graph 3, Appendix D, Supplemental Material), and a bar graph (Graph 5, Appendix D, Supplemental Material). Overall, the graph rubric ratings of student-generated graphs by students (n = 7) were consistent with those of the expert rater with an overall average percent agreement of ≥ 71% (Table 5). However, student scoring also revealed some notable differences. Almost all students scored Graph 1, the box-and-whisker plot, as “needs improvement” instead of “present/appropriate” for the data-displayed category. Students' reasoning for underscoring the graph was that it was not explicit about the type of data plotted. This hints at student difficulty in interpreting box-and-whisker plots, which display data and descriptive statistics in a way that is challenging to novices (Bakker et al., 2004). We also observed that students did not object to the dark background in Graph 5, which conflicts with Tufte’s (1983) principle of maximizing the data–ink ratio.
| Graph rubric category | Subcategory | Graph 1 | Graph 2 | Graph 3 | Graph 4 | Graph 5 |
|---|---|---|---|---|---|---|
| Graph mechanics | Descriptive title | 29 | 100 | 71 | 57 | 71 |
| | Label for the x-axis | 100 | 29 | 86 | 100 | 100 |
| | Label for the y-axis | 71 | 86 | 43 | 100 | 86 |
| | Units for the x-axis | 86 | 100 | 100 | 100 | 86 |
| | Units for the y-axis | 100 | 71 | 100 | 100 | 100 |
| | Scale | 100 | 71 | 86 | 86 | 100 |
| | Key | 14 | 57 | 57 | 71 | 86 |
| Communication | Ease of understanding: aesthetics | 86 | 86 | 71 | 43 | 14 |
| | Ease of understanding: take-home message | 86 | 71 | 0 | 71 | 57 |
| Graph choice | Graph type | 100 | 14 | 100 | 29 | 14 |
| | Data displayed | 14 | 86 | 57 | 86 | 86 |
| | Alignment | 43 | 100 | 86 | 43 | 57 |
| Average (%) task IRR | | 69 ± 9 | 73 ± 12 | 71 ± 14 | 74 ± 9 | 71 ± 12 |
Instructor Use of Graphs in the Classroom.
To evaluate the potential of the graph rubric to be used in diverse classrooms and by diverse instructors, we recruited four undergraduate biology instructors from different biological subdisciplines and course contexts. Instructors were provided with the graph training materials (Appendix B, Supplemental Material) and were asked to thoroughly study them before proceeding to evaluate student-generated graphs from their classrooms with the graph rubric. None of the graphs submitted by the instructors were accompanied by a figure legend, and all were therefore scored as stand-alone artifacts by all raters (instructors and expert). Overall, the graph rubric ratings of student-generated graphs by the instructors were consistent with those of the expert rater, with an overall average percent agreement of ≥ 72% (Table 6 and Appendix F, Supplemental Material), which is good, given that no other training on the rubric had been provided. Three graph rubric subcategories had the highest numbers of differences in ratings: title and key (mechanics) and aesthetics (communication). Instructors 2 and 3 consistently rated titles as “present/appropriate,” while the expert rater rated the titles as “needs improvement.” Examining the survey feedback revealed that Instructor 3 realized that the titles were not fully complete, but felt they were close enough to warrant full credit. Instructors 2 and 3 also gave full credit for the keys on graphs from their classrooms more often than the expert rater did, even when elements such as the sample size were not indicated on the graph. Instructors 2, 3, and 4 consistently rated aesthetics as “present/appropriate” instead of “needs improvement” for graphs that contained unnecessary grid lines in the background or lacked y-axis lines.
All instructors felt that the rubric categories and definitions were appropriate and that the rubric itself was easy to use and would be a valuable addition to their introductory and upper-division biology classrooms (see Table 4).
Graph rubric category | Instructor 1 (n = 8) | Instructor 2 (n = 10) | Instructor 3 (n = 5) | Instructor 4 (n = 12) |
---|---|---|---|---|
Course type | Introductory laboratory and upper-level ecology | Introductory cell biology and upper-level neurobiology | Upper-level ecology | Upper-level physiology |
Mechanics | 86 ± 18 | 73 ± 24 | 71 ± 22 | 83 ± 11 |
Communication | 75 ± 0 | 70 ± 28 | 70 ± 42 | 67 ± 0 |
Choice | 83 ± 19 | 80 ± 10 | 73 ± 12 | 72 ± 10 |
Average (%) task IRR | 83 ± 9 | 74 ± 4 | 72 ± 15 | 78 ± 7 |
Analysis of Textbook Graphs.
Because students also encounter graphs in their assigned course readings, which include textbooks, we wanted to determine how well the graph rubric captured features of those graphs. The analysis of textbook graphs supports our first objective for the development of the rubric, which is to facilitate the teaching and evaluation of data summary graphs. The graph rubric was generally useful and appropriate for evaluating graphs from introductory biology textbooks. The purpose of the Campbell et al. (2014) textbook is to incorporate more experiments, data, and quantification in biology; accordingly, we noticed that this online textbook contained approximately seven times more graphs on average than the other four textbooks analyzed. Bar and line graphs were the most common types across all five textbooks, and scatter, dot, and box-and-whisker plots were the least common (Figure 2).
The average percentage of graphs that received a “present/appropriate” rating from the graph rubric in the graph mechanics, communication, and graph choice categories is displayed in Table 7 and Appendix H in the Supplemental Material. Looking broadly across the graph rubric categories, we see that the quality of graph design varied within textbooks, and no one textbook received a perfect score. For example, we noticed variation across the subcategories of graph mechanics for graphs within any given textbook. The attributes and quality of graphs, as captured by the rubric, also differed clearly across textbooks. For example, the percentage of graphs rated “present/appropriate” for graph choice ranged from 60% to 83% across the textbooks (Table 7).
| Graph rubric category | Introductory biology textbooks | Present/appropriate | Needs improvement | Unsatisfactory |
|---|---|---|---|---|
| Mechanics | Singh-Cundy and Shin, 2010 (n = 13) | 70 ± 6 | 12 ± 5 | 18 ± 6 |
| | Urry et al., 2014 (n = 15) | 61 ± 7 | 24 ± 7 | 15 ± 5 |
| | Sadava et al., 2009 (n = 36) | 75 ± 3 | 13 ± 3 | 12 ± 2 |
| | Raven et al., 2008 (n = 43) | 76 ± 3 | 19 ± 4 | 5 ± 1 |
| | Campbell et al., 2014 (n = 33) | 69 ± 4 | 26 ± 5 | 4 ± 1 |
| Communication | Singh-Cundy and Shin, 2010 (n = 13) | 42 | 58 | 0 |
| | Urry et al., 2014 (n = 15) | 80 ± 5 | 20 ± 5 | 0 |
| | Sadava et al., 2009 (n = 36) | 85 ± 3 | 15 ± 3 | 0 |
| | Raven et al., 2008 (n = 43) | 85 ± 2 | 15 ± 2 | 0 |
| | Campbell et al., 2014 (n = 33) | 88 | 12 | 0 |
| Choice | Singh-Cundy and Shin, 2010 (n = 13) | 63 ± 5 | 38 ± 5 | 0 |
| | Urry et al., 2014 (n = 15) | 60 ± 12 | 40 ± 12 | 0 |
| | Sadava et al., 2009 (n = 36) | 81 ± 1 | 19 ± 1 | 0 |
| | Raven et al., 2008 (n = 43) | 73 | 27 | 0 |
| | Campbell et al., 2014 (n = 33) | 83 ± 1 | 17 ± 1 | 0 |
DISCUSSION
A Tool for Evaluating and Teaching Graphing in Undergraduate Biology
In this article, we aimed to present the rigorous and systematic development of an evidence-based rubric for teaching and evaluating graphs. The graph rubric is a tool designed within the context of undergraduate biology to 1) facilitate the teaching and evaluation of data summary graphs, 2) provide undergraduate students with formative and summative feedback on their graphs, and 3) allow education researchers to evaluate graphing artifacts to assess students’ experimental and quantitative skills. As undergraduate biology students are increasingly engaged in the practice of science as part of their undergraduate curricula, more tools for research and instruction on graphing are needed. Specifically, there is a need for resources that are not generic but are contextualized within the discipline.
The three broad categories of the rubric, and the subcategories within them, allow the rubric user to create and evaluate graphs that are constructed in a manner that is complete (graph mechanics), appropriate for the data and purpose (graph choice), and clear and easy to interpret (graph communication). The graph rubric complements and extends existing guidebooks and other resources (Table 2) by explicitly incorporating important concepts and skills needed for graph choice and construction in the context of biology. We incorporated expert-like, reflective practices such as checking the alignment of the graph with its purpose (e.g., evaluating a hypothesis; Angra and Gardner, 2017).
Throughout the rubric design process, we gathered content and face validity evidence to support our claim that the rubric is an appropriate and usable tool to evaluate graphs in the undergraduate biology context (i.e., construct validity; Table 1). We are confident that the evidence we gathered is sufficient to support this claim in our context. As part of the design process, we consulted existing resources (e.g., instructional books and literature) and three important stakeholder groups (i.e., students, instructors, and science education scholars) for the rubric. The face and content validity evidence we gathered and used during the substantive and structural stages allowed us to be confident that the rubric was capturing important and relevant elements of strong graph design. Particularly valuable was the feedback collected from undergraduate biology students, biology instructors, and science education scholars during the structural stage (Table 4). We were able to clarify and refine terms and definitions and organize the rubric in a manner that was understandable to all user groups. We adjusted the weighting of the scoring points to reflect the cognitive difficulty within the three broad categories of the rubric, with graph mechanics weighted less than graph communication and choice. Finally, on the basis of feedback during these stages, we clarified that the alignment subcategory is included to emphasize reflection on the purpose of the investigations that generated the data in the first place, which is something that students do not consistently do (Angra and Gardner, 2017). This is not meant to preclude the creation of graphs to explore the data, however.
The external stage of the graph rubric development provided us with additional content and face validity evidence. We gained important insight into the scope of the appropriateness and utility of the graph rubric and some interesting observations about graphs in different contexts that students may encounter. During this stage, we conducted user testing of the rubric by having students and biology instructors evaluate graphs generated in the classroom, and we used the rubric to evaluate graphs in introductory biology textbooks (Table 1). The graphs that the students and biology instructors evaluated were single graphs extracted from class assignments, which included oral presentations and written work (e.g., research posters and lab reports). Textbook graphs, as previously described (Hoskins et al., 2007; Rybarczyk, 2011), were often stylized representations of data embedded in multimedia figures, also with figure legends. While the two graph contexts (i.e., classroom vs. textbook) were different, the attributes of the graphs aligned with the typical communication purpose of the context: graphs in oral presentations are accompanied by real-time verbal narrations of the graph, while graphs in textbooks are embedded in descriptive text and their purpose is often to summarize data trends, albeit often in an oversimplified manner not true to the natural “messiness” of the actual data (Hoskins et al., 2007; Rybarczyk, 2011).
Limitations of the Graph Rubric
The graph rubric is designed to assist in the creation and/or evaluation of graphs as a stand-alone piece of communication, similar to what would be seen in oral presentations of data. Because of the limited amount of space allotted for figures by research journals, graphs are usually small and do not have titles, labels, and keys. Instead, this information is found in the figure legend, a category not present in our graph rubric. The absence of the figure legend from the rubric was noted by individuals in the science education scholar and graduate student groups. However, while figure legends are informative accompaniments to a graph, we feel they are beyond the scope of the graph rubric for two reasons. First, we want to promote the creation of clear representations of data. Because graphs are meant to be stand-alone representations with the purpose of conveying complex data in a quick and efficient manner, we and others recommend that graphs should be labeled in a descriptive manner and include a key, if necessary (Mack, 2013). Second, writing figure legends is a related but distinct skill that requires knowledge regarding which methods and results to include and a succinct description of what is plotted, with trends noted (Rodrigues, 2013). Users of the graph rubric may modify the rubric to include figure legends and define a set of criteria at each level of achievement to communicate expectations to students.
Although we provided the rubric users with training materials to consult independently in the external stage of the rubric design, there were some instances of low agreement in the three graph rubric categories (Tables 5 and 6). This is likely because the users were not trained on rubric use in collaboration with the expert raters. In addition to the effects of minimal rubric training, low consensus in any of the three graph rubric categories could be affected by the different numbers of subcategories within each and the level of subjectivity potentially used in the evaluation (see Figure 1). For example, we observed the most deviation from the expert rater within the mechanics category, which has seven subcategories compared with two and three subcategories in the communication and graph choice categories, respectively. Therefore, while a well-designed rubric should, in theory, be clear to any user (Dawson, 2017), we recommend training, practice, and feedback to ensure rigorous and consistent grading in the classroom or within a research project that uses a rubric for evaluation of research artifacts.
The purpose of the IRR consensus estimates was to help us understand the graph rubric use by different people and with graphs from different contexts (Dawson, 2017) and to highlight areas in which things might not be clear or consistently interpreted by users. As such, it is interesting to note that the areas in which the students’ ratings differed from the expert raters are consistent with their status as developing graph makers. For example, students exhibited a tolerance for extraneous features and colors in the graphs (see Graphs 4 and 5, Appendix D, Supplemental Material), which led to differences in scoring in the graph communication category. In contrast, while the biology instructor group was small (n = 4), there was little variability across the four instructors.
The final source of content and face validity evidence we sought was from an evaluation of graphs from introductory biology textbooks. The purpose of our evaluation of textbook graphs was not to criticize textbooks, but rather to examine a source that is potentially a strong model to students for what constitutes data and graphs, given the presence of these books in 100-level courses in university curricula. In general, we found that the graph rubric was able to capture and describe the elements of the graphs in textbooks. However, consistent with their typical purpose, graphs in textbooks rely even more heavily on the figure legend and surrounding text for complete understanding, as reflected in lower scores in the scale and key subcategories (Table 7 and Appendix H, Supplemental Material). In addition, we omitted the alignment subcategory, as the display of data explicitly resulting from experiments was rare, and the data displayed were stylized summaries of trends, not quantitative in nature (see data displayed and y-axis units, Appendix H, Supplemental Material). An interesting observation from our textbook graph analysis was that graphs not only varied across the introductory biology textbooks we examined but also varied within a given textbook. This general observation echoes the inconsistent use of arrows within textbooks (Wright et al., 2018). This inconsistent graph model could impair student learning of what constitutes a high-quality graph and also impede their understanding and learning of important biological concepts.
While the basic features of sound graph design are discipline neutral, the norms for graph choice and construction may have some variation across the biological subdisciplines that can be perpetuated by the degree to which research journals guide authors. Interestingly, during and since the work in the external stage of the rubric design, there have been calls for improved data representations in research articles by a variety of journals with broad readership (e.g., Rougier et al., 2014; Slutsky, 2014; Saxon, 2015; Weissgerber et al., 2015; Klaus, 2016; Nuzzo, 2016; Boers, 2018; Hertel, 2018). In addition, more textbooks have begun to incorporate the description of experiments and “messy” data displays (Campbell et al., 2014), which will provide students with a more realistic perspective of scientific data.
As is the case for any assessment and research tool, the external stage is never truly complete, and with each new user and context, validity and reliability evidence need to be gathered to establish the scope of use and inference to help interpret the findings from use of the graph rubric. We chose the elements of the external stage of the rubric development to provide us with initial evidence for the broad usability (e.g., not within a single biological subdiscipline and a single user type) and utility of the rubric (Dawson, 2017) within our institutional context. Our sample of raters for the external stage was limited based on opportunity and volunteer bias. Students were recruited from a physiology course, and the biology instructors who chose to participate in our study were the few who deliberately integrate graphing into their courses. Therefore, we cannot definitively claim that the rubric is universally applicable “as is” for each context, and we encourage users of the rubric to reflect on its appropriateness and utility for their specific use.
Implications for Instruction and Future Research
The graph rubric is a valuable, evidence-based assessment tool for biology instructors, students, and science education researchers, because it provides quick, systematic, and targeted evaluation of essential features of effective graphs. Frequent use of this rubric in the classroom communicates not only the learning objectives for data communication, but also the expectations for a well-constructed graph. The graph rubric has three different levels of achievement (present/appropriate, needs improvement, and unsatisfactory) and provides the instructor a transparent and objective means to evaluate student graphs. Given the diversity of graphs, their contexts, and personal or disciplinary preferences for data representations, the rubric can be used as an instructional catalyst. For example, an instructor can have students evaluate a set of graphs with the rubric and use the similarities and differences in scoring between the students and the instructor as the basis for guided instruction. This activity would serve two important purposes. First, it would allow the instructor to communicate expectations for the attributes of high-quality graphs, and second, it would facilitate a classroom discussion about data and data representations. This discussion would give the instructor the opportunity to guide and model reflective graph design and could include a comparison of the affordances and limitations of the form of data to plot (e.g., raw vs. summarized data) or of the graph types (e.g., categorical dot plot vs. bar graph) appropriate to the data type and purpose of the graph (see “Guide to Data Displays” in Angra and Gardner [2016]).
Current recommendations for undergraduate biology curricula include increasing students’ access to experiences that involve them in the practices of science, including designing investigations, grappling with data, and summarizing their findings (Auchincloss et al., 2014; Harsh and Schmitt-Harsh, 2016). These experiences will include work within lectures, laboratory courses, CUREs, or research apprenticeships. The graph rubric can be a valuable tool to guide students in creating effective data representations that will allow them to explore and summarize their data in a reflective manner. Consistently incorporating the rubric into lab manuals throughout a course sequence would be a valuable way to provide students with the repeated guidance and practice needed to master the skill of graphing.
As instructors change their instruction in response to the recommendations for undergraduate biology curricula, and as funding agencies such as the National Science Foundation and Howard Hughes Medical Institute commit monies to support the adoption of experiences such as CUREs, tools to guide and improve student learning are needed (Auchincloss et al., 2014; Corwin et al., 2015; Shortlidge and Brownell, 2016; National Academies of Sciences, Engineering, and Medicine, 2017). Until recently, evaluation of course-based or other research experiences focused predominantly on student perceptions and attitudes. The graph rubric is a valuable addition to the growing list of research and evaluation tools for measuring cognitive gains and learning (summarized in Shortlidge and Brownell, 2016). For example, the rubric can be used by education researchers and evaluators to monitor student learning and reveal persistent difficulties as students progress through a course, program, or curriculum. These data will be valuable in evaluating and refining instruction within student learning experiences.
While the graph rubric is a valuable, evidence-based tool for instruction and research, there are many opportunities for refining and expanding it. As noted, we did not include figure legends in the rubric. Figure legends could be added to the rubric, or another tool could be designed to guide students in that part of their science writing and data presentation, as appropriate to the communication medium (e.g., lab reports). In addition, as part of the external stage of the design of the graph rubric, we chose to have a variety of people use the rubric to evaluate graphs in their areas of expertise. For students, these were graphs from the laboratory portion of a course they had taken, and for biology instructors, these were graphs generated by students in their classrooms. Exploring student use of the rubric with graphs in other course contexts or with graphs from the primary literature could reveal areas of competence and difficulty that are common across graphing contexts, as well as difficulties that are context specific. This knowledge would provide valuable insight for research and instruction.
ACKNOWLEDGMENTS
Thanks to the members of the Purdue International Biology Education Research Group for thoughtful feedback on this work. We are indebted to the late Dr. Aaron Rogat for his tremendous insights, tenacious demand for clarity, and constant support of our work. Many thanks to Mikhail Melomed and Drs. Elizabeth Suazo-Flores and Maurina Aranda for feedback on the article. This work benefited from ideas initiated within the Biology Scholars Research Residency program (S.M.G.). The interpretation of this work benefited from the ACE-Bio Network (National Science Foundation RCN-UBE 1346567).