Fall 2019 Newsletter

Key Considerations in Rubric Design

Sunny Duerr discussed rubrics with faculty, staff, and guests at the Assessment Workshop in spring 2019.

Contributed by Sunny R. Duerr, Ph.D., Assistant Dean of Assessment and Accreditation, School of Education

The SUNY New Paltz Mission Statement states that “our goal is for students to gain knowledge, skills, and confidence to contribute as productive members of their communities and professions and active citizens in a democratic nation and a global society.” However, a goal that goes unmeasured can never be known to have been achieved. By actively engaging in the measurement of a goal’s outcomes, we are able not only to determine whether we are being effective, but also to identify opportunities for improvement along the way.

One of the most important steps in assessment is determining which instrument to use from among the many valid options available; one such option is a rubric. A well-constructed rubric can help align desired performance outcomes, content, and instruction, while also serving as a quick way to provide meaningful feedback to students. Further, using rubrics helps instructors identify what needs to be learned as opposed to what needs to be taught, and shifts the focus from specific tasks to learning outcomes.

What Are Rubrics?

A rubric is a scoring tool made up of three components: 1) criteria; 2) performance levels; and 3) performance indicators (see Figure 1 below). Each criterion in the rubric has clearly articulated language (performance indicators) describing what performance looks like at each level. The performance indicators allow those using the rubric (often referred to as raters) to make performance-based evaluations with a high level of consistency, because each criterion’s performance is explicitly defined within the indicators.

 

Figure 1: Diagram of the components of a rubric 
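
To make Figure 1’s components concrete, here is a minimal sketch of a single rubric criterion laid out as a small data structure; the criterion, level names, and indicator wording are invented for illustration only.

    # One hypothetical rubric criterion, pairing each performance level with an
    # explicitly worded performance indicator (all wording invented for illustration).
    rubric_row = {
        "criterion": "Use of evidence",
        "performance_indicators": {
            "Unacceptable": "Claims are unsupported or rest on opinion alone.",
            "Developing": "Some claims are supported, but sources are weak or uncited.",
            "Acceptable": "Most claims are supported by relevant, cited sources.",
            "Exceptional": "All claims are supported by well-chosen, correctly cited sources.",
        },
    }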

 

Characteristics of Good Rubrics

A well-designed rubric begins with an articulation of desired learning outcomes. At the course level, these are course learning outcomes, but rubrics can be created based on program learning outcomes, or institution-level learning outcomes. The learning outcomes are the foundation for the rubric and help to identify what the criteria will be. Rubrics also serve as documentation of the alignment between learning outcomes, the assignment or task, the assessment, and applicable standards.

Well-designed rubrics measure observable behaviors. In other words, rubrics are useful for helping determine what students have done or have not done, but not what students feel or how they are motivated. It is common for a desired learning outcome to read “students will develop an appreciation for [statistics/art/history/Shakespeare/etc.],” but this would be difficult to measure with a rubric (or any other instrument) unless there is an assignment that directly provides the student with an opportunity to demonstrate that appreciation.

Well-designed rubrics will include performance indicators that are mutually exclusive, contain clear language, and measure only one criterion at a time. If the performance indicators are not mutually exclusive, raters are forced to use their subjectivity to decide between levels, leading to a loss of both inter- and intra-rater reliability. If performance indicators attempt to measure multiple criteria at once (called double-barreled performance indicators), the feedback students receive from the rubric becomes vague. Student feedback also suffers if the language in the performance indicator simply repeats the name of the performance level (i.e., “The student’s performance is at the unacceptable/developing/acceptable/exceptional level”). Clear language within each of the performance indicators is critical and is often one of the main challenges in rubric design.

And finally, well-constructed rubrics consist of the criteria that are considered most essential. Often, faculty desire a rubric that can capture what might be most accurately expressed as “ALL THE THINGS”; this leads to rubrics that are excessively long and, consequently, excessively cumbersome to use. But a well-designed rubric consists of only the criteria that are considered essential, acknowledging that this may mean discarding items that are less important. Identifying the most essential items is a relatively easy task using a framework outlined by Lawshe (1975) and reinforced by Ayre and Scally (2014); this process also has the added benefit of associating a measure of validity with the assessment.
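
For readers curious how Lawshe’s framework works in practice, the brief sketch below computes the content validity ratio, CVR = (n_e − N/2) / (N/2), for a handful of candidate criteria and flags those falling below a critical value. The panel ratings, criterion names, and the 0.75 cutoff are hypothetical placeholders; the actual critical value for your panel size should be taken from the table in Ayre and Scally (2014).

    # A minimal, hypothetical sketch of Lawshe's content validity ratio (CVR).
    # Panel ratings and criterion names below are invented; the critical value
    # for your actual panel size should come from Ayre & Scally's (2014) table.

    def content_validity_ratio(essential_count: int, panel_size: int) -> float:
        """CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists
        rating a criterion 'essential' and N is the total panel size."""
        half = panel_size / 2
        return (essential_count - half) / half

    # Hypothetical panel of 8 raters; True = rated the criterion 'essential'.
    ratings = {
        "Thesis and argument": [True, True, True, True, True, True, True, False],
        "Use of evidence": [True, True, True, True, True, True, False, False],
        "Title-page formatting": [True, False, False, True, False, False, False, False],
    }

    CRITICAL_CVR = 0.75  # assumed placeholder; look up the published value for your panel size

    for criterion, votes in ratings.items():
        cvr = content_validity_ratio(sum(votes), len(votes))
        verdict = "retain" if cvr >= CRITICAL_CVR else "consider dropping"
        print(f"{criterion}: CVR = {cvr:.2f} ({verdict})")

Criteria that most panelists agree are essential produce a CVR near 1; criteria rated essential by fewer than half the panel produce a negative CVR and are natural candidates for removal.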

Common Questions in Rubric Design

There are a few structural questions that need to be answered during the course of creating a rubric. How many performance levels should be used? What should the performance levels be labeled? How many criteria should be used? The answer to all of these questions is: “It depends.” As discussed above, the number of criteria should be limited to the most essential. In terms of how many performance levels and what these performance levels should be called, there are no hard-and-fast rules. There are, however, some guidelines.

When considering the number of performance levels, the question is really what counts as too few or too many. If too few performance levels are used, there is a risk of limiting the variability in responses, and the resulting data might not provide anything useful. With too many performance levels, the risk is a loss of sufficiently delineated performance indicators, because it becomes challenging to clearly articulate the differences among the middle levels of the performance scale. Identifying low performance and high performance is easy, but teasing out multiple degrees of medium performance can be difficult.

Naming the performance levels is a matter of choice, but the level names should have some consistency, and they should build from low performance to high performance in a progressive sequence (or vice versa). Here are some examples:

  • Unacceptable, Developing, Acceptable, Exceptional
  • Unacceptable, Developing, Proficient, Mastery
  • Below Expectations, Approaching Expectations, Meets Expectations, Exceeds Expectations
  • Ineffective, Developing, Effective, Advanced

And there’s always the simple method: Level 1, Level 2, Level 3, Level 4.

Some Closing Thoughts

There are a few final things to think about when designing rubrics. First, what is the purpose of the rubric? Rubrics designed for specific assignments can look very different from rubrics designed to measure program learning outcomes, and the purpose of the assessment should be considered throughout the design process. Hand in hand with this consideration is the idea that less is more: the fewer criteria included in your rubric, the more likely you are to get meaningful, reliable data from it. Some rubrics are designed to be comprehensive and consist of dozens of criteria (or more), but such long instruments are doomed to induce frustration, resentment, and disengagement in the people responsible for using them. As such, aim for rubrics that consist of the fewest criteria possible (again, see Lawshe, 1975; Ayre & Scally, 2014).

Benchmarking is another key consideration for rubric design and implementation. What is the expected performance level for students, and does that change depending on how far along they are in their program? Is the rubric used at multiple time points, allowing for the measurement of growth over time, and if so, are there different benchmarks for, say, sophomores, juniors, and graduating seniors? If the rubric is attached to a single assignment or task, such as a capstone experience or portfolio, what level of performance is expected in order to consider the student to have “met the standard”? Equally important, what happens if a student fails to do so?

Finally, a word about data analysis. The data from rubrics are often ordinal at best, so treating them as if they were interval/ratio data makes little sense. Because of this, when summarizing assessment data from rubrics, report the percentage of students performing at an acceptable level (determined by the benchmark and/or performance levels for the rubric), rather than a mean and standard deviation.
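
As a hypothetical illustration of that reporting approach, the sketch below tallies the percentage of students scoring at or above an assumed benchmark level on a single criterion; the level names, benchmark, and scores are invented for the example.

    # Hypothetical rubric scores for a single criterion, recorded as ordinal levels.
    # Level names, benchmark, and scores are invented for illustration.
    LEVELS = ["Unacceptable", "Developing", "Acceptable", "Exceptional"]
    BENCHMARK = "Acceptable"  # assumed benchmark: this level or higher meets the standard

    scores = ["Developing", "Acceptable", "Exceptional", "Acceptable",
              "Unacceptable", "Acceptable", "Developing", "Exceptional"]

    benchmark_rank = LEVELS.index(BENCHMARK)
    met = sum(LEVELS.index(s) >= benchmark_rank for s in scores)
    print(f"{met}/{len(scores)} students ({met / len(scores):.0%}) performed at or above the benchmark")
    # Reporting this percentage respects the ordinal nature of the scores;
    # a mean of numeric level codes would not.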

Having said all of this, it is important to understand that there are a number of different approaches to assessment and that rubrics are simply one method. There are also a number of different approaches to rubric design, and what is outlined here is simply some foundational information that has been found useful.

References

Ayre, C., & Scally, A. J. (2014). Critical values for Lawshe’s content validity ratio: Revisiting the original methods of calculation. Measurement and Evaluation in Counseling and Development, 47(1), 79-86.

Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28, 563-575.

For more information, please visit the Strategic Planning & Assessment website at https://www.newpaltz.edu/spa/assessmentresources.html for a PDF of Dr. Sunny Duerr’s PowerPoint “Designing Effective Rubrics.”