Key Concepts Underlying Good Assessment and Measurement

To effectively carry out your evaluation priorities from the previous page, it's important to understand some key concepts: validity, reliability, practicality, and fairness. These concepts are important throughout the evaluation process. The activity and examples below will help you become familiar with them.

Learning Activity:

Four concepts in assessment and measurement that affect the entire evaluation process and that LPDs should be familiar with are validity, reliability, practicality and fairness. Can you match each term with its definition?

Table matching each assessment concept to its definition. See linked document for accessible text.

To make these concepts more concrete, imagine that you want to assess students’ writing proficiency, which you define as “the ability to compose informal messages about familiar topics in the target language.” To assess students’ proficiency, you provide them with an example of a formal message about an unfamiliar topic and ask them to copy it verbatim. Note that to complete this assessment students don’t compose a message, nor is the message informal or about a familiar topic. As such, you cannot draw valid inferences about your students’ writing proficiency as you’ve defined it.

Now let’s say that you’ve replaced that assessment with a better choice: your students will compose a discussion board post on your course site about their classes, which is an informal message about a familiar topic. In your first class, you give students 15 minutes to draft, revise, and post their messages and allow them to ask you for help. In your second class, you’re short on time and give them only 5 minutes. When you grade their posts, you find that many students in the second class wrote less than students in the first class, or wrote nothing at all, so the second class’s average was lower. That change in available time was a change to the conditions of the assessment process. It reduced the reliability of the assessment because factors beyond students’ writing proficiency affected their scores. In other words, the scores contain more sources of error, so the assessment has lower reliability, which in turn negatively affects validity.

In terms of practicality, the discussion board post is quite good. Most instructors would have the time and language skills needed to score these short assignments. The discussion board is part of the course technology, so it doesn’t require additional resources. The assignment is reusable in future semesters, so the time spent designing it is a productive investment.

Turning to fairness, although the assignment requires few resources from students and instructors, we must always keep equity in mind. For example, students or instructors may not have devices or Internet access on demand, so your program’s context is fundamental to determining fairness. Thinking carefully about how you use the posts is also essential: students are likely (and should be able) to assume that their instructor will not share their posts with anyone outside of the class. Using their posts outside of that context could have negative consequences for students and would negatively affect fairness.

Each of these concepts has an impact on the design and implementation of program evaluation; they are addressed in many publications, such as Aryadoust & Raquel (2019), Bachman & Damböck (2018), Bannerjee (2016), Davis & McKay (2018), Norris (2016), and Wallace (2018).


The Center for Advanced Research on Language Acquisition (CARLA) is a Title VI Language Resource Center devoted to improving language teaching and learning. CARLA is a unit of the Global Programs and Strategy Alliance, the central international office for the University of Minnesota system.
