Continuous Improvement

Objectivity and Subjectivity in Evaluation

If we view objectivity and subjectivity of evaluation along a continuum, we can represent various assessment and scoring methods along its length.

Test items that can be evaluated objectively have one right answer (or one correct response pattern, in the case of more complex item formats). Scorers do not need to exercise judgment in marking responses correct or incorrect. They generally mark a test by following an answer key. In some cases, objective tests are scored by scanning machines and computers. Objective tests are often constructed with selected-response item formats, such as multiple-choice, matching, and true-false. An advantage of including selected-response items in objectively scored tests is that the range of possible answers is limited to the options provided by the test writer; the test taker cannot supply alternative acceptable responses.
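Scoring against an answer key can be sketched in a few lines of code. The example below is only an illustration of the idea, not a description of any particular scoring system; the function name, the item numbers, and the sample responses are all hypothetical.

```python
def score_objective(answer_key, responses):
    """Count the responses that exactly match the answer key.

    Both arguments map an item number to the selected option.
    An unanswered item is simply missing from `responses`.
    """
    return sum(
        1
        for item, correct in answer_key.items()
        if responses.get(item) == correct
    )


# Hypothetical four-item selected-response test.
answer_key = {1: "B", 2: "D", 3: "A", 4: "True"}
responses = {1: "B", 2: "C", 3: "A", 4: "True"}

print(score_objective(answer_key, responses))  # 3
```

No judgment is involved at scoring time: a response either matches the key or it does not, which is why this kind of marking can be handed to a machine.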

Because much of what we assess in reading and listening comprehension measures is first interpreted by the test writer, some degree of subjectivity is present even in objectively scored items. For that reason, assessments of the Interpretive mode, even those composed of "one-right-answer" items, might not be placed all the way at the objective end of the continuum.

Evaluating responses objectively can be more difficult with even the simplest of constructed-response item formats. An answer key may specify the correct answer for a one-word gap-filling item, but there may in fact be multiple acceptable alternative responses to that item that the teacher or test developer did not anticipate. In classroom testing situations, teachers may perceive some responses as equally or partially correct, and apply some subjective judgment in refining their scoring criteria as they mark tests. Informal scoring criteria for short-answer items probably work well for classroom testing as long as they are applied consistently and are defensible.
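One way to picture refined scoring criteria that are still applied consistently is a lookup table of accepted responses, each with an agreed amount of credit. The sketch below assumes a hypothetical gap-filling item ("Yesterday she ___ to the market."); the accepted answers, the credit values, and the function name are invented for illustration.

```python
def score_gap_fill(response, accepted):
    """Return the credit for one gap-fill response.

    `accepted` maps each accepted answer (lowercase) to its credit,
    from 0.0 to 1.0. Anything not listed earns no credit.
    """
    normalized = response.strip().lower()
    return accepted.get(normalized, 0.0)


# Hypothetical criteria: the key answer, an alternative the teacher
# decided was equally correct, and a partially correct form.
accepted_answers = {"went": 1.0, "walked": 1.0, "goed": 0.5}

for answer in ["went", " Walked ", "goed", "go"]:
    print(answer.strip(), score_gap_fill(answer, accepted_answers))
```

The judgment happens when the teacher decides what goes into the table. Once an alternative is added, every test taker who gave that response receives the same credit, which keeps the criteria consistent and easier to defend.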

Just as there may be few truly objective measures of second language knowledge and skill, so too is it rare to find purely subjective evaluations of performance. Allowing the subjective impressions of scorers to determine learners' grades would not be acceptable to most students, their parents, or other stakeholders. By contrast, we do not usually have to justify our opinion that a work of art is good or bad; we simply like it or we don't. Since our judgment has no significant consequences for the artist (unless we are art critics), a subjective evaluation is acceptable in that case. It is also not a matter of concern that the many viewers of the artwork do not agree about its quality.

Brown and Abeywickrama (2010) suggest that there are five cardinal criteria for judging assessments:

  1. Practicality: within budget; can be completed within an appropriate amount of time; considers time and effort for design and scoring
  2. Reliability: uniform rubrics that lend themselves to consistent application by various scorers; task items are unambiguous to the person completing the task
  3. Validity: measures what it proposes to measure; offers useful, meaningful information about the person’s abilities
  4. Authenticity: items/tasks are contextualized rather than isolated; meaningful, relevant, interesting topics; replicates real-world tasks
  5. Washback: positively influences how teachers teach and learners learn; gives learners feedback that enhances their language development
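The reliability criterion refers to consistent application of a rubric by various scorers. One simple way to check this, offered here only as an illustration and not as a method taken from Brown and Abeywickrama, is to compute the proportion of performances on which two scorers gave the same rubric score. The function name and the sample scores below are hypothetical.

```python
def exact_agreement(scores_a, scores_b):
    """Return the proportion of performances given identical scores
    by two scorers using the same rubric."""
    if len(scores_a) != len(scores_b):
        raise ValueError("Both scorers must rate the same performances.")
    if not scores_a:
        raise ValueError("At least one performance is required.")
    matches = sum(a == b for a, b in zip(scores_a, scores_b))
    return matches / len(scores_a)


# Hypothetical rubric scores (1 to 4) for five learner performances.
scorer_1 = [4, 3, 2, 4, 1]
scorer_2 = [4, 3, 3, 4, 1]

print(exact_agreement(scorer_1, scorer_2))  # 0.8
```

A low proportion would suggest that the rubric descriptors are being read differently by the two scorers and may need clarification or scorer training.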


Center for Advanced Research on Language Acquisition (CARLA) • 140 University International Center • 331 - 17th Ave SE • Minneapolis, MN 55414