Aligning Assessment to Content Standards: Applying the Project 2061 Analysis Procedure to Assessment Items in School Mathematics

George E. DeBoer, AAAS Project 2061
Paul Ache, Kutztown University of Pennsylvania

American Educational Research Association Annual Meeting
Montreal, Canada April 12, 2005

This study was designed to determine the effectiveness of a procedure to improve the alignment of mathematics assessment items to targeted state content standards.

The study was conducted on over 100 released items from a single state in the Northeast. The items were analyzed using a procedure developed by Project 2061 of AAAS. There were three broad criteria on which the items were analyzed:

  • Content Alignment: Is the knowledge specified in the content standard needed to answer correctly or can the correct answer be obtained in some other way? Is the knowledge specified in the content standard enough by itself to make a satisfactory response or is additional knowledge or skill needed as well?
  • Likely Effectiveness: (Referred to as “item efficiency” in the poster.) Is there anything in the item that is not related to understanding the ideas in the targeted content standard but might interfere with a student’s ability to respond correctly? Issues include comprehensibility, appropriateness of the task context, and “guessability.” The objective is to reduce the number of false negative and false positive responses.
  • Plausibility of Answer Choices: Are all answer choices plausible and related to the ideas being tested? For example, are distractors related to students’ misconceptions and commonly held beliefs?

Teams of analysts produced written profiles that described each item’s alignment with the targeted content standard and provided suggestions for revision. Items were revised on the basis of the analysis criteria. It is important to note that the items were not revised on the basis of an examination of student responses on the original items.

Revised and original items were given to students, who were asked to show their work, explain how they obtained their answer, and indicate whether anything about the item was confusing. Two forms of a test were created for each grade; half of the items on each form were original and half were revised. Test forms were distributed randomly in each class. Data were analyzed to determine how much the revisions improved the match between students’ answer choices and their written explanations, which in turn provided information about the effectiveness of this analysis procedure for improving the alignment of assessment items to content standards. Summary data presented here are for six items that were field tested with 259 eleventh grade students. A complete analysis of two items appears at the end of this paper.
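The decision rule implied by this comparison can be sketched in a few lines of Python. This is an illustrative sketch, not the study’s instrument: the function name `classify_response` and its `understands` argument are hypothetical, and `understands` stands in for the analysts’ judgment, made from each student’s shown work and written explanation, of whether the student has the knowledge in the targeted content standard.

```python
from typing import Optional


def classify_response(chosen: Optional[str], correct: str, understands: bool) -> str:
    """Classify one explained response (hypothetical helper, for illustration).

    chosen      -- the answer letter the student circled, or None if none was circled
    correct     -- the keyed answer letter
    understands -- analysts' judgment, from the student's work and explanation,
                   of whether the student has the targeted knowledge
    """
    answered_correctly = chosen == correct
    if answered_correctly and not understands:
        return "false positive"  # correct answer without the targeted knowledge
    if not answered_correctly and understands:
        return "false negative"  # wrong or missing answer despite the knowledge
    return "valid"  # the answer choice agrees with what the work shows


# Original Item 1: a student evaluated every expression correctly but circled A
# (the keyed answer is C), which the study counted as a false negative.
print(classify_response("A", "C", understands=True))  # false negative
```

Note that a wrong answer from a student who lacks the knowledge counts as valid: the rule flags only mismatches between the answer and the demonstrated understanding.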

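Table 1: Comparing Results for Six Original and Revised Items

| Item | Provided Explanations (%), Orig. | Provided Explanations (%), Rev. | False Neg/Pos (%), Orig. | False Neg/Pos (%), Rev. | Confused by Wording (%), Orig. | Confused by Wording (%), Rev. | Confused Total (%), Orig. | Confused Total (%), Rev. | Difficulty (% correct), Orig. | Difficulty (% correct), Rev. |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 81.5 | 77.1 | 18.6 | 1.9 | 9.3 | 0.9 | 9.3 | 12.0 | 88.2 | 85.0 |
| 2 | 70.0 | 73.9 | 17.3 | 21.6 | 14.3 | 8.0 | 22.4 | 19.3 | 52.1 | 60.5 |
| 3 | 74.8 | 80.6 | 13.5 | 0.0 | 0.0 | 0.0 | 39.3 | 17.0 | 26.9 | 41.0 |
| 4 | 71.2 | 64.7 | 3.0 | 9.1 | 13.1 | 29.9 | 25.2 | 59.7 | 57.6 | 28.6 |
| 5 | 72.7 | 85.0 | 10.2 | 8.8 | 2.3 | 2.7 | 22.7 | 27.4 | 48.8 | 48.8 |
| 6 | 73.4 | 73.8 | 7.8 | 2.2 | 3.9 | 5.5 | 12.7 | 17.8 | 66.9 | 58.2 |
| Mean | 73.9 | 75.9 | 11.7 | 7.3 | 7.2 | 7.8 | 21.9 | 25.5 | 56.8 | 53.7 |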
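As an arithmetic check, averaging the six rounded per-item percentages in each column reproduces the Mean row of Table 1. The sketch below uses Python’s `decimal` module with half-up rounding to one decimal place; the rounding rule and the name `column_means` are our assumptions, not something stated by the study.

```python
from decimal import Decimal, ROUND_HALF_UP

# Table 1 rows as printed. Column order is the Orig./Rev. pair for each measure:
# explanations, false neg/pos, confused by wording, confused total, difficulty.
TABLE_1 = {
    1: ["81.5", "77.1", "18.6", "1.9", "9.3", "0.9", "9.3", "12.0", "88.2", "85.0"],
    2: ["70.0", "73.9", "17.3", "21.6", "14.3", "8.0", "22.4", "19.3", "52.1", "60.5"],
    3: ["74.8", "80.6", "13.5", "0.0", "0.0", "0.0", "39.3", "17.0", "26.9", "41.0"],
    4: ["71.2", "64.7", "3.0", "9.1", "13.1", "29.9", "25.2", "59.7", "57.6", "28.6"],
    5: ["72.7", "85.0", "10.2", "8.8", "2.3", "2.7", "22.7", "27.4", "48.8", "48.8"],
    6: ["73.4", "73.8", "7.8", "2.2", "3.9", "5.5", "12.7", "17.8", "66.9", "58.2"],
}


def column_means(rows):
    """Average each column and round half-up to one decimal place."""
    means = []
    for column in zip(*rows.values()):
        total = sum(Decimal(value) for value in column)
        mean = total / len(column)
        means.append(mean.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP))
    return means


print(", ".join(str(m) for m in column_means(TABLE_1)))
# 73.9, 75.9, 11.7, 7.3, 7.2, 7.8, 21.9, 25.5, 56.8, 53.7
```

Decimal arithmetic is used because several column means fall exactly on a rounding boundary (for example 455.1 / 6 = 75.85), where binary floating point and Python’s built-in `round` can round the other way.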

Conclusions

  • Whenever possible, the procedure should make use of student response data before items are revised. The purpose of the analysis procedure is to reduce the number of student responses that do not accurately reflect what students know and can do. Some of the factors that lead students to answer correctly when they lack the required knowledge, or incorrectly when they have it, are not apparent until student responses are examined.
  • When students are asked to provide explanations for their answers or to show their work, approximately 75% of them do so (73.9% on original items and 75.9% on revised items in Table 1). Their comments help determine whether the answer they selected on a multiple-choice test is consistent with the understanding shown in their work and explanations.
  • When students are asked if anything in an assessment task is confusing to them, they answer in three ways: (1) they identify specific mathematics content that they do not understand; (2) they identify specific wording or aspects of the structure of the item that are confusing; and (3) they say they are confused in general, without specifying what confused them. Most answers to this question concern content, although in a small but meaningful number of cases students provide specific information about wording that is helpful when revising items.

Analysis of Item 1

Targeted Content Standard: Use operations (e.g., opposite, reciprocal, absolute value, raising to a power, finding roots, finding logarithms).

Original Item 1

Which of the following represents the largest value?

  A. 10^3
  B. (5 + 5) × 10
  C. 10^8 / 10^2 (correct)
  D. 10^3 × (10)^2

Analysts determined that the part of the content standard dealing with “raising to a power” was necessary to respond correctly to this task: students had to use that operation to decide which of a set of expressions containing exponents had the largest value. Analysts also determined that the targeted standard was not sufficient (i.e., enough by itself) to respond correctly, because students also needed to know how to “compare quantities and magnitudes of numbers.” Although “comparing quantities” is a fifth grade content standard, analysts felt it was worth noting that this skill was needed and that it could not be assumed that all students had mastered it. Analysts also judged that the item was prone to guessing, because the correct response is both the largest value and the expression with the largest exponent; students might therefore choose it even if they did not understand the ideas in the content standard.
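The guessing concern can be made concrete by evaluating each choice and checking which one contains the largest written exponent. In this sketch, reading choice D as 10^3 × (10)^2 follows the description of that choice later in this analysis, and recording choice B (which has no written exponent) as exponent 0 is our own convention.

```python
# Original Item 1: (value under standard rules, largest written exponent).
CHOICES = {
    "A": (10**3, 3),            # 10^3
    "B": ((5 + 5) * 10, 0),     # (5 + 5) x 10; no exponent written, recorded as 0
    "C": (10**8 // 10**2, 8),   # 10^8 / 10^2; exact, since 10^2 divides 10^8
    "D": (10**3 * 10**2, 3),    # 10^3 x (10)^2
}

largest_value = max(CHOICES, key=lambda letter: CHOICES[letter][0])
largest_exponent = max(CHOICES, key=lambda letter: CHOICES[letter][1])

# The keyed answer is both the largest value and the choice with the largest
# exponent, so picking "the biggest exponent" also yields the correct answer.
print(largest_value, largest_exponent)  # C C
```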

Student Responses to the Original Item
One hundred nineteen responses were returned from the three schools participating in the pilot study. The table below shows the number and percent of students who showed their work or provided an explanation for their answer and the distribution of responses for the item.

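Table 2: Student Responses to Original Item 1

| | A | B | *C | D | No Response | Total |
|---|---|---|---|---|---|---|
| Explanation | 1 | 0 | 86 | 8 | 2 | 97 |
| No Explanation | 0 | 0 | 19 | 3 | 0 | 22 |
| Total | 1 | 0 | 105 | 11 | 2 | 119 |
| Percent | 0.8 | 0.0 | 88.2 | 9.2 | 1.7 | 100.0 |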
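The Percent row is each column total divided by the total number of returned responses (119 here). A minimal sketch, assuming half-up rounding to one decimal place; the helper name `percent_row` is ours. The same computation reproduces the Percent rows of the other response-distribution tables in this paper.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent_row(totals):
    """Convert column totals to percentages of all responses, to 0.1%."""
    n = sum(totals)
    return [
        (Decimal(100 * count) / n).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
        for count in totals
    ]


# Total row of Table 2: A, B, C (correct), D, no response.
print([str(p) for p in percent_row([1, 0, 105, 11, 2])])
# ['0.8', '0.0', '88.2', '9.2', '1.7']
```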

Analyzing the Responses of Students who Provided an Explanation for their Work: Determining False Negatives and False Positives for Original Item 1

Of the 97 students who showed their work or provided an explanation for their answer, 86 students responded correctly to this task and 11 responded incorrectly.

Students Who Chose Answer A. One student chose A. This student calculated the value of each answer choice correctly, showing that 1,000,000 was the largest value, but circled the wrong answer. This is a false negative, but not attributable to the structure of the item.

Students Who Chose Answer B. No student chose response B.

Students Who Chose Answer C (correct). Of the 86 students who responded correctly, seven showed significant errors in their work. Five of them made errors in evaluating the expression in C (the correct answer), and two made errors in evaluating the expressions in the distractors. One student who chose the correct answer simplified 10^8 as 40 and 10^2 as 20, and indicated that the quotient of those was 60. One student simplified 10/10 as 1 and then wrote “8-2=7.” Two students simplified the expression in C incorrectly but chose it because it was the largest value. The remaining student chose C because “it have to be the largest value…” The responses of these seven students, who answered correctly but whose work showed that they did not understand the targeted content standard, were judged to be false positives. However, the student work and explanations did not provide any evidence that these students got the answer correct because of the way the item is structured. We were unable to discern whether guessing played a role, i.e., whether any of the students chose the correct answer C because it had the largest exponent.

Students Who Chose Answer D. The eight students who chose D evaluated the expressions in choices A, B, and C correctly but evaluated the expression in D incorrectly. Most of these students interpreted the expression 10^3 × (10)^2 as (10^3 × 10)^2, which would make it the largest value. All eight indicated that the form of the expression in choice D confused them. Because 10^3 × (10)^2 is a non-standard form to many students, and because these eight students correctly evaluated all of the other expressions, we judged these eight wrong answers to be false negatives. This is something that could be changed in a future revision.
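The effect of this misreading is easy to check numerically. The sketch below simply evaluates choice D both as written and as most of the eight students read it, and compares each value with choice C.

```python
# Choice D of original Item 1 under the standard reading, 10^3 x (10)^2,
# and under the reading most of the eight D-choosers reported, (10^3 x 10)^2.
standard_d = 10**3 * (10) ** 2   # 100,000
misread_d = (10**3 * 10) ** 2    # (10^4)^2 = 100,000,000
choice_c = 10**8 // 10**2        # 1,000,000, the keyed answer

print(standard_d < choice_c)  # True: C is the largest value as written
print(misread_d > choice_c)   # True: the misreading makes D look largest
```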

Students Who Chose No Answer. The two students who did not choose any answer evaluated all expressions correctly but did not circle an answer choice. These two responses were judged to be false negatives, but not attributable to the structure of the item.

Summary of False Negatives and False Positives for Original Item 1. There were eighteen student responses that did not accurately indicate what the students did or did not know with respect to the targeted content standard (eleven false negatives and seven false positives). We attributed eight of the eleven false negatives to students being presented with a non-standard way of writing exponents in one of the answer choices. Because this requires knowledge beyond what is specified in the learning goal, it could also qualify as a sufficiency issue, although not one that the original analysts identified. It is certainly something that could be addressed in future revisions. The other three false negatives were due to students circling the incorrect answer or circling no answer even though they had shown how to calculate the correct answer.

The seven false positives may have been due to student guessing as predicted by the analysts. In each case, the students showed that they did not know how to evaluate expressions containing exponents but still chose the correct answer. Perhaps they chose the answer that contained the largest exponent. However, no direct evidence was found in the student work or comments to confirm this possibility. In fact, only one student indicated that he or she chose correct answer C because “it had the largest numbers,” and even that does not speak directly to the size of the exponent.

Revised Item 1

Which of the following expressions represents the value 10,000?

  A. 10^4 - 10^0
  B. 10^2 + 10^2
  C. 10^8 / 10^2
  D. 10^4 × 10^0 (correct)
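Evaluating the revised choices shows how the two misconceptions reported in the student responses (treating 10^0 as 0 or as 10) change which expression appears to equal 10,000. The parameter `zero_power`, the value a student assigns to 10^0, is an illustrative device of ours, and the helper names are hypothetical.

```python
def evaluate_choices(zero_power: int) -> dict:
    """Revised Item 1 choices, taking 10^0 to equal `zero_power`."""
    return {
        "A": 10**4 - zero_power,   # 10^4 - 10^0
        "B": 10**2 + 10**2,        # 10^2 + 10^2 = 200
        "C": 10**8 // 10**2,       # 10^8 / 10^2 = 1,000,000
        "D": 10**4 * zero_power,   # 10^4 x 10^0
    }


def choices_equal_to(target: int, zero_power: int) -> list:
    """Letters of the choices whose value equals `target`."""
    return [letter for letter, value in evaluate_choices(zero_power).items() if value == target]


print(choices_equal_to(10_000, zero_power=1))   # ['D']  correct rule, 10^0 = 1
print(choices_equal_to(10_000, zero_power=0))   # ['A']  misconception 10^0 = 0
print(choices_equal_to(10_000, zero_power=10))  # []     misconception 10^0 = 10
```

Choice B attracts a different error, reading 10^2 + 10^2 as the product (100)(100), which this sketch does not model.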

Results from Revised Item 1
One hundred forty responses were returned from the three schools participating in the pilot study. The table below shows the number and percent of students who showed their work or provided an explanation for their answer and the distribution of responses for the item.

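Table 3: Student Responses to Revised Item 1

| | A | B | C | *D | No Response | Total |
|---|---|---|---|---|---|---|
| Explanation | 8 | 5 | 1 | 91 | 3 | 108 |
| No Explanation | 2 | 0 | 2 | 28 | 0 | 32 |
| Total | 10 | 5 | 3 | 119 | 3 | 140 |
| Percent | 7.1 | 3.6 | 2.1 | 85.0 | 2.1 | 100.0 |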

Analyzing the Responses of Students who Provided an Explanation for their Work: Determining False Negatives and False Positives for Revised Item 1

Of the 108 students who showed their work or provided an explanation for their answer, 91 students responded correctly to this task and 17 responded incorrectly.

Students Who Chose Answer A. Of the eight students who chose A, six noted that 10^0 = 0, and two correctly simplified 10^4 as 10,000 without indicating a value for 10^0, suggesting that they also thought that 10^0 = 0.

Students Who Chose Answer B. Five students chose B, all of whom simplified the expression 10^2 + 10^2 as (100)(100) = 10,000.

Students Who Chose Answer C. The only student choosing C showed correct calculations for each expression, but chose the wrong answer. This is a false negative but not attributable to the structure of the item.

Students Who Chose Answer D (correct). The ninety-one students who chose the correct answer D gave a variety of reasons for their choice. In no case was there enough evidence to suggest that they did not understand the ideas in the content standard, so these responses were counted as valid.

Students Who Chose No Answer. Of the three students who did not circle an answer, two simplified 10^0 as 10 and said that no expression was equal to 10,000. The third student calculated each expression correctly but did not circle an answer. This student’s answer was counted as a false negative, but not one that can be addressed by changing the item.

Summary of False Negatives and False Positives for Revised Item 1. Two students’ answer choices did not accurately represent what they did or did not know with respect to the targeted content standard (two false negatives and no false positives). In both cases the work was correct, but the student circled the wrong answer or no answer.

Comparing Original and Revised Item 1

Do the Revisions Increase the Validity of Student Selections? There were fewer false negatives and false positives for the revised item than for the original item. This can be attributed to the elimination of the non-standard exponent form in answer choice D of the original item and, possibly, to the fact that the correct answer no longer contains the largest exponent, which may have reduced guessing.

Table 4: False Negatives and False Positives for Original and Revised Item 1

| | N | False Positives | False Negatives | Total | Percent Invalid |
|---|---|---|---|---|---|
| Original | 97 | 7 | 11 | 18 | 18.6 |
| Revised | 108 | 0 | 2 | 2 | 1.9 |
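Percent Invalid is the number of false positives plus false negatives, divided by N, the number of students who explained their answers. A minimal sketch, again assuming half-up rounding to one decimal place; `percent_invalid` is a name introduced here for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent_invalid(n: int, false_positives: int, false_negatives: int) -> Decimal:
    """Invalid responses as a percentage of explained responses, to 0.1%."""
    share = Decimal(100 * (false_positives + false_negatives)) / n
    return share.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


print(percent_invalid(97, 7, 11))  # 18.6 (original Item 1)
print(percent_invalid(108, 0, 2))  # 1.9  (revised Item 1)
```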

Is the revised item less confusing to students? Results of asking students if anything about the item is confusing are organized into three categories: (1) content confusion, (2) confusion about the way the item is worded or structured, and (3) non-specific confusion. Table 5 summarizes the data for the students who found the item to be confusing for the original and revised items.

Table 5: Comparing Degree of Confusion for Original and Revised Item 1

| | Total N | Content Confusion | Confusion About Item Structure or Wording | Non-Specific Confusion | Percent Confused |
|---|---|---|---|---|---|
| Original | 97 | 0 | 9 | 0 | 9.3 |
| Revised | 108 | 11 | 1 | 1 | 12.0 |

On the original item, nine students said something was confusing to them. All nine indicated that answer choice D confused them because they did not know how to simplify the factor (10)^2, which is usually written as 10^2 (wording or item structure). On the revised item, thirteen students indicated that something was confusing to them. Nine said that 10^0 was confusing (content), one did not understand the term “represent” (wording), two indicated not knowing how to cope with the superscript (content), and one claimed “the hole thing” was confusing (non-specific).

Item Difficulty. Among all students who took the test, both those who explained their answer and those who did not, 88.2% answered the original item correctly and 85.0% answered the revised item correctly. Changes in the item had very little impact on item difficulty.

Analysis of Item 2

Targeted Content Standard: Apply ratio and proportion to mathematical problem situations involving distance, rate, and similar triangles.

Original Item 2

Kim needs a certain shade of pink paint for a handmade toy. This shade is made by mixing white and red paint in a ratio of 1 to 3. How many fluid ounces of red paint would be needed to make 12 fluid ounces of this pink paint?

  A. 4 fluid ounces
  B. 6 fluid ounces
  C. 8 fluid ounces
  D. 9 fluid ounces (correct)

Analysts determined that the knowledge and skills specified in the target content standard are both necessary and sufficient to respond correctly to this task. Even though this item does not involve the application of ratio and proportion to problems involving “distance, rate, and similar triangles,” analysts felt that the contexts specified in the content standard were meant to be illustrative and that contexts like the one in this assessment task fell within the scope of the content standard. Analysts did, however, note a lack of clarity in the way the problem was originally written. In particular, they felt that the first sentence does not contain much useful information, the question is written in the passive voice, and when the question is asked in the third sentence, it does not refer back to Kim.

Based on the comments of the analysts, it was decided that the item would be revised solely on likely effectiveness issues. Therefore, the item was revised to provide students with additional information about the problem context in the first sentence. No attempt was made, however, to change from passive to active voice or to refer back to Kim in the second and third sentences. Although not done intentionally, the task was also changed from one requiring the ability to deal with part-to-whole comparisons to one requiring the ability to deal only with part-to-part comparisons.

Student Responses to Original Item 2
One hundred forty responses were returned from the three schools participating in the pilot study. The table below shows the number and percent of students who showed their work or provided an explanation for their answer and the distribution of responses for the item.

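Table 6: Student Responses to Original Item 2

| | A | B | C | *D | No Response | Total |
|---|---|---|---|---|---|---|
| Explanation | 36 | 1 | 0 | 59 | 2 | 98 |
| No Explanation | 16 | 1 | 3 | 14 | 8 | 42 |
| Total | 52 | 2 | 3 | 73 | 10 | 140 |
| Percent | 37.1 | 1.4 | 2.1 | 52.1 | 7.1 | 100.0 |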

Analyzing the Responses of Students who Provided an Explanation for their Work: Determining False Negatives and False Positives for Original Item 2

Of the 98 students who showed their work or provided an explanation for their answer, 59 responded correctly and 39 responded incorrectly.

Students Who Chose Answer A. Of the thirty-six students who chose answer A, 22 divided 12 by 3 to get the answer 4. The remaining 14 students correctly used equivalent fractions but reversed the order of red and white paint. These students demonstrated an understanding of a proportion as two equivalent ratios, but they were confused because the question stem does not specify whether one part is white and three parts are red or three parts are white and one part is red. It merely says “white and red paint in a ratio of 1 to 3,” so students must infer that the colors of paint are in the same order as the numbers in the ratio. Some students explicitly mentioned this as confusing, but others may have reversed the order without realizing it. Based on the evidence we have, we concluded that these students likely understand the targeted content standard and got the question wrong because they were not sure of the labeling that accompanied the given ratio. These fourteen responses were considered false negatives due to an aspect of the task that could be corrected in future revisions.

Students Who Chose Answer B. One student chose answer B. This student began by correctly showing the equivalence of 1/3 and 2/6, but stopped at that point and chose 6 as the answer.

Students Who Chose Answer C. No students chose C.

Students Who Chose Answer D (correct response). Of the 59 students who chose D, 51 showed a clear understanding of the ideas of ratio and proportion that are needed to solve the problem. They showed that they understood that the expression 1:3 indicates the relation between the parts (white and red paint), that they needed to add 1 and 3 to get the whole (the pink paint), and that they had to find the equivalent ratio 3:9, whose terms add to 12. Students did this in a number of ways, but in each case their understanding of the ideas in the content standard was evident.

Eight students used strategies suggesting that they did not understand the required ideas, or did not provide enough information to tell whether they understood the ideas in the content standard. For example, five students claimed to use “logic” or to have “did it in my head.” Three students got the correct answer but mentioned not understanding ratios. One of these students subtracted 3 from 12 to get the correct answer D. The other two added 9 and 3 and then circled the correct answer D. Based on what these students said, it appears that even though they chose the correct response, they did not understand the ideas in the targeted content standard. The three students who related the answer 9 to the numbers 12 and 3 in this way and said they did not understand ratios were classified as false positives. The five who said they did it in their heads or used logic did not give us enough information to classify them as false positives.
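The part-to-whole reasoning shown by the 51 students can be set beside the two shortcuts described above: dividing 12 by 3, and taking the difference of 12 and 3. In this sketch, `red_from_total` is a hypothetical helper that reads the stated ratio as white to red, which is the reading the answer key assumes.

```python
from fractions import Fraction


def red_from_total(white_parts: int, red_parts: int, total: int) -> Fraction:
    """Part-to-whole: red makes up red_parts / (white_parts + red_parts) of the mix."""
    return Fraction(red_parts, white_parts + red_parts) * total


correct = red_from_total(1, 3, 12)  # 3/4 of 12 fluid ounces
divided = Fraction(12, 3)           # "divided 12 by 3", choice A
subtracted = 12 - 3                 # difference of the two stem numbers

print(correct, divided, subtracted)  # 9 4 9
```

The subtraction shortcut happens to land on the correct answer, which is why those responses could be false positives.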

Students Who Chose No Answer. Two students did not circle any answer, and their work showed they did not understand the ideas being tested.

Summary of False Negatives and False Positives for Original Item 2. On the original item there were 14 false negatives that are attributable to the imprecise wording of the task, specifically the imprecision about the order of paints in the ratio. There were three false positives that are attributable to the fact that the numbers 12 and 3 appear in the stem and the number 9 appears in the correct answer choice. Three students found the correct answer by finding the difference between 12 and 3. This could be corrected in a revision by using numbers in the stem that do not allow for a solution by addition or subtraction. (For example, the question could say that pink paint is made by mixing white and red paint in the ratio of 1 to 4, and then ask how many ounces of red paint are needed to make 15 ounces of pink paint. The numbers 15 and 4 would appear in the stem, and the correct answer would be 12.)
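The proposed renumbering can be checked mechanically. This sketch, whose helper names are ours, lists every sum and difference of two numbers in the stem and tests whether any of them equals the correct answer. It covers only the add-or-subtract shortcut discussed here, not other possible guessing strategies.

```python
from fractions import Fraction
from itertools import combinations


def red_from_total(white_parts: int, red_parts: int, total: int) -> Fraction:
    """Part-to-whole: red makes up red_parts / (white_parts + red_parts) of the mix."""
    return Fraction(red_parts, white_parts + red_parts) * total


def add_subtract_shortcuts(*stem_numbers: int) -> set:
    """Every sum and (positive) difference of two numbers that appear in the stem."""
    results = set()
    for a, b in combinations(stem_numbers, 2):
        results.add(a + b)
        results.add(abs(a - b))
    return results


def shortcut_gives_answer(white_parts: int, red_parts: int, total: int) -> bool:
    """True if some sum or difference of stem numbers equals the correct answer."""
    correct = red_from_total(white_parts, red_parts, total)
    return any(correct == s for s in add_subtract_shortcuts(white_parts, red_parts, total))


print(shortcut_gives_answer(1, 3, 12))  # True: 12 - 3 = 9, the correct answer
print(shortcut_gives_answer(1, 4, 15))  # False: the correct answer is 12
```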

Revised Item 2

Kim is painting a handmade toy and she needs to mix paint so she can create a certain shade of pink. This shade is made by mixing white paint and red paint in a ratio of 2 ounces to 5 ounces. How many ounces of red paint are needed to mix with 100 ounces of white paint to create the right shade of pink?

  A. 500 ounces
  B. 250 ounces (correct)
  C. 40 ounces
  D. 20 ounces

Results from Revised Item 2
One hundred nineteen responses were returned for the revised item from the three schools participating in the pilot test. The table below shows the number and percent of students who showed their work or provided an explanation for their answer and the distribution of responses for the item.

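Table 7: Student Responses to Revised Item 2

| | A | *B | C | D | No Response | Total |
|---|---|---|---|---|---|---|
| Explanation | 2 | 60 | 19 | 1 | 6 | 88 |
| No Explanation | 3 | 12 | 8 | 4 | 4 | 31 |
| Total | 5 | 72 | 27 | 5 | 10 | 119 |
| Percent | 4.2 | 60.5 | 22.7 | 4.2 | 8.4 | 100.0 |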

Analyzing the Responses of Students who Provided an Explanation for their Work: Determining False Negatives and False Positives for Revised Item 2

Of the 88 students who showed their work or provided an explanation for their answer, 60 responded correctly and 28 responded incorrectly.

Students Who Chose Answer A. Two students chose answer A. One of the students who chose A simply multiplied 100 by 5. The other student multiplied 250 by 2. Both of these students demonstrated that they did not understand the ideas in the content standard.

Students Who Chose Answer B (Correct). Sixty students chose the correct answer B. Fifty-eight of them demonstrated appropriate use of proportional reasoning to solve the problem. One additional student demonstrated an understanding of proportions by arguing that answer C (40 ounces of red paint) and answer D (20 ounces of red paint) are not enough, and that with 500 ounces of red paint there would be 200 ounces of white paint. The remaining student wrote a series of ratios, beginning with “4 to 10, 6 to 15, 8 to 20 …” and continued to list ratios using this pattern. The final ratio was written as 92 to 250 instead of 100 to 250. This error did not provide sufficient evidence that the student did not understand the targeted content standard and, therefore, was not counted as a false positive. Even if it were judged to be a false positive, it is not something that could be affected by revising the item.

Students Who Chose Answer C. Nineteen students chose answer choice C. Of these, 16 students showed they knew the correct way to solve the proportion, but as in the original item they reversed the order of the red and white paint, which led to an incorrect response. The remaining three students who chose C provided an incorrect explanation for their work. For the 16 responses where the order of red and white paint was reversed, it was judged that these students understood the ideas needed to answer correctly even though they got the wrong answer. These 16 were considered to be false negatives.

Students Who Chose Answer D. One student chose answer choice D. This student divided 100 by 5, indicating that the student did not understand the ideas needed to answer correctly.
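Each answer choice on the revised item corresponds to one of the strategies described above, which can be checked by computing them side by side. The constant names are ours, and the line for choice A shows only one of the two routes students used (100 × 5).

```python
from fractions import Fraction

WHITE_PARTS, RED_PARTS = 2, 5   # "2 ounces to 5 ounces", read as white to red
WHITE_GIVEN = 100               # ounces of white paint in the question

# Correct part-to-part proportion: 2/5 = 100/red, so red = 100 * 5/2.
correct = Fraction(RED_PARTS, WHITE_PARTS) * WHITE_GIVEN          # 250, choice B
# Same proportion with the colors reversed: 5/2 = 100/red.
reversed_order = Fraction(WHITE_PARTS, RED_PARTS) * WHITE_GIVEN   # 40, choice C
times_five = WHITE_GIVEN * RED_PARTS                              # 500, choice A
divided_by_five = Fraction(WHITE_GIVEN, RED_PARTS)                # 20, choice D

print(correct, reversed_order, times_five, divided_by_five)  # 250 40 500 20
```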

Students Who Chose No Answer or More than One Answer. Two students circled two answers. One of these students was apparently not sure of the order of the paint in the ratio because the student listed the correct proportions for both answer B and C, solved them both, and circled both responses. As with the students who chose C, this response was counted as a false negative. The other student circled answers A and C, wrote a series of fractions, and then claimed to not know how to solve ratios. One student tried unsuccessfully to write an equation. Two students provided correct work but did not circle any of the response options. Although these were false negatives, they were not due to anything that could be corrected in the structure of the item.

Summary of False Negatives and False Positives for Revised Item 2. On the revised item there were seventeen false negatives that were attributable to the imprecise wording of the task, specifically the imprecision in the order of paints in the ratio. There were also two additional false negatives due to students not circling an answer. There were no false positives on the revised item.

Comparing Original and Revised Item 2

Do the Revisions Increase the Validity of Conclusions about what Students Know and do not Know? The percentage of invalid responses was approximately the same on the original and revised items. Most were due to the imprecision in how the order of the paints was stated in the questions. Although this issue was not addressed in the revision, it is clear that addressing it could have reduced the number of invalid responses significantly. The three false positives on the original item were eliminated by the changes made in the revised item. Two of the false negatives on the revised item were due to students not circling an answer even though they showed they understood the ideas being tested.

Table 8: False Negatives and False Positives for Original and Revised Item 2

| | N | False Positives | False Negatives | Total | Percent Invalid |
|---|---|---|---|---|---|
| Original | 98 | 3 | 14 | 17 | 17.3 |
| Revised | 88 | 0 | 19 | 19 | 21.6 |

Is the revised item less confusing to students? Results of asking students if anything about the item is confusing are organized into three categories: (1) content confusion, (2) confusion about the way the item is worded or structured, and (3) non-specific confusion. Table 9 summarizes the data for the students who found the item to be confusing for the original and revised items.

Table 9: Comparing Degree of Confusion for Original and Revised Item 2

| | Total N | Content Confusion | Confusion About Item Structure or Wording | Non-Specific Confusion | Percent Confused |
|---|---|---|---|---|---|
| Original | 98 | 8 | 14 | 0 | 22.4 |
| Revised | 88 | 0 | 7 | 10 | 19.3 |

Of the 22 students who claimed to be confused by some aspect of the original item, eight students mentioned content issues, seven of them claiming to not understand how to do ratio problems (content) and one student not knowing how to divide the 12 into parts (content). Fourteen students mentioned specific issues with the wording or structure of the item, such as not knowing the order of paints in the ratio, either red to white or white to red (wording). On the revised item, responses from the 17 students who claimed to be confused fell into two categories. Seven students claimed to be confused because they did not know the order of the paints in the ratio (wording). The remaining 10 students claimed to not understand how to do the problem (non-specific) but offered no reason.

Item Difficulty. For all students who took the test, both those who explained their answer and those who did not, 52.1% answered the original item correctly and 60.5% answered the revised item correctly. The greater number of correct responses on the revised item is most likely because the revised item requires fewer steps: it eliminates the need to calculate the “whole” from the two parts. The given ratio in both tasks compares part to part (white to red), but the original item requires an additional step, in which students must add the white part to the red part to get the whole before answering the question. In the revised item, students simply have to identify an equivalent part-to-part ratio.


The research reported here was supported by the National Science Foundation (NSF Grant #9819018). Any opinions, findings, and conclusions or recommendations expressed here are those of the authors and do not necessarily reflect those of NSF.
