Mathematics Curriculum Materials Analysis Reliability Study
Gerald Kulm
Laura Grier
AAAS -- Project 2061
May 6, 1998
Project 2061 is developing two initiatives and products for reviewing and reporting
on the analysis of mathematics and science curriculum materials: (1) a print
and CD-ROM tool, Resources for Science Literacy: Curriculum Materials Evaluation,
and (2) a database of reports on middle grades mathematics and science textbooks.
For users to have confidence in the analysis procedure and in the published
textbook reviews, the analysis must be reliable. That is, the procedure for
producing reports must be one in which independent reviewers can come to similar
judgments for similar reasons.
The Project 2061 procedure for analyzing mathematics curriculum materials
and the support documents that clarify the procedure have been revised extensively
based on feedback from the numerous educators who have used it. In addition,
we have developed a list of indicators and a rating scheme to make the procedure
more useful for producing ratings and reports. This study was carried out
to determine the effectiveness of the procedure and training materials in
producing consistent, valid, and reliable ratings of middle grades mathematics
textbooks.
Method
Raters: To help us test the reliability and validity of the
revised procedure, we convened twelve of our most capable analysts for
a rater reliability study. Each of the raters had been trained to analyze
middle grades mathematics materials. Some of the raters had been trained by
Gerald Kulm, Project 2061, as part of a project directed by Bill Bush at the
University of Kentucky. Others had been trained by Kulm as part of the Expert
Panel effort of the U.S. Department of Education. This training was based
on the Project 2061 procedure. Both of these experiences showed that the most
reliable ratings are produced when analysts work in teams of two persons.
In addition, the most valid ratings were produced by a team consisting of
a middle grades mathematics teacher and a university mathematics or mathematics
education faculty member or supervisor.
The analysts for the study were chosen with these factors in mind. Eight experienced
mathematics teachers and four university mathematics education faculty were
selected. The analysts received a $1500 consulting fee as well as expenses
for (1) attending the training meeting and (2) submitting a satisfactorily
completed report. The names and positions of the analysts are provided in
Table 1.
| Table 1. Mathematics Curriculum Analysts |
 |
Diane Surati
Mathematics Teacher
Montpelier, VT |
Bill Kunnecke
Mathematics Teacher
Calvert City, KY |
Mark Deegan
Mathematics Teacher
Alexandria, VA |
Michele Crowley
Mathematics Education Instructor
Northern Kentucky University |
Kathleen Morris
Mathematics Teacher
Lorton, VA |
Sue P. Reehm
Mathematics Education Professor
Eastern Kentucky University |
Linda Hackett
Mathematics Education Professor
American University |
Peg Darcy
Mathematics Teacher
Louisville, KY |
Marshall Gordon
Mathematics Teacher
Columbia, MD |
Jan McDowell
Mathematics Teacher
Louisville, KY |
Alice Mikovch
Mathematics Education Professor
Western Kentucky University |
Faye Stevens
Mathematics Teacher
Cadiz, KY |
 |
Training: The raters attended a three-day meeting in Washington,
DC to become familiar with the revised procedure and to practice applying the rating
criteria. The training, which used a mathematics benchmark and a sixth-grade textbook,
consisted of the following steps:
- After clarifying the benchmark, the participants identified sightings
in the textbook. The sightings were discussed, then used for the remainder
of the training session.
- For each Instructional Cluster criterion, analysts were guided in a discussion
of the cluster, the criterion, and the rating indicators.
- Working in teams, the analysts used the indicators to rate each criterion.
The ratings for the six teams were displayed, and the discrepancies were
discussed as a way to strengthen understanding of the criteria, indicators,
and rating procedure.
Following the meeting, the indicators and rating criteria that were unclear
or inconsistent were modified to produce a final set of instructions and a
rating form. The instructions were compiled in a notebook along with a full set
of examples from textbooks, illustrating a range of materials rated from low to
high on how well they addressed each criterion. This notebook was available to
the analysts as they studied the procedure and when they returned home to do
their own analysis and ratings.
Design: Following the training, two sets of middle grades
mathematics materials were sent to the analysts: Transition Mathematics
and Connected Mathematics. For the latter material, two units from
each of three mathematics strands were selected for rating. Each team rated
two mathematics benchmarks, one conceptual and one skill, for each of the
two sets of curriculum materials. Each of the three pairs of teams was assigned
to one of three mathematics strands: Number, Geometry, and Algebra. The two
teams for each strand rated the same benchmarks and the same materials independently.
Table 2 summarizes the mathematics strands, materials, analysts, and benchmarks
that were used in the study.
Rating: The analysis and rating were done during March 1998.
Team members were encouraged to consult with each other and to ask questions
of the director. They were asked not to consult or communicate with the members
of other teams, especially the team that was analyzing the same set of materials.
Teams submitted reports that included (1) the sightings for each indicator,
(2) the justifications for the sightings, (3) the rating [Met, Not Met, Unsure]
of each indicator, (4) the overall rating of each criterion [High, Medium,
Low, None], and (5) a justification of the overall rating of each criterion. In
all, 24 criteria across 7 instructional clusters were rated.
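The results section counts a difference when two teams' ratings of a criterion are more than one step apart on the four-level scale. A minimal sketch of that comparison follows. The team ratings shown are hypothetical placeholders, not data from the study, and the names `SCALE`, `step_distance`, and `is_difference` are illustrative only.

```python
# Criterion rating scale, ordered from lowest to highest.
SCALE = ["None", "Low", "Medium", "High"]


def step_distance(a: str, b: str) -> int:
    """Number of steps separating two ratings on the ordinal scale."""
    return abs(SCALE.index(a) - SCALE.index(b))


def is_difference(a: str, b: str) -> bool:
    """True when two ratings are more than one step apart."""
    return step_distance(a, b) > 1


# Hypothetical ratings from two teams on three criteria (not study data).
team_a = {"4.1": "High", "4.4": "Low", "7.1": "Medium"}
team_b = {"4.1": "Medium", "4.4": "High", "7.1": "None"}

flagged = [c for c in team_a if is_difference(team_a[c], team_b[c])]
print(flagged)
```

Under this rule, adjacent ratings such as High and Medium count as agreement, so only gaps of two or three steps are flagged.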
With the exception of two or three analysts, the ratings were completed within
the month, with the remainder being completed within two more weeks. All of
the reports were complete and useable in the study.
| Table 2. Design of Reliability Study |
 |
| Strand |
Analyst Teams |
Materials |
Benchmarks |
 |
| Number |
Diane Surati
Bill Kunnecke
Mark Deegan
Michele Crowley |
Connected Mathematics:
Bits And Pieces I
Connected Mathematics:
Comparing And Scaling
Transition Mathematics |
Concept
9A 6-8#5 The expression a/b can mean different things: a parts
of size 1/b each, a divided by b, or a compared to b.
Skill
12B 6-8#2 Use, interpret, and compare numbers in several equivalent
forms such as integers, fractions, decimals, and percents. |
 |
| Geometry |
Kathleen Morris
Sue Reehm
Linda Hackett
Peg Darcy |
Connected Mathematics:
Stretching And Shrinking
Connected Mathematics:
Looking For Pythagoras
Transition Mathematics |
Concept
9C 6-8#1 Some shapes have special properties: Triangular shapes tend
to make structures rigid, and round shapes give the least possible
boundary for a given amount of interior area. Shapes can match exactly
or have the same shape in different sizes.
Skill
12B 6-8#3 Calculate the circumference and areas of rectangles, triangles,
and circles, and the volumes of rectangular solids. |
 |
| Algebra |
Marshall Gordon
Jan McDowell
Alice Mikovch
Faye Stevens |
Connected Mathematics:
Variables And Patterns
Connected Mathematics:
Thinking With Mathematical Models
Transition Mathematics |
Concept
9B 6-8#3 Graphs can show a variety of relationships between two variables.
As one variable increases uniformly, the other may do one of the following:
increase or decrease steadily, increase or decrease faster and faster,
get closer and closer to some limiting value, reach some intermediate
maximum or minimum, alternately increase and decrease indefinitely,
increase or decrease in steps, or do something different from any
of these.
Skill
11C 6-8#4 Symbolic equations can be used to summarize how the quantity
of something changes over time or in response to other changes. |
 |
Summary of Results
There are 24 criteria across the seven instructional clusters. Overall, six
separate ratings were done for each of these criteria on each of the two materials,
resulting in 288 ratings. The results are summarized in Table 3.
There were 34 ratings on which the two teams differed by more than one step on the 4-point
[High, Medium, Low, None] rating scale. Therefore, the overall percentage
agreement was 88.2 percent (254 of 288). The percentage agreement on the two materials
differed considerably. For Transition Mathematics, there were 29 out
of 144 differences, resulting in a rater agreement of 79.9 percent. For Connected
Mathematics, there were 5 out of 144 differences, which is a 96.5 percent
rater agreement. A closer look at the individual benchmarks shows that 14
of the 34 rating differences were on concept-related benchmarks and 20 were on
skill-related benchmarks. This imbalance is due primarily to the Transition
Mathematics data, indicating that in this material, skill-related benchmarks
are more difficult to rate.
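The agreement percentages above follow directly from the reported counts. The sketch below recomputes them, assuming that percentage agreement is simply the share of paired ratings that did not differ by more than one step. The function name `percent_agreement` is illustrative and not part of the study's procedure.

```python
def percent_agreement(differences: int, total: int) -> float:
    """Percent of paired ratings that did not differ by more than one step."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * (total - differences) / total


# (differences, paired ratings) for each material, as reported in the text.
counts = {
    "Transition Mathematics": (29, 144),
    "Connected Mathematics": (5, 144),
}

overall_diff = sum(d for d, _ in counts.values())
overall_total = sum(t for _, t in counts.values())

for name, (diff, total) in counts.items():
    print(f"{name}: {percent_agreement(diff, total):.1f}%")
print(f"Overall: {percent_agreement(overall_diff, overall_total):.1f}%")
```

Note that simple percentage agreement does not correct for agreement expected by chance, so it should be read as a descriptive summary.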
As shown in Table 3, some of the criteria appeared more difficult to rate,
regardless of the benchmark or type of material. For example, there were difficulties
in rating criteria 4.4 Connecting Ideas and 7.1 Teacher Content Learning for
both materials. For Transition Mathematics, there were four rating
differences on criterion 4.1 Building a Case. Cluster 4 Developing and Using
Mathematical Ideas resulted in the greatest number of rater differences for
Transition Mathematics.
| Table 3. Summary of Rater Agreements and Differences in Ratings on Benchmarks |

| Transition Mathematics |
| Benchmark | Rater agreement (%) | Criteria with disagreements > 1 |
| 9A#5 | 83 | 4.1, 5.2, 6.2, 7.1 |
| 9C#1 | 96 | 4.2 |
| 9B#3 | 75 | 4.1, 4.2, 4.4, 4.5, 5.1, 7.1 |
| 12B#2 | 63 | 2.1, 4.1, 4.3, 4.5, 5.2, 5.3, 6.2, 7.1, 7.2 |
| 12B#3 | 100 | |
| 11C#4 | 63 | 1.3, 2.1, 2.4, 3.2, 4.1, 4.3, 4.4, 5.1, 7.1 |

| Connected Mathematics |
| Benchmark | Rater agreement (%) | Criteria with disagreements > 1 |
| 9A#5 | 100 | |
| 9C#1 | 100 | |
| 9B#3 | 88 | 2.2, 4.4, 7.1 |
| 12B#2 | 100 | |
| 12B#3 | 100 | |
| 11C#4 | 92 | 2.2, 4.4 |
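The per-benchmark percentages in Table 3 can be recovered from the listed criteria, since each benchmark was rated on 24 criteria. The sketch below does so and also tallies differences by instructional cluster. The criterion lists are transcribed from Table 3. Rounding half up is an assumption, inferred from the table showing 63 for 15 of 24, and the function names are illustrative.

```python
import math
from collections import Counter

CRITERIA_PER_BENCHMARK = 24  # 24 criteria across seven clusters, per the text

# Criteria with rating differences of more than one step, from Table 3.
transition = {
    "9A#5": ["4.1", "5.2", "6.2", "7.1"],
    "9C#1": ["4.2"],
    "9B#3": ["4.1", "4.2", "4.4", "4.5", "5.1", "7.1"],
    "12B#2": ["2.1", "4.1", "4.3", "4.5", "5.2", "5.3", "6.2", "7.1", "7.2"],
    "12B#3": [],
    "11C#4": ["1.3", "2.1", "2.4", "3.2", "4.1", "4.3", "4.4", "5.1", "7.1"],
}
connected = {
    "9A#5": [],
    "9C#1": [],
    "9B#3": ["2.2", "4.4", "7.1"],
    "12B#2": [],
    "12B#3": [],
    "11C#4": ["2.2", "4.4"],
}


def agreement_pct(differing_criteria):
    """Percent of the 24 criteria without a difference, rounded half up."""
    agreed = CRITERIA_PER_BENCHMARK - len(differing_criteria)
    return math.floor(100 * agreed / CRITERIA_PER_BENCHMARK + 0.5)


def cluster_tally(table):
    """Count differences per cluster (the digit before the dot)."""
    return Counter(c.split(".")[0] for crits in table.values() for c in crits)


for name, table in (("Transition Mathematics", transition),
                    ("Connected Mathematics", connected)):
    print(name)
    for benchmark, crits in table.items():
        print(f"  {benchmark}: {agreement_pct(crits)}%")
    print("  differences by cluster:", dict(sorted(cluster_tally(table).items())))
```

Python's built-in `round` rounds halves to the nearest even integer, which would give 62 rather than 63 for 62.5, so the sketch uses `math.floor(x + 0.5)` instead.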
Kulm, G., Grier, L. 1998. Mathematics Curriculum Materials Analysis
Reliability Study.