Linking Middle and Early High School Science and Mathematics Assessment
Items to Local, State, and National Content Standards
A Proposal Submitted to the Division of Elementary, Secondary, and
Informal Education under the IMD Assessment Program by the American
Association for the Advancement of Science Project 2061
August 2003
Project Description
The purpose of this five-year project is to develop a bank of high-quality
assessment items and related tools in middle- and early high-school
science and mathematics that are aligned with state and national
content standards; that are easily accessible by users; and that
can be utilized throughout the educational system by curriculum researchers,
curriculum developers, teachers, test developers, and the general
public. The items will also be valuable as demonstration models of
what aligned assessment looks like, models that can be used in college
and university teacher education programs.
The requirements of the new federal No Child Left Behind Act of
2001 have given high-quality assessment new importance. By mandating
tests that are based on state standards, the legislation provides
the impetus to design assessment tasks that measure understanding
of the content specified in those standards.
As one of the first organizations to focus on content standards
and their role in curriculum, instruction, and assessment, Project
2061 of the American Association for the Advancement of Science (AAAS)
has been studying the alignment and effectiveness of hundreds of
test items drawn from a variety of sources, including items from
the Third International Mathematics and Science Study (TIMSS) and
National Assessment of Educational Progress (NAEP) tests, released
state test items, and items from various curriculum
materials. Using specially designed criteria, Project 2061 and teams
of experienced educators and assessment specialists have developed
a procedure for analyzing and profiling items for their alignment
with content standards and for other characteristics that affect
the usefulness of the items to measure student understanding of those
content standards (American Association for the Advancement of Science
[AAAS], 2003). Application of the procedure leads to a detailed analysis
of each item on such features as content alignment; comprehensibility;
test wiseness; bias related to gender, class, race, and ethnicity;
and item context. The results of the analysis can then be used as
the basis for revising those items. In the project proposed here,
we will extend our ongoing work with assessment items to further
develop this analysis procedure by examining assessment items for
additional linguistic features and for their suitability for testing
students with limited English proficiency. This new work will draw
on research being done by Rebecca Kopriva at the University of Maryland
(Kopriva, 2000).
Our current work also involves developing assessment maps that
can be used as conceptual frameworks for creating multi-item tests
that measure student understanding of targeted content standards
and related ideas. The assessment maps identify common misconceptions,
prerequisite ideas, and ideas that come later in the developmental
progression. These maps draw from the strand maps that we have developed
for the Atlas of Science Literacy (AAAS, 2001) and from
the work on progress variables in learning (Wilson & Draney,
1997). Tests built around assessment maps can be used to provide
a diagnostic analysis of student understanding of ideas identified
in the content standards.
Because the new federal legislation requires states to hold students
accountable for the specific content standards of each state,
being able to cross-link standards documents is essential in
order to provide national resources to the states and to share resources from
state to state. To accomplish this cross-linking, our proposed
project will draw on existing efforts to link the content standards
of each state to one another and to national content
standards. The work done by Mid-continent Research for Education and Learning (McREL)
and Align-to-Achieve,
for example, allows one to match the content standards of approximately
40 states, the National Science Education Standards (NRC,
1996), and their own Compendix (http://www.mcrel.org/standards-benchmarks),
a set of benchmarks and standards drawn from primary national
documents such as AAAS's Benchmarks for Science Literacy (1993)
and the National Research Council's National Science Education
Standards, as well as various state documents. (AAAS's Benchmarks was
used extensively in the creation of the Compendix and will
soon become part of a complete set of content standards in the
Align-to-Achieve database.) The proposed project will work with the linked standards
already included in the Align-to-Achieve database to create
a utility that will allow users to access test items matched to national content
standards or to the content standards of any state.
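
As a rough illustration of the cross-linking idea described above, the following Python sketch assumes that each test item is tagged with one national content standard and that a separate crosswalk table records which national standards a given state standard matches. All identifiers, the crosswalk entries, and the function names are hypothetical; they are not taken from the Align-to-Achieve database or from any existing utility.

```python
from collections import defaultdict

# Hypothetical crosswalk: state standard ID -> national standard IDs it matches.
CROSSWALK = {
    "STATE-A.SCI.8.3": ["NAT.FORCE.1"],
    "STATE-B.PS.7.2": ["NAT.FORCE.1", "NAT.MATTER.2"],
}

# Hypothetical item bank: item ID -> national standard the item is aligned to.
ITEM_ALIGNMENT = {
    "item-001": "NAT.FORCE.1",
    "item-002": "NAT.MATTER.2",
    "item-003": "NAT.FORCE.1",
}


def index_items_by_standard(item_alignment):
    """Invert the item -> standard mapping into standard -> sorted item list."""
    index = defaultdict(list)
    for item_id, standard_id in item_alignment.items():
        index[standard_id].append(item_id)
    return {std: sorted(items) for std, items in index.items()}


def items_for_state_standard(state_standard_id, crosswalk, item_alignment):
    """Return items reachable from a state standard via linked national standards."""
    by_standard = index_items_by_standard(item_alignment)
    found = set()
    for national_id in crosswalk.get(state_standard_id, []):
        found.update(by_standard.get(national_id, []))
    return sorted(found)
```

Under these assumptions, items are aligned only once, to a national standard, and any state whose standards appear in the crosswalk can reach them without the items being re-tagged state by state.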
The Need for Standards-Based Assessment Items and Tools
Alignment of all elements of the education system to content standards
is at the heart of the standards-based reform movement that has taken
hold over the past dozen years or so. Standards-based reform of K-12
education is founded on the premise that fundamental improvement
begins with (and continues to be tied to) a coherent, well-articulated
set of specific content standards. This vision for the reform of
education in science and mathematics has been promoted for over a
decade through AAAS Project 2061 and its Science for All Americans (1989)
and Benchmarks for Science Literacy (1993), through the
National Council of Teachers of Mathematics (NCTM) and its Curriculum
and Evaluation Standards for School Mathematics (1989) and Principles
and Standards for School Mathematics (2000), and through the
National Research Council (NRC) and its National Science Education
Standards (1996).
For the standards-based reform agenda in science and mathematics
education to continue to move forward, the field needs access to
high-quality assessment items that are aligned to the content standards
specified in national and state standards documents. As stated in
the National Science Education Standards: "…assessment
is a primary feedback mechanism [that]…leads to changes in
the science education system by stimulating changes in policy, guiding
teacher professional development, and encouraging students to improve
their understanding…." (NRC, 1996, p. 76).
Ultimately, the quality of any test comes down to the specific
tasks that students are asked to perform. Obviously, one item, or
even a single set of items, can never give us complete confidence
that students understand or do not understand an idea, but every
item should contribute some evidence of student understanding. At
present, however, there is general awareness in the field that there
are too many poorly written assessment items that do not align properly
with the content standards for which students are being held responsible.
One reason for this lack of alignment between content standards and
assessment items, which we will discuss later in this proposal, is
that it is not always clear exactly what the content standards themselves
are saying about the ideas and skills students are expected to learn
and what they should be able to do on a test.
At the present time, released items are available from such sources
as state tests and tests administered by NAEP and TIMSS. There is
also a large item bank developed by the Council of Chief State School
Officers (CCSSO) as part of their State Collaborative on Assessment
and Student Standards project (SCASS), which is available to participating
states. However, in general, these items are linked to content standards
at a broader level of specificity than we are proposing for this
project, as is evident, for example, on the CCSSO Web site (http://www.ccsso.org/projects/Alignment_Analysis).
None of the alignment models listed on that Web site, including
those of Achieve (n.d.), Webb (1999), the Council on Basic Education
(n.d.), and the CCSSO Survey of Enacted Curriculum (SEC) (n.d.),
provides the level of precision in their alignment procedures that
we propose. The SEC model, for example, examines alignment at the
level of topics such as multiple-step equations, inequalities, linear
equations, etc. (Porter, 2002). Whereas these alignment models are
an important first step in moving instruction, materials, and assessment
toward alignment, without test items that assess the specific ideas
and skills in the content standards, teaching and learning will still
lack the precision that is called for in a standards-based environment.
Standards-based teaching and learning requires accuracy in measuring
student progress toward the attainment of those content standards.
If we are not committed to creating and using test items that actually
assess the specific ideas and skills identified in widely accepted
content standards, then the purpose of assessment and of the standards
themselves is unclear, and standards-based reform is jeopardized.
The findings from Project 2061's current NSF-funded assessment
study described earlier (ESI-9819018) provide a useful framework
for the new work that is proposed here. Project 2061's ongoing efforts
related to assessment and to curriculum materials research enable
us to identify some of the most urgent needs that the proposed work
will address. Key areas where the proposed products and tools are
needed include:
Curriculum materials research. Assessment items
that are aligned with content standards but not specific to any single
materials development project will enable researchers to compare
the effectiveness of various instructional materials objectively.
Curriculum researchers need assessment items that policy makers and
the public regard as fair measures of student knowledge. Without
credible evidence that new and innovative materials can help students
learn, stakeholders may decide that the benefits of implementing
such materials do not justify the costs.
Researchers also need high-quality assessment items linked to content
standards to test such things as the comparative effectiveness of
instructional sequences, the viability of particular visual representations
of abstract concepts, and the value of using certain phenomena and
real world examples to make ideas concrete and understandable to
students. Broad stroke evaluation of the effectiveness of curriculum
materials is not enough. As a result of the work of our new Center
for Curriculum Materials in Science (ESI-0224186), we recognize that
items that are aligned to content standards are essential for conducting
rigorous, fine-grained research on materials as they are being developed.
Existing assessment items are not focused enough on specific content
standards to be used for these purposes. Without assessment tasks
that provide precise measures of student understanding of the specific
ideas and skills addressed in the curriculum material, it is impossible
to conduct rigorous research studies with replicable results.
Materials-embedded assessment. High-quality assessment
items linked to content standards should also be integrated into
instructional materials themselves. Through our evaluations of curriculum
materials and through our role as consultants on curriculum development
projects, we have learned that developers do not consistently include
high-quality assessment items in their materials. Even when assessment
items are present in the materials, they are not deployed strategically
so that teachers can gauge students' understanding of the ideas or
skills being taught and modify their instruction accordingly. Instructional
materials generally include questions, but they are not explicitly
linked to specific content standards and are not included as a way
to give teachers feedback on how they can improve their instruction
based on how students respond to those questions. Most assessment
items seem intended simply to provide students with additional practice
and to document their success or failure rather than to guide instruction
(AAAS, 2003).
Classroom assessment. Teachers need high-quality
assessment items that are linked to the content standards that their
students are being held accountable for on local and state tests.
For teachers to take a standards-based approach seriously and think
in terms of moving their students toward the attainment of specific
content standards, they need assessment resources that are aligned
with those content standards. It is one thing to say that students
should know, for example, that "an unbalanced force acting on an
object changes its speed or direction of motion, or both" (AAAS,
1993, p. 90), but without assessment items and other resources that
focus directly on that content standard, it will be difficult for
teachers to find out exactly what their students know or do not know
about the ideas specified in that standard. Assessment items should
offer a way to test student understanding of key ideas independent
of the instructional contexts used by a particular teacher or textbook
to ensure that students can demonstrate an understanding of important
ideas that goes beyond merely parroting back the words they hear
in class. Items should enable teachers to interpret students' thinking
about the ideas and skills targeted in content standards and provide
diagnostic information on what may be impeding student learning.
Assessment items that are aligned to content standards also enable
teachers to keep track of their students' understanding of selected
ideas over time and to conduct classroom research on the effects
of various instructional strategies on learning.
Large-scale assessment. Most states have adopted
content standards in science and mathematics, and many have moved
toward state-wide testing in these subjects. But according to the
National Science Foundation's Science & Engineering Indicators
2002 report, there are persistent concerns over "the degree
to which state tests align with state standards." The report goes
on to identify several groups-from the American Federation of Teachers
to the Council for Chief State School Officers-that have issued studies
in which "the problem of alignment between standards, testing, instruction,
and accountability remains a common theme" (2002).
There is also unease that given the rapidly increasing pressure
to test, the demand for new items will lead to the development and
use of assessment items of low quality. Concern about the quality
of test items has reached the popular press as can be seen in a July
16, 2003, New York Times piece entitled "Before the Answer,
the Question Must Be Correct." The article rightly suggests that "…no
amount of wizardry can create a good test out of poorly written items…" (Dillon,
2003).
To be useful in a standards-based context, items in large-scale
assessments that purport to assess the ideas and skills specified
in content standards need to be linked explicitly to exact ideas
and skills, not just to broadly defined topical areas. Test developers
and test administrators, particularly those at the state and district
levels, need models for items that are well aligned to the content
standards targeted in state and national documents and that also
conform to rigorous psychometric, linguistic, and cognitive requirements.
Tests that are developed from such high-quality items can reliably
inform education policy and decision making and ensure that the consequences
for students, teachers, administrators, and schools are fair. The
item bank we propose is not meant to be used by large-scale test
developers as a major source of items. Most large-scale test developers
have item security requirements that our test bank does not provide
and need many more items than this project can supply. However, commercial
test developers and state assessment officers can use the proposed
item bank as a source of models for rigorous alignment of assessments
with standards. In addition, the tools that we develop will give
test developers the means to revise existing items in their own item
banks so that they are more carefully aligned with targeted content
standards.
Public support. Parents and other members of the
public need to know what it is that children, teachers, and schools
are being held accountable for with respect to the content standards
of their state and local communities and what alignment to those
content standards means. Clear statements of the standards themselves,
as well as assessment items that measure understanding of the ideas
in the standards, are essential for parents to contribute meaningfully
to their children's education. Research with parents has shown that
when well-informed, parents can be vital allies in education reform
efforts, but, according to focus groups of parents convened by Project
2061 for its 1998 report Blueprints for Reform (AAAS, 1998),
without the necessary information, parents often "hesitate to support
initiatives that promote untraditional methods of learning science
because they are unfamiliar with them." In opinion research conducted
by Public Agenda (2000), nearly half of all parents reported that
they were not aware of standards-based reform initiatives, even in
their own districts. The same survey shows that when parents do become
aware of these initiatives, they support them by a large majority.
With easy Web-based access to content standards and assessment information,
parents and other community members and organizations such as science
centers, zoos, and nature museums can become a significant force
in focusing the formal and informal educational experiences of children.
The Need to Clarify Content Standards
We view alignment as a precise match between content standards
and assessment tasks. However, as stated earlier, the exact meaning
of content standards is not always evident. Those who have responsibility
for student learning and for measuring that learning must have a
clear understanding of what students are expected to know and what
constitutes evidence of that knowledge. The Commission on Instructionally
Supportive Assessment (2001) identified nine requirements for assessments
that support instruction and accountability. One of these requirements
says: "A state's high priority content standards must be clearly
and thoroughly described so that the knowledge and skills students
need to demonstrate competence are evident." This clarification "should
result in relatively brief, educator-friendly descriptions of each
high priority standard's meaning" (McColskey & McMunn, 2002,
p. 5).
AAAS, NCTM, and the NRC already include essays to clarify each
cluster of their content standards at each grade band. These essays
spell out the instructional implications of content standards
by focusing on the activities that can be used to advance student
understanding. These activities are based on an overall trajectory
of instructional aims. For example, the AAAS essay dealing with the
topic of diversity of life at the 6-8 grade band states in part: "Students
should begin to extend their attention from external anatomy to internal
structures and functions. Patterns of development may be brought
in to further illustrate similarities and differences among organisms" (AAAS,
1993, p. 104). In mathematics, the NCTM essay for middle school algebra
says in part: "Students in the middle grades should learn algebra
both as a set of concepts and competencies tied to the representation
of quantitative relationships and as a style of mathematical thinking
for formalizing patterns, functions, and generalizations. In the
middle grades, students should work more frequently with algebraic
symbols than in the lower grades. It is essential that they become
comfortable in relating symbolic expressions containing variables
to verbal, tabular, and graphical representations of numbers and
quantitative relationships" (NCTM, 2000, p. 223). Though helpful
for guiding instruction, these essays do not say how students at
a given grade band should be asked to demonstrate their understanding;
nor are they written for each individual content standard. To ensure
that assessment tasks are aligned with content standards, more needs
to be done to make the meaning of each content standard clear with
respect to what students can be asked to do with their knowledge.
Products and Activities
Working with teams of experienced teachers, scientists, mathematicians,
and curriculum researchers and developers, Project 2061 will address
the assessment needs identified above and will produce the following:
- 20 assessment maps focusing on selected content standards to
provide a context for choosing sets of items to gauge student progress
and to diagnose their problems in understanding the ideas targeted
in the content standards;
- a bank of approximately 400 test items for grades 6 through
10 (including multiple choice and both short and extended open-response
items) and full descriptions of each item's alignment to specific
science or mathematics standards and other salient features;
- clarifying statements for the content standards on each of the
20 assessment maps to provide insights on which ideas are, and are
not, targeted in the content standard and to suggest ways in which
students might demonstrate or apply the targeted ideas; and
- an online tool for accessing the assessment items and related
resources from a variety of starting points, such as a state standard,
a topic, or type of assessment item.
This product development will be accomplished through the following
activities to be undertaken over the course of this five-year project.
Development of the item bank will be the focus of our work and involves
the following efforts:
Screen and analyze assessment items. We will screen
hundreds of existing middle- and early high-school science and mathematics
assessment items from as many sources as possible, including released
items from the TIMSS and NAEP tests and state tests. In the initial
screening, items will be sorted by the content standards and the
related ideas we will be targeting. (How we will define the domain
of ideas around which items are to be selected is described in the
section on assessment maps below.) Following the initial screening
and sorting, items will undergo a more rigorous analysis to describe
precisely their alignment to the ideas being targeted and to make
sure that they meet specific effectiveness criteria. This analysis
will be based on an examination of the items themselves and score
reports of student performance on the items from states, NAEP, TIMSS, and
similar instruments. The analysis procedure that we will use is modeled
after a procedure previously developed by AAAS (2003) and involves
the following considerations: (1) Are the ideas and skills specified
in the targeted content standard needed to successfully complete
the assessment item or can the item be answered without that knowledge
and skill? (2) Are the ideas and skills specified in the content
standard enough by themselves to successfully complete the assessment
item or is other knowledge and skill needed? (3) Are students likely
to understand the task statement, diagrams, symbols, etc.? (4) Are
students likely to understand what they are expected to do and what
sort of response is considered satisfactory? (5) Is the task context
appropriately familiar, engaging, and realistic to students? (6)
Could students respond satisfactorily to the task by guessing or
employing other general test-taking strategies? (7) Are scoring rubrics
for open-ended items accurate, clear, complete, and specific?
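
One way to picture how a reviewer's judgments on the seven considerations above might be recorded is sketched below in Python. This is an assumption-laden illustration, not the AAAS (2003) procedure itself: the class name, the field names, and the rule that only an explicit "no" counts as an unmet criterion are all hypothetical. Each field is phrased so that True means the item is satisfactory on that consideration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ItemReview:
    """One reviewer's judgments on the seven considerations (hypothetical names)."""
    standard_is_necessary: bool    # (1) item cannot be answered without the targeted ideas
    standard_is_sufficient: bool   # (2) no other knowledge or skill is needed
    task_is_comprehensible: bool   # (3) task statement, diagrams, symbols are understandable
    expectations_are_clear: bool   # (4) students know what kind of response is expected
    context_is_appropriate: bool   # (5) context is familiar, engaging, and realistic
    resists_test_wiseness: bool    # (6) guessing or test-taking strategies do not suffice
    rubric_is_adequate: Optional[bool] = None  # (7) open-ended items only; None if no rubric


def unmet_criteria(review):
    """List, in order, the criteria the reviewer explicitly judged as not met."""
    return [f.name for f in fields(review) if getattr(review, f.name) is False]


# Example: a multiple-choice item that needs outside knowledge and can be guessed.
example_review = ItemReview(True, False, True, True, True, False)
```

In this sketch, `unmet_criteria(example_review)` names the two considerations that would need attention during revision, and the rubric field is skipped because it was left as None.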
In addition, to fully address equity concerns and to ensure that
items are accessible to the greatest number of students, we will
also review items for various linguistic features. Items will be
analyzed on the basis of linguistic criteria that support student
access to assessment items, especially for English language learners.
Items that meet these criteria will increase the validity of interpretations
that can be made about student understanding for a wider range of
students. Kopriva (2000) lists the following as important considerations
for making assessment items accessible: (1) Item sentences or stems
must be kept brief and straightforward, with a simple sentence or
phrase structure. (2) Consistency in paragraph structures should
be employed. (3) The present tense and active voice should be used
as much as possible. Concerning the use of visuals in test items:
(1) Visuals should mirror, or parallel, the item statements and expectations.
(2) No supplementary or unnecessary information should be placed
in the visual to distract students from the requirements in the item.
(3) Simple text can and should be used in the visuals that correspond
to important words in the item. Although these recommendations were
written in the context of English language learners, the principles
apply to all students. The point is that to draw valid inferences
regarding a construct being measured, linguistic issues must be taken
into account. The question that needs to be asked is: are there any
features of an item that may limit access by any particular group
of students?
We will also examine items for the cognitive demands that they
place on students. We will draw in part from work currently being
done by Baker et al. (2002) at the Center for Research on Evaluation,
Standards, and Student Testing (CRESST) at UCLA and also the classification
of knowledge and process categories of Anderson and Krathwohl (2001)
adapted from Bloom (1956). We will also make use of the cognitive
demand designations used in the alignment models of Achieve (n.d.),
Webb (1999), the Council on Basic Education (n.d.), and the CCSSO
Survey of the Enacted Curriculum (n.d.) mentioned earlier in this
proposal. Finally, psychometric consultants from the Department
of Measurement, Statistics, and Evaluation at the University of Maryland
will review items for psychometric features and for the impact that
inclusion of items might have on whole-test construction.
Revise items. Using the written reports from the
analysis of the items described above, items will be revised to correct
deficiencies that hinder alignment. Based on these analyses, the
context of the item might be changed, language clarified or simplified,
or distracters replaced. In addition to addressing issues raised
by Kopriva's work described above, the revision process will also
make use of work being done by Jim Minstrell (1992) on facets of
knowledge. Facets are bits of knowledge or strategies for reasoning
(both correct and incorrect) used by students when faced with problem
situations. Facets can be very specific or quite general. Examples
include: "Active objects [like hands] exert forces." "Passive objects
[like tables] cannot exert forces." "Heavier objects fall faster." Some
facets are generic and cut across subject areas: "More of one thing
means more of another thing." We will draw on Minstrell's work (1982a,
1982b, 1984, 1989, 1992, 2001) as well as other available research
in this area to incorporate what is known about student thinking
into distracters and to redesign test items so that they probe student
thinking.
For some content standards, a significant body of research already
exists on the preconceptions that students often hold (see, for example,
AAAS, 1993 and Driver et al., 1994). For content standards where
the research on student learning is more limited, we will administer
open-ended tasks related to the content standards to a representative
sample of students from schools serving diverse populations. These
open-ended tasks will be used to gain additional insights into student
thinking and to draw attention to productive areas for further research
on student thinking. As part of this work we will interview a sub-sample
of students about their responses. We will use a modified version
of a procedure used by Driver et al. (1994). Interviews will include,
for example, the introduction of a discrepant event to challenge
the students' explanations and further probe the rigidity of their
knowledge frameworks. Results of these interviews will help us design
distracters for assessment items that will enable us to more effectively
probe student understanding.
Reanalyze items. Following the initial analysis
and revision of items, the revised items will be field tested in
a wide sampling of school districts around the country. Teachers
with whom we have worked over the years will provide us with access
to students from a wide range of backgrounds. Items will then be
reanalyzed using the student data and the analysis procedures described
above. Following this reanalysis, reviewers will recommend whether
to accept each item, to modify it further, or to eliminate it from
the item pool.
Describe item features. Each item that is retained
will be accompanied by descriptive information concerning the details
of its alignment with content standards, the knowledge needed to
answer the item correctly, whether the item tests for common misconceptions,
and whether the item is likely to be approached differently by diverse
learners, taking into account the item's use of visuals, linguistic
demands, etc.
The description will contain a statement about whether the assessment
item measures declarative knowledge (concepts), procedural knowledge
(skills), or contextual knowledge (applications). These categories
are similar to the categories of conceptual knowledge, scientific
investigation, and practical reasoning used in the Science Framework
for the 1996 and 2000 National Assessment of Educational Progress
(U.S. Department of Education, 1999). For items in which students
are asked to apply their knowledge, a further description of the
type of application will be included as well. For example, items
may ask students to rephrase an idea in their own words, explain
a phenomenon, identify a generalization based on relevant instances,
etc. In mathematics, items will be categorized according to the levels
of complexity described in the 2004 Mathematics Framework for the
National Assessment of Educational Progress (U.S. Department of Education,
2001). These categorizations of items will allow them to be classified,
and thus retrieved, by item type.
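
The classification and retrieval described above could be sketched as follows. This is a minimal Python illustration under stated assumptions: the three knowledge categories come from the text, but the class names, the `item_format` labels, the sample bank, and the `retrieve` function are hypothetical and do not describe an existing tool.

```python
from dataclasses import dataclass
from enum import Enum


class KnowledgeType(Enum):
    """The three knowledge categories named in the item description."""
    DECLARATIVE = "concepts"
    PROCEDURAL = "skills"
    CONTEXTUAL = "applications"


@dataclass(frozen=True)
class ItemDescription:
    item_id: str
    knowledge_type: KnowledgeType
    item_format: str  # hypothetical labels, e.g. "multiple-choice", "extended-response"


def retrieve(descriptions, knowledge_type=None, item_format=None):
    """Return IDs of items that match every filter that was supplied."""
    return [
        d.item_id
        for d in descriptions
        if (knowledge_type is None or d.knowledge_type is knowledge_type)
        and (item_format is None or d.item_format == item_format)
    ]


# Hypothetical sample bank used only to exercise the filter.
BANK = [
    ItemDescription("item-001", KnowledgeType.DECLARATIVE, "multiple-choice"),
    ItemDescription("item-002", KnowledgeType.CONTEXTUAL, "extended-response"),
    ItemDescription("item-003", KnowledgeType.CONTEXTUAL, "multiple-choice"),
]
```

A call such as `retrieve(BANK, knowledge_type=KnowledgeType.CONTEXTUAL)` would select only the application items, and adding `item_format` narrows the result further.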
To guide our selection, screening, and revision of each assessment
item, we will frame our work through the following activities. The
resulting products will be available to those who want to create
or revise items themselves and to those who want to construct assessment
scales based on the conceptual framework provided in the assessment
maps.
Create assessment maps and link items to maps. We
will create an assessment map for each of 20 middle- and early high-school
science and mathematics content standards selected from national
standards documents. (See Appendix A for an example of an assessment
map.) Assessment maps reflect the interconnectedness of ideas by
showing a progression of learning from prerequisite ideas to targeted
ideas to more sophisticated ideas. Each map will be built around
one or more content standards. The maps will include the ideas from
the content standard itself, prerequisite ideas, one or more related
ideas that come later in the developmental trajectory, and common
misconceptions that have been confirmed through research on student
learning. The maps will allow test developers to choose assessment
items that can yield diagnostic information about student learning,
especially with respect to misconceptions and prerequisite knowledge
that pertain to specific ideas on the maps.
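
To make the structure of an assessment map concrete, the Python sketch below models a map as four groups of ideas with items linked to individual ideas. It is only an illustration under stated assumptions: the class, its methods, the item ID, and the placeholder prerequisite and later ideas are hypothetical. The targeted idea is the benchmark quoted earlier from AAAS (1993, p. 90), and the misconception is one of the facets quoted from Minstrell.

```python
from dataclasses import dataclass, field


@dataclass
class AssessmentMap:
    """Hypothetical container for the four kinds of ideas shown on a map."""
    title: str
    targeted: list
    prerequisites: list = field(default_factory=list)
    later_ideas: list = field(default_factory=list)
    misconceptions: list = field(default_factory=list)
    items_by_idea: dict = field(default_factory=dict)  # idea text -> list of item IDs

    def all_ideas(self):
        """Ideas in developmental order, followed by misconceptions."""
        return self.prerequisites + self.targeted + self.later_ideas + self.misconceptions

    def link_item(self, idea, item_id):
        """Attach an item to one idea; reject ideas that are not on this map."""
        if idea not in self.all_ideas():
            raise ValueError(f"{idea!r} is not on the map {self.title!r}")
        self.items_by_idea.setdefault(idea, []).append(item_id)

    def ideas_without_items(self):
        """Ideas that no item probes yet, useful for spotting coverage gaps."""
        return [idea for idea in self.all_ideas() if idea not in self.items_by_idea]


# Illustrative entries only; the placeholders stand in for ideas not given here.
example_map = AssessmentMap(
    title="Example map (hypothetical)",
    targeted=["An unbalanced force acting on an object changes its speed "
              "or direction of motion, or both."],
    prerequisites=["Placeholder prerequisite idea"],
    later_ideas=["Placeholder later idea"],
    misconceptions=["Passive objects cannot exert forces."],
)
example_map.link_item("Passive objects cannot exert forces.", "item-017")
```

Here `ideas_without_items()` would report the three ideas that still lack an item, which mirrors the stated goal of at least one item per idea on each map.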
Maps are also a practical device to provide test developers with
a convenient visual boundary around the ideas they might want to
test at any particular time. The maps are not, however, a template
for test construction. They simply present in a convenient format
the targeted ideas and related ideas that could be tested. A test
might be constructed around all of the ideas or just one and might
take into account some of the prerequisite ideas and misconceptions
or only a subset. Nor are the maps restrictive. Single maps can be
combined with other maps to focus test design on a larger set of
ideas at the same time.
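One way to picture the map structure described above is as a small data model. The following is a minimal sketch only: the class names, field names, role labels, and placeholder ideas are assumptions made for illustration and do not describe the project's actual design.

```python
from dataclasses import dataclass, field

# Role labels are assumptions for this sketch, based on the kinds of
# entries the text says a map contains.
ROLES = ("prerequisite", "target", "later", "misconception")


@dataclass(frozen=True)
class Idea:
    idea_id: str
    text: str
    role: str

    def __post_init__(self):
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")


@dataclass
class AssessmentMap:
    title: str
    ideas: dict = field(default_factory=dict)  # idea_id -> Idea
    links: set = field(default_factory=set)    # (earlier_id, later_id)

    def add(self, idea):
        self.ideas[idea.idea_id] = idea

    def link(self, earlier_id, later_id):
        # Record that one idea precedes another in the learning progression.
        if earlier_id not in self.ideas or later_id not in self.ideas:
            raise KeyError("both ideas must be on the map before linking")
        self.links.add((earlier_id, later_id))

    def by_role(self, role):
        return [i for i in self.ideas.values() if i.role == role]


def combine(title, *maps):
    # Merge several maps so a test can draw on a larger set of ideas.
    merged = AssessmentMap(title)
    for m in maps:
        merged.ideas.update(m.ideas)
        merged.links |= m.links
    return merged


# Placeholder content only; these are not ideas from an actual map.
matter = AssessmentMap("Conservation of Matter")
matter.add(Idea("m1", "placeholder prerequisite idea", "prerequisite"))
matter.add(Idea("m2", "placeholder targeted idea", "target"))
matter.add(Idea("m3", "placeholder misconception", "misconception"))
matter.link("m1", "m2")

kinetic = AssessmentMap("Kinetic Molecular Theory")
kinetic.add(Idea("k1", "placeholder targeted idea", "target"))

both = combine("Matter (combined)", matter, kinetic)
targets = [idea.idea_id for idea in both.by_role("target")]
```

In this sketch a test developer could call by_role to pull only the targeted ideas, or include the "prerequisite" and "misconception" entries when diagnostic information is wanted.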
We will provide 20 assessment items for each of 20 assessment maps,
at least one item per idea represented on a map for a total of at
least 400 items. There will be a range of items, from low cognitive
demand to high cognitive demand, including both multiple-choice
and free-response items. There will be items where visual representations
play a large part in describing the problem and those where word
descriptions are used. Having this range of items is particularly
important when testing students with differing capabilities and learning
preferences (Kopriva, 2000).
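The item targets above (20 items for each of 20 maps, with at least one item per idea) can be checked mechanically. The sketch below is a hypothetical illustration; the function name, the input format, and the sample idea labels are assumptions, not part of the project's tools.

```python
from collections import Counter

MAPS = 20
ITEMS_PER_MAP = 20
TOTAL_ITEMS = MAPS * ITEMS_PER_MAP  # 20 maps x 20 items = 400 items


def coverage_report(map_ideas, item_targets, items_per_map=ITEMS_PER_MAP):
    """Check one map's item pool.

    map_ideas: ids of the ideas on the map.
    item_targets: for each item, the id of the idea it targets.
    """
    on_map = set(map_ideas)
    counts = Counter(t for t in item_targets if t in on_map)
    uncovered = sorted(i for i in on_map if counts[i] == 0)
    total = sum(counts.values())
    return {
        "total_items": total,
        "uncovered_ideas": uncovered,
        "meets_target": total >= items_per_map and not uncovered,
    }


# Hypothetical pool: three ideas on the map, but only three items so far.
report = coverage_report(["a", "b", "c"], ["a", "a", "b"])
```

Here the report flags idea "c" as uncovered and shows that the pool is still short of 20 items.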
We have already developed 10 maps. In science the maps deal with
Control of Variables, Changes in the Earth's Surface, Flow of Matter
and Energy in Living Systems, Newton's First Law, Kinetic Molecular
Theory, Conservation of Matter, and Light and Sight. In mathematics
the maps deal with ideas in Number, Algebra, and Data. The new maps
that we will develop for the proposed project are in addition to
the 10 existing maps.
The maps will be interactive so that users will be able to click
on specific ideas on the maps to access the items in the test bank
as well as the comparable standard from their own state. In fact, all resources
will be accessible from the assessment maps. This is described in
the section on integration of products below.
Create content standard clarification statements. We
will write and include in a Web-based utility clarifying statements
for each content standard identified on the 20 assessment maps. These
clarifying statements will focus on what each content standard does
and does not suggest regarding what students should be able to do
with their knowledge and skills. As stated above, the existing essays
that accompany AAAS's Benchmarks and the NRC's National
Science Education Standards were written primarily to provide
guidance on the kinds of learning activities that students should
be engaged in. The statements that we will write will help users
to see in more detail how assessment items are related to the ideas
in the content standards. The statements will describe what knowledge
is and is not included in the content standard, the ways that the
knowledge in the content standard might be demonstrated by students,
task contexts that are appropriate and engaging to students at that
age, and the range of cognitive skills that students might reasonably
be expected to use to demonstrate their understanding of the idea.
The clarification statements will be linked to, and therefore accessible
from, the content standards on the 20 assessment maps.
With finalized items and other assessment resources in hand, we
will then focus on making the items and resources easily accessible
through the following activities:
Integrate assessment items, maps, and accompanying information
and link to state and national content standards. We will
directly link assessment items, maps, and accompanying information
to national benchmarks and standards and indirectly to state
standards through the McREL/Align-to-Achieve Academic Standards
e-Library. This e-library is a database of state and national content
standards that have been explicitly linked together based
on grade level and the ideas targeted in the standards. The e-library also contains
its own synthesis of these standards, which is called the Compendix.
The Compendix (http://www.mcrel.org/standards-benchmarks)
lists a total of 154 middle-school benchmarks in science
and mathematics plus additional benchmarks that are appropriate for early high-school
students. The Compendix benchmarks closely overlap with
the content standards produced by AAAS, NCTM, and the NRC.
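The indirect link from a state standard to assessment items can be thought of as a two-step lookup: state standard to linked national standards, then national standards to items. The sketch below assumes that the cross-links are available as simple lookup tables; the identifiers and table layout are hypothetical and do not reflect the e-library's actual schema.

```python
def items_for_state_standard(state_id, state_to_national, national_to_items):
    """Return item ids reachable from a state standard, without duplicates."""
    found = []
    for national_id in state_to_national.get(state_id, []):
        for item_id in national_to_items.get(national_id, []):
            if item_id not in found:
                found.append(item_id)
    return found


# Hypothetical cross-links (state standard -> national standards).
state_to_national = {
    "STATE-X.7.1": ["NAT-A", "NAT-B"],
    "STATE-X.7.2": ["NAT-B"],
}

# Hypothetical direct links (national standard -> items in the bank).
national_to_items = {
    "NAT-A": ["item-001", "item-002"],
    "NAT-B": ["item-002", "item-003"],
}

linked = items_for_state_standard(
    "STATE-X.7.1", state_to_national, national_to_items
)
```

Because items are attached only to national standards in this sketch, a change in a state's standards would require updating only the cross-link table, not the item records.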
Although we are purchasing the Align-to-Achieve e-library database
and the software that links the state and national standards, we
will create our own customized user interfaces and functions. This
will allow us to seamlessly integrate the assessment maps, clarification
statements, prerequisite ideas, common student misconceptions, items,
and item descriptions with the database of cross-linked content standards.
Eventually we will be able to provide access to additional resources (visual
representations of scientific concepts, examples of scientific phenomena,
and question sequences) that can help students to learn and teachers
to teach the ideas targeted in the selected content standards.
The online utility, hosted on the Project 2061 Web site, will provide
access from any set of standards, whether at the state, local, or
national level. The utility will offer users free access to all of
the resources developed. Users will be able to find resources using
topic and key word searches or by browsing the assessment maps or
the section headings from the various standards documents.
To be sure that the utility is easy to use and meets the needs
of its potential users, we will conduct interviews with teachers
and administrators, parents, materials developers, curriculum researchers,
and state assessment and curriculum directors and will incorporate
their ideas into the design of the utility interfaces. A prototype
utility will be tested with users to be sure that it functions according
to design specifications and that it can accomplish the intended
tasks (e.g., easily access assessment items from any K-12 standards
document). It will then be revised based on the feedback we receive.
Once the database is created, it will be a permanent feature of
the Project 2061 Web site and will be updated regularly as we develop
additional assessment maps and continue to add high-quality assessment
items to the item bank. Over time it will become part of our growing
collection of Web-based resources. The site license purchased from
Align-to-Achieve provides free updates as states modify their standards.
The Web site will also be linked to the National Science Digital
Library (NSDL).
Dissemination. Disseminating the assessment items
and tools that we create is essential to the success of the project.
Dissemination efforts will target researchers in science education
(including faculty, doctoral students, and postdoctoral fellows)
through the NSF Centers for Learning and Teaching; curriculum developers;
state directors of curriculum and assessment; teacher educators;
and classroom teachers. Centers that will be particularly interested
in these tools for curriculum research purposes include the Center
for Curriculum Materials in Science and the new mathematics curriculum
materials center. We will communicate information about the resources
that we develop through the Project 2061 newsletter and Web site,
through communication outlets of organizations such as NCTM and the
National Science Teachers Association (NSTA), and through Web-based
links with other organizations. We will present papers at professional
meetings such as the National Association for Research on Science
Teaching, the American Educational Research Association, the Association
for Supervision and Curriculum Development, NSTA, NCTM, and other
relevant organizations, and we will submit articles to refereed journals
such as the Journal for Research in Mathematics Education and
the Journal of Research in Science Teaching, and to journals
that reach a broader audience, such as Mathematics Teaching
in the Middle School, Mathematics Teacher, The
Science Teacher, Educational Leadership, and The
Kappan. We will utilize the print and Web-based distribution
outlets of our NSF-funded public outreach campaign (ESI-0103678)
to inform parents, informal science organizations, and other community
members of the assessment bank and related tools. We will also disseminate
information about our work through the popular press.
Advisory Board
An Advisory Board will meet in Years Two and Four to review the
project's activities and products and to provide feedback and counsel.
The following individuals have agreed to serve: Theron Blakeslee,
Director of the Math and Science Center of Jackson County, Michigan;
Rolf K. Blank, Director of Education Indicators, Council of Chief
State School Officers; Danine Ezell, Science Specialist, San Diego
Public Schools; Fred Goldberg, Professor of Physics, Center for Research
in Mathematics and Science Education, San Diego State University
and developer of the Constructing Ideas in Physical Science curriculum;
Marshall Gordon, mathematics teacher at the Park School in Baltimore,
Maryland; Mary Lindquist, Callaway Professor of Mathematics Education,
Emeritus, Columbus State University; Virginia Malone, vice president
for evaluation at Harcourt Brace; Marge Petit, Senior Associate,
National Center for the Improvement of Educational Assessment (Center
for Assessment); Barbara Reys, Professor of Mathematics Education
and Director of the Show-Me Center, University of Missouri; Norman
Webb, Senior Research Scientist, Wisconsin Center for Education Research,
University of Wisconsin; and David E. Wiley, emeritus professor,
School of Education and Social Policy, Northwestern University and
Research Faculty, Center for the Study of Assessment Validity and
Evaluation, University of Maryland.
Work Plan
The activities described above will be distributed over the five
years of the grant. Each year the review teams will create four assessment
maps; clarify the relevant content standards; and screen, review,
revise, and pilot-test assessment items. In Years One and Two, the
technology team will develop the basic architecture for the Web-based
item bank and related tools. They will also conduct focus groups
with typical users to define needs and revise the design based on
that feedback. As items are screened and added to the item bank over
the course of the five years, the technology team will create links
between items, maps, and state and national content standards and
build in functions such as browsing, searching, and sorting. Dissemination
activities will take place throughout the life of the grant, including
presentations at relevant meetings and conferences and submission
of papers to journals, and will culminate in a final rollout when the
item bank is completed in Year Five.
Results of Prior NSF Support
With support from NSF, Project 2061 has developed an array of science
literacy tools to promote understanding and use of content standards,
beginning with the publication of Benchmarks for Science Literacy (AAAS,
1993) (ESI-9350003; $5,000,000; 10/93-9/99). To increase understanding
of conceptual connections among K-12 learning goals, Project 2061
published Atlas of Science Literacy (AAAS, 2001) (ESI-9618093;
$4,746,014; 4/97-3/01), which has sold nearly 15,000 copies and
is serving as the basis for several recently submitted NSDL and Math
and Science Partnership proposals. To improve the quality of science
and mathematics curriculum materials, Project 2061 developed a set
of criteria to analyze their alignment to important learning goals
and the quality of instructional support they provide for those goals
(ESI-9553594; $888,466; 3/96-2/97 and ESI-9618093). These criteria
have been used to analyze science and mathematics instructional materials
and are being used to guide the design of new materials.
In the area of assessment, Project 2061 is conducting a study of
the alignment of assessment items to national and state standards
and benchmarks for science and mathematics (ESI-9919018; $2,476,875;
5/99-1/04). The goals of the project are to (1) develop a set
of criteria and a procedure for evaluating assessment quality and alignment
and (2) demonstrate the use of the assessment analysis procedure
in typical situations by conducting a series of case studies.
To date, Project 2061 has analyzed nearly 500 items, including items
from two large state pools and from the NAEP and TIMSS tests,
along with items developed for our own research projects. We have also
created guidelines for revising items based on the results of
our analysis and for validating the revisions through student interviews.
Project 2061 staff and consultants have conducted case studies
documenting the application of the analysis procedures and the revision efforts,
and presentations on our work have been made at conferences
and meetings sponsored by organizations such as NSTA, NCTM, the School Science
and Mathematics Association, and Research for Better Schools.
Publications include "Accountability and Assessments" by Leah Bricker in Research
for Better Schools' Currents, Volume 6.1, Fall/Winter 2002; "Aligning
Assessment with Learning Goals," by Natalie Nielsen in ENC Focus,
2000, Volume 7, Number 2; "Putting Tests to the Test" in the Spring/Summer
2001 issue of 2061 Today; "A Revision Protocol Design: Item
Revision and Impact Analysis Report," a report prepared by Robert Capraro,
Mary Margaret Capraro, and Mary Hammer of Texas A&M University
and Kay Dighans, a Montana teacher; and "Lessons Learned from Students
about Assessment and Instruction," by Richard Kitchen and Linda Wilson
to be published in NCTM's Teaching Children Mathematics.
Evaluation
Horizon Research, Inc. (HRI) will conduct the external evaluation
for the project. HRI has over 15 years of experience evaluating mathematics
and science education improvement projects, including relatively
small and narrowly focused teacher enhancement projects, a number
of Statewide Systemic Initiatives, and materials development projects.
In addition, HRI has expertise in digital library technologies and
recently evaluated the development of one of the digital libraries
under the National Science Digital Library umbrella.
Evaluation resources will be divided between formative and summative
components. The formative component, designed to inform mid-course
corrections in the project, will focus on two fundamental processes:
(1) analysis and revision of assessment items, and (2) development
of the online utility. The summative component, designed to gauge
the impact of resources created by the project, will be guided by
four questions: (1) What is the quality of the resources, including
the assessment items, descriptions of item features, assessment maps,
and clarification statements? (2) How effectively are the resources
disseminated? (3) How are the resources used? (4) What impact do
the resources have when they are used?
The collection of assessment items is the cornerstone of the project.
These items will be only as good as the processes used to collect,
analyze, and revise them. HRI will observe a sample of the earliest
analysis sessions and will conduct focus group interviews with participants.
Feedback to the project will focus on maximizing the efficiency of
the iterative analysis and revision process.
A second critical feature of the project is the online utility;
even the best resources are of little value if potential users see
them as inaccessible. HRI will conduct a think-aloud protocol with
a sample of individual potential users as they interact with the
utility prototype. HRI will also arrange for a review of the prototype
utility by an expert in designing user interfaces for digital libraries.
Both activities will provide information the project can use to maximize
the usability of the online utility.
The summative component will focus on the quality and impact of
the resources. HRI will arrange for review of the assessment items,
assessment maps, and clarification statements by content experts
who are external to the project. Logs of search and browse activity
on the Web site will be analyzed as one means of determining the
impacts that resources have on users. These logs shed light on the
paths that users most frequently take through the site; e.g., do
users simply "grab items and go," or do they also access the assessment
maps and clarification statements? The logs also can be used to compare
which resources users most frequently access to what is available
in the collection, which may inform future collection and development
efforts.
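The "grab items and go" question can be framed as a simple classification of user sessions. The sketch below is illustrative only: it assumes a hypothetical log format in which each session is reduced to the list of resource types a user accessed, and the labels are invented for this example.

```python
from collections import Counter

# Resource types that give context beyond the items themselves (assumed labels).
CONTEXT_TYPES = {"map", "clarification"}


def classify_session(resource_types):
    """Label one session by whether items were accessed with or without context."""
    kinds = set(resource_types)
    if "item" not in kinds:
        return "no_items"
    if kinds & CONTEXT_TYPES:
        return "items_with_context"
    return "items_only"  # the "grab items and go" pattern


def summarize(sessions):
    """sessions: mapping of session id -> list of accessed resource types."""
    return Counter(classify_session(types) for types in sessions.values())


# Hypothetical sessions for illustration.
sessions = {
    "s1": ["item", "item"],
    "s2": ["map", "item", "clarification"],
    "s3": ["map"],
    "s4": ["item"],
}
summary = summarize(sessions)
```

A tally like this would show only which pattern is more common; the interviews described in the text would still be needed to learn why users follow it.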
The most valuable source of information about impacts will be the
users themselves. HRI will conduct in-depth interviews with a sample
of users focused on: (1) resources they are seeking when they come
to the site; (2) components of the online utility they access; (3)
how they actually use the resources they access; and (4) how the
resources impact their work.
Effectiveness of the project's dissemination efforts will be gauged
in two ways. First, HRI will survey a sample of members of the different
target audiences regarding their awareness of the resources. Second,
a sample of individuals using the online utility will be surveyed
to gather demographic information.
HRI will report evaluation findings informally to the project staff
through regular phone and e-mail contact. In addition, HRI will prepare
one formative memo and one evaluation report each year detailing
all evaluation activities and findings.
Personnel
George E. DeBoer is deputy director of Project
2061 and will serve as PI on the project. He holds a Ph.D. in science
education from Northwestern University and joined Project 2061 from
the Division of Elementary, Secondary, and Informal Education of the
National Science Foundation. He is associate director and co-PI for
the Center for Curriculum Materials in Science, and co-PI on Project
2061's IERI mathematics project and the Project 2061 assessment project.
He has been a professor of education at Colgate University since
1974, where he taught courses in the teaching of science and mathematics
and in applied research methodology in the social sciences. At Colgate
Dr. DeBoer held a number of administrative positions including chair
of the Department of Education, acting director of the Division of
Social Sciences, and director of the Master of Arts in Teaching Program.
His primary research interests lie in clarifying the goals of the
science curriculum, analyzing the history of science education, and
analyzing the many meanings of scientific literacy. He has written
extensively on these topics.
Jo Ellen Roseman is director of Project 2061 and
will serve as co-PI on the project, helping to coordinate all efforts
at AAAS/Project 2061. Dr. Roseman is also director and PI for the
Center for Curriculum Materials in Science and PI for Project 2061's
IERI mathematics study, which is examining the relationship between
teaching and learning and characteristics of curriculum materials
and professional development that can improve them. She served as
curriculum director for Project 2061 from 1989 through 2001. In that
capacity she was involved in the design, testing, and dissemination
of Project 2061's science literacy reform tools. She participated
in the development of Benchmarks for Science Literacy, which
describes specific K-12 content standards on the way to science literacy,
and directed the development of Resources for Science Literacy to
help educators focus curriculum, instruction, and assessment and
their own professional development on science literacy. She holds
a Ph.D. in biochemistry from Johns Hopkins University.
Linda Wilson is an assessment expert in mathematics
education who has been the primary consultant for Project 2061's
IERI middle-school mathematics project and for developing assessment
maps and goals-based assessments. She will be the primary consultant
for the mathematics portion of this project. She has a Ph.D. in mathematics
education from the University of Wisconsin. She taught mathematics
education courses at the College of Education at the University of
Delaware, where she was on the faculty. She helped write the Assessment
Standards for School Mathematics, published by NCTM. She worked at the
U.S. Department of Education on the Voluntary National Test in Mathematics,
and she headed the committee that wrote the framework for the 2004 NAEP
test in mathematics. Her research has included teachers' classroom
assessment practices, analyses of student work on test items, the
development of tests that measure specific learning goals in mathematics,
and increasing the validity of mathematics test items for English
language learners.
Jim Minstrell will be the primary consultant for
the science portion of the project and will advise the project on
issues regarding revision of test items using the facets of student
knowledge approach. He has been a PI on several teaching and learning
grants. Through his classroom experience and interest in the cognition
of learners he has focused on development of assessment, curriculum,
and teaching systems with a two-part goal in mind: to identify problematic
conceptions and reasoning in learners, and to adapt instruction accordingly.
His approach aims to build on strengths in the students' thinking,
while specifically challenging problematic ideas and procedures.
Minstrell serves as an advisor to several institutions, has delivered
numerous presentations and workshops on learning and teaching nationally
and internationally, and has received numerous awards and honors for his
research and teaching. Dr. Minstrell holds a Ph.D. in science education
from the University of Washington.
Rebecca J. Kopriva is director of the Center for
the Study of Assessment Validity and Evaluation (C-SAVE), which is
housed in the Department of Measurement, Statistics, and Evaluation
at the University of Maryland. She will advise project staff on issues
regarding psychometric properties of items and issues related to
access to items by English language learners through workshop training
sessions and ongoing consultation. Formerly she was associate professor
in the California State University System, state testing director,
and consultant for test publishers, the U.S. Department of Education,
national legal and policy groups, and a variety of states and districts.
Dr. Kopriva is a researcher who publishes and presents regularly
on the theory and practice of improving large-scale test validity
and comparability. She is a leader in addressing these topics as
they relate to the measurement of academic knowledge and skills in
racial, cultural, and ethnic minority students and students with
disabilities.
Joan D. Pasley, senior research associate at HRI,
will be responsible for data collection related to science assessment
and will coordinate all external evaluation activities. Dr. Pasley
received a Ph.D. in curriculum and instruction from the University
of North Carolina at Chapel Hill. Dr. Pasley has been working with
HRI since 1994 on a number of research and evaluation projects, including
the evaluation of the Ohio, South Carolina, and New Jersey Statewide
Systemic Initiatives. Dr. Pasley currently coordinates the standardized
evaluation system for NSF's Local Systemic Change through Teacher
Enhancement project and directs the evaluation of e-Mentoring for
Student Success, an online mentoring program for beginning science
and mathematics teachers. In addition, Dr. Pasley manages the Increasing
the Availability of Materials for the Professional Development of
Science and Mathematics Teachers project.
Daniel J. Heck, senior research associate at HRI,
will be responsible for data collection related to mathematics assessment.
Mr. Heck received a Bachelor's Degree in Mathematics and History
and a Master's Degree in Education from Wake Forest University. He
is completing his Ph.D. in educational psychology at the University
of Illinois at Urbana-Champaign, with a specialization in quantitative
and evaluative research methodologies. Mr. Heck directed the study
of the Impact of the Statewide Systemic Initiatives project, a research
study of the National Science Foundation-funded initiatives in 25
states and the Commonwealth of Puerto Rico. Mr. Heck currently directs
the evaluation of the Indiana Mathematics Initiative and the Center
for Curriculum Materials in Science. He also leads HRI's longitudinal
studies of the core evaluation of the Local Systemic Change project.
Iris R. Weiss, president of HRI, will provide
consultation to the evaluation team and will review all data collection
instruments and evaluation reports. Dr. Weiss received a Bachelor's
Degree in biology from Cornell University, a Master's Degree in science
education from Harvard University, and a Ph.D. in curriculum and
instruction from the University of North Carolina at Chapel Hill.
Dr. Weiss has directed many of HRI's research, development, and evaluation
projects since the company's initiation in 1987 and continues to
be responsible for quality control of all HRI projects.