EDUC591J: FUNDAMENTALS OF TEST CONSTRUCTION
Stephen G. Sireci, Ph.D.
156 Hills South
http://www-unix.oit.umass.edu/~sireci
(413)545-0564 (voice)
Office Hours:
Tuesday: Noon—12:30, Wednesday: Noon—2:30
Other times by appointment
Course Syllabus and Schedule
for Fall 2007
Course Objectives:
This course will provide information on how to build and evaluate educational
tests and how to effectively and appropriately interpret test results. Students will learn about the advantages and
disadvantages of different assessment formats such as selected response items,
performance assessments, and computer-based testing. Specifically, students will learn how to:
o describe fundamental aspects of test quality such as reliability and validity
o operationally define testing purposes
o develop a variety of item formats including multiple-choice and constructed response items
o develop answer keys and scoring rubrics for different item formats
o evaluate tests and items using statistical and qualitative methods
o incorporate meaning into test score scales using both norm-referenced and criterion-referenced procedures
o use standard setting techniques to set “passing scores” and other standards on tests
o develop appropriate documentation to properly communicate the quality of an assessment
o understand the utility of educational assessments within the broader context of educational policy and decision making
The common theme unifying these knowledge and skill areas is the promotion of
equity and fairness in testing. In addition, the course stresses the role of
educational testing in improving student learning. In this course, students
will learn how to build quality tests aimed towards promoting valid score
interpretation, and will learn how to evaluate the use of a specific test for a
specific purpose. Measuring psychological phenomena such as what a student
“knows and is able to do” is a complex endeavor. Test construction is both art
and science; both aspects will be stressed in this course. Upon successful
completion of this course, students will know how to (a) develop tests,
(b) choose among already existing tests for a specific purpose, (c) use the
results of standardized tests to help make decisions about students and
educational systems, and (d) identify flaws in educational assessments.
Some specific topics covered in the course are:
o Purposes of Educational Tests
o Standards for Teacher Competence in Educational Assessment
o Standards for Educational and Psychological Testing
o Fundamental Elements of Test Quality (e.g., reliability, validity)
o Developing Multiple-Choice Items
o Developing and Scoring Performance Assessments
o Developing Portfolio Assessments
o Item Analysis
o Evaluating the Validity of Score-Based Inferences
o Standard Setting (e.g., setting passing scores)
o Innovative Item Formats and Computer-Based Testing
o Test Accommodations for individuals with disabilities and for English language learners
o Sensitivity Review
o Ethical Issues in Test Construction, Selection, Administration, and Interpretation
Course Requirements
A. Attendance and Participation: Students expecting to receive course credit
will need to attend all (or nearly all) classes, work their way through the
suggested readings, and complete several assignments. In addition, students are
expected to actively participate in class.
B. Assignments: In addition to weekly homework assignments, there are two major
assignments for the course:
1) Test Development: Each student is required to develop an educational
achievement test. This semester, we will be developing math and reading tests
for adult basic education students in Massachusetts (see
http://www.doe.mass.edu/acls/mailings/2004/0709/mathtestspecs.pdf
and http://www.doe.mass.edu/acls/mailings/2004/0709/readtestspecs.pdf
for test specifications). Throughout the semester, students will progressively
work on the design and development of their test. This process will involve
writing and field-testing items (using a very small pilot sample). Students
will be required to develop a final version of the test and administer it to a
small sample. I will help students throughout the test development process.
2) Technical Manual Development: Each student is required to develop a
technical manual that describes the test development process and provides
important psychometric data for the test they develop. Instructions for
developing this manual will be provided in class. Chapter 6 of the Standards
for Educational and Psychological Testing (AERA, APA, & NCME, 1999) should be
used to guide development of the manual.
Grading: Students’ final grades are determined by their attendance and
participation in class, and by their performance on their weekly assignments,
draft test, final test, and technical manual. Late assignments will be reduced
by one letter grade for each day late (e.g., a maximum grade of "C" will be
given to an exceptional draft test submitted two days late). Unforeseen
emergencies, as determined by the professor, are exceptions to this policy. The
table below illustrates the weighting used in calculating grades.
Activity | Weight
Attendance/Participation | .15
Weekly assignments | .30
First Test Draft | .05
Final Test Form | .25
Technical Manual | .25
Attendance/participation and all assignments are graded on a 0-100 scale. Final
grades of 94-100 receive an A, 90-93 receive an A-, 87-89 receive a B+, 81-86
receive a B, 78-80 receive a C+, 70-77 receive a C, and below 70 receive an F.
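As a rough illustration of how the weights and letter bands above might combine, here is a minimal Python sketch. The weights and band cutoffs come from this syllabus; the dictionary key names, the function names, the example scores, and the choice to round fractional totals half up before assigning a letter are all assumptions made for illustration, not the professor's stated procedure. The late-assignment penalty is not modeled.

```python
import math

# Weights copied from the syllabus table; the key names are illustrative.
WEIGHTS = {
    "attendance_participation": 0.15,
    "weekly_assignments": 0.30,
    "first_test_draft": 0.05,
    "final_test_form": 0.25,
    "technical_manual": 0.25,
}

# Lower bound of each letter band listed in the syllabus.
BANDS = [(94, "A"), (90, "A-"), (87, "B+"), (81, "B"), (78, "C+"), (70, "C")]


def final_score(scores):
    """Weighted average of component scores, each on the 0-100 scale."""
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


def letter_grade(score):
    """Map a 0-100 final score to a letter using the syllabus bands."""
    # Assumption: fractional totals are rounded half up to a whole number.
    whole = math.floor(score + 0.5)
    for lower, letter in BANDS:
        if whole >= lower:
            return letter
    return "F"


# Hypothetical component scores for one student.
example = {
    "attendance_participation": 95,
    "weekly_assignments": 88,
    "first_test_draft": 80,
    "final_test_form": 92,
    "technical_manual": 90,
}

total = final_score(example)
print(round(total, 2), letter_grade(total))
```

With these hypothetical scores the weighted total is about 90.15, which falls in the A- band. Because the weekly assignments carry the largest single weight (.30), they move the total more than any other component.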
Suggested Textbook
Linn, R. L., & Gronlund, N. E. (2005). Measurement and assessment in teaching
(9th edition). Upper Saddle River, NJ: Prentice-Hall.
This book is available in the Textbook Annex. It is a terrific book. If this is
your first course in educational measurement, you should get it. If you have
other books that cover test construction, you may not need it because I will
provide numerous handouts on all topics covered in the course. Feel free to use
an earlier edition of this book, if you can find it.
American Educational Research Association, American Psychological Association,
& National Council on Measurement in Education (1999). Standards for
educational and psychological testing. Washington, DC: American Educational
Research Association.
[Available for purchase at http://www.aera.net/publications/?id=313#standards.]
Each week you will have one or more reading assignments. Most of these readings
are listed in the bibliography that appears next. I will distribute copies of
all reading assignments free of charge. Articles that are likely to appear as
reading assignments are denoted with an asterisk (*).
Aiken, L. R. (1980).
Content validity and reliability of single items or questionnaires. Educational and Psychological Measurement,
40, 955-959.
Almond, R. G., Steinberg, L. S., & Mislevy, R. J.
(2002). Enhancing the design and
delivery of assessment systems: A
four-process architecture. Journal
of Technology, Learning, and Assessment, 1(5). Available at http://www.bc.edu/research/intasc/jtla/journal/v1n5.shtml.
*American Educational Research Association, American
Psychological Association, & National Council on Measurement in Education
(1999). Standards for educational
and psychological testing.
Washington, DC: American Educational Research Association.
American Psychological Association (2001). Publication manual of the American
Psychological Association (5th edition).
Washington, DC: Author.
Anastasi, A. (1988). Psychological testing (6th
edition). New York: Macmillan.
Angoff, W. H. (1984).
Scales, norms, and equivalent scores.
Princeton, NJ: Educational
Testing Service. (Reprint of chapter In
R.L. Thorndike (Ed.) Educational Measurement (2nd Edition), Washington,
DC: American Council on Education,
1971).
Baron, J. B. (1991).
Strategies for the development of effective performance exercises. Applied Measurement in Education,
4, 305‑318.
Bennett, R., & Ward, W. (1993). Construction versus choice in cognitive
measurement. Hillsdale, NJ:
Lawrence Erlbaum Associates.
Berk, R. A. (Ed.). (1984). A guide to criterion-referenced test construction. Baltimore:
Johns Hopkins University Press.
Brennan, R. L. (2001). Some problems, pitfalls, and paradoxes in educational
measurement. Educational
Measurement: Issues and Practice, 20(4),
6-18.
Burger, S. E., & Burger, D. L. (1994). Determining the validity of
performance-based assessment. Educational
Measurement: Issues and Practice,
13(1), 9-15.
Chakwera, E., Khembo, D.,
& Sireci, S. G. (2004). High-stakes
testing in the warm heart of Africa:
The challenges and successes of the Malawi National Examinations
Board. Education Policy Analysis
Archives, 12(29). Available at http://epaa.asu.edu/epaa/v12n29/.
*Cizek, G. J. (1996).
Setting passing scores. [An NCME
instructional module]. Educational
Measurement: Issues and Practice,
15 (2), 20-31.
*Cizek, G. J. (2001).
More unintended consequences of high-stakes testing. Educational Measurement: Issues and Practice, 20 (4), 19-27.
Cizek, G. J. (2001).
Standard
setting: Concepts, methods, and perspectives.
Mahwah, NJ: Lawrence Erlbaum.
Cizek, G. J., Bunch, M. B.,
& Koons, H. (2004). Setting performance standards: Contemporary methods. Educational
Measurement: Issues and Practice, 23(4), 31-50.
Clauser, B. E., Subhiyah, R. G., Nungester, R. J.,
Ripkey, D. R., Clyman, S. G., & McKinley, D. (1995). Scoring a performance-based assessment by modeling the judgments
of experts. Journal of Educational
Measurement, 32, 397‑415.
*Crocker, L. M., Miller, D., & Franks, E. A. (1989). Quantitative methods for assessing
the fit between test and curriculum. Applied
Measurement in Education, 2, 179‑194.
Council of Chief State School Officers (1992). Recommendations for improving the assessment
and monitoring of students with limited English proficiency. Washington, DC: Author.
Cronbach, L. J. (1946). Response sets in objective tests. Educational and Psychological Measurement, 6,
475-494.
Cronbach, L. J. (1988). Five perspectives on the validity argument. In H. Wainer & H.I. Braun (Eds.), Test
validity (pp. 3-17). Hillsdale, New
Jersey: Lawrence Erlbaum.
Downing, S. M. (1990, April). True-false and alternative choice
formats: A review of the research. Paper presented at the annual meeting of the
National Council on Measurement in Education, Boston, MA.
Downing, S. M., &
Haladyna, T. M. (1997). Test item
development: Validity evidence from
quality assurance procedures. Applied
Measurement in Education, 10, 61-82.
Downing, S. M., &
Haladyna, T. M. (Eds.). (2006). Handbook
of test development. Mahwah,
NJ: Lawrence Erlbaum.
Drasgow, F., &
Olson-Buchanan, J. B. (Eds.)
(1999). Innovations in
Computerized Assessment. Mahwah, NJ:
Lawrence Erlbaum.
*Dunbar, S. B., Koretz, D. M., & Hoover, H. D.
(1991). Quality control in the development
and use of performance assessments. Applied
Measurement in Education, 4, 289‑303.
Dwyer, C. A. (1996).
Cut scores and testing:
Statistics, judgment, truth, and error.
Psychological Assessment, 8, 360-362.
Fisher, R. J. (1994).
The Americans With Disabilities Act:
Implications for measurement. Educational
Measurement: Issues and Practice,
13(3), 17-26, 37.
Gallagher, J. D. (1998). Classroom assessment for teachers. Upper Saddle River, NJ: Merrill.
Geisinger, K. F. (1994). Cross-cultural normative assessment: Translation and adaptation
issues influencing the normative interpretation of assessment instruments. Psychological Assessment, 6, 304-312.
Geisinger, K. F. (1994). Psychometric issues in testing students with disabilities. Applied Measurement in Education, 7,
121-140.
Glaser, R. (1963).
Instructional technology and the measurement of learning outcomes: Some questions. American Psychologist, 18, 519-521.
Glaser, R. (1994).
Criterion-referenced tests: Part
I Origins. Educational
Measurement: Issues and Practice,
13(4), 9-11.
Goldstein, H. (1994).
Recontextualizing mental measurement.
Educational Measurement: Issues and Practice, 12,
16-19, 43.
Haladyna, T. M. (1992). The effectiveness of several multiple-choice item formats. Applied Measurement in Education, 5,
73-88.
Haladyna, T. M. (1994). Developing and validating multiple-choice test items. Hillsdale:
Lawrence Erlbaum.
*Haladyna, T. M., & Downing, S. M. (1989). A taxonomy of multiple-choice item writing
rules. Applied Measurement in
Education, 2, 37-50.
Haladyna, T. M., & Shindoll, R. R. (1989). Item shells: A method for writing effective multiple-choice items. Evaluation & The Health Professions,
12, 97-106.
Hambleton, R. K. (1984). Validating the test score. In R. A. Berk (Ed.), A guide to
criterion‑referenced test construction. Baltimore:
Johns Hopkins University Press, pp. 199-230.
Hambleton, R. K. (1994). Guidelines for adapting educational and psychological tests: A progress report. European Journal of Psychological Assessment, 10,
229-244.
Hambleton, R. K., & Sireci, S. G. (1997). Future directions for norm-referenced and
criterion-referenced achievement testing.
International Journal of Educational Research, 27 (5), 379-393.
*Hambleton, R. K., &
Zenisky, A. (2003). Advances in
criterion-referenced testing methods and practices. In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological and educational
assessment of children (2nd Ed., pp. 377-404).
*Huff, K., & Goodman, D. P. (2007). The demand for cognitive diagnostic assessment. In J. Leighton & M. Gierl (Eds.), Cognitive diagnostic assessment for education. Cambridge: Cambridge University Press.
Huff, K. L., & Sireci,
S. G. (2001). Validity issues in
computer-based testing. Educational
Measurement: Issues and Practice,
20 (3), 16-25.
*Joint Committee on Testing Practices (2004). Code of Fair Testing Practices in
Education. Washington, DC: American Psychological Association. Available for download at http://www.apa.org/science/fairtestcode.html.
Koretz, D., Stecher, B., Klein, S., &
McCaffrey, D. (1994). The Vermont portfolio assessment
program: Findings and implications. Educational Measurement: Issues and Practice, 13(3), 5-16.
Kreiter, C. D., & Frisbie, D. A. (1989). Effectiveness of multiple true-false
items. Applied Measurement in
Education, 2, 207‑216.
Kuehn, P. A., Stallings, W.
M., & Holland, C. L. (1990). Court-defined job analysis requirements for
validation of teacher certification tests. Educational Measurement: Issues and Practice, 9 (4), 21-24.
Lane, S. (1993).
The conceptual framework for the development of a mathematics
performance assessment instrument.
Educational Measurement: Issues
and Practice, 12(2), 16-23.
Linn, R.L. (1994). Criterion-referenced measurement: a valuable perspective clouded by surplus
meaning. Educational Measurement: Issues and Practice, 13,
12-15.
Linn, R. L. (2000).
Assessments and accountability. Educational
Researcher, 29(2), 4-16.
Linn, R. L. (2003, September
1). Performance standards: Utility for different uses of
assessments. Education Policy
Analysis Archives, 11(31).
Retrieved September 1, 2003 from http://epaa.asu.edu/epaa/v11n31.
*Linn, R. L., & Burton, E. (1994). Performance-based assessment: Implications of task specificity. Educational Measurement: Issues and Practice, 13(1), 5-8, 15.
Livingston, S.A. (1982). Estimation of the conditional standard error of measurement for
stratified tests. Journal of
Educational Measurement, 19, 135-138.
Livingston, S. A., & Zieky, M. J. (1982). Passing scores: A manual for setting standards of performance on
educational and occupational tests.
Princeton, NJ: Educational
Testing Service.
Lukhele, R., Thissen, D., & Wainer, H. (1994). On the relative value of multiple-choice,
constructed response, and examinee-selected items on two achievement tests. Journal of Educational Measurement,
31, 234-250.
Meara,
K. P., Hambleton, R. K., & Sireci, S. G. (2001). Setting and validating standards on professional licensure and
certification exams: A survey of
current practices. CLEAR Exam
Review, 12 (2), 17-23.
Mentzer, T. L. (1982). Response biases in multiple-choice test item files. Educational and Psychological
Measurement, 42, 437-448.
Messick, S. (1989).
Validity. In R. Linn (Ed.), Educational measurement, (3rd
ed.) (pp. 13-103). Washington, D.C.: American Council on Education.
Millman, J., & Greene, J. (1989). The specification and development of tests
of achievement and abilities. In R.
Linn (Ed.), Educational measurement,
(3rd ed.) (pp. 335-366). Washington, D.C.: American Council on Education.
Nelson, D. S. (1994).
Job analysis for licensure and certification exams: Science or politics? Educational Measurement: Issues and Practice, 13(3), 29-35.
Nolen, S. B., Haladyna, T. M., & Haas, N. S.
(1992). Uses and abuses of achievement
test scores. Educational
Measurement: Issues and Practice,
11(2), 9-15.
Nunnally, J. C. (1978). Psychometric theory.
New York: McGraw-Hill.
O’Neil, T., Sireci, S. G., & Huff, K. F. (2004). Evaluating the consistency of test content across
two successive administrations of a state-mandated science assessment. Educational Assessment, 9, 129-151.
Osterlind, S. J. (1989). Constructing test items. Hingham, MA: Kluwer.
Pearson, P. D. & Garavaglia, D. R. (1997). Improving the information value of
performance items in large scale assessments.
Paper commissioned by the NAEP Validity Studies Panel. Palo Alto, CA: American Institutes for
Research.
Phillips, S. E. (1994). High-stakes testing accommodations: validity versus disabled rights.
Applied Measurement in Education, 7, 93-120.
Phye, G. D. (1997).
Handbook of classroom assessment. San Diego, CA: Academic
Press.
Popham, W. J. (1992).
A tale of two test-specification strategies. Educational Measurement: Issues and Practice, 11(2), 16-17, 22.
*Popham, W. J., Baker, E. L., Berliner, D. C., Yeakey, C. C., Pellegrino, J. W.,
Quenemoen, R. F., Rodriguez-Brown, F. V., Sandifer, P. D., Sireci, S. G.,
& Thurlow, M. L. (2001, October). Building
tests to support instruction and accountability: A guide for policymakers.
Commission on Instructionally Supportive Assessment. Available at http://www.nea.org/accountability/buildingtests.html.
Quellmalz, E. S. (1991). Developing criteria for performance
assessments: The missing link. Applied Measurement in Education, 4,
319-331.
Reckase, M. D. (1995). Portfolio assessment: A
theoretical estimate of score reliability.
Educational Measurement:
Issues and Practice, 14(1), 12-14, 31.
Sands, W. A., Waters, B. K. & McBride, J. R.
(Eds.). (1997). Computerized
adaptive testing: From inquiry to operation. Washington, DC: American Psychological
Association.
Shavelson, R. J., Baxter, G., & Pine, J.
(1991). Performance assessment in
science. Applied Measurement in
Education, 4, 347‑362.
*Sireci, S.G. (1998). Gathering and analyzing content validity
data. Educational Assessment, 5,
299-321.
Sireci, S. G. (1998).
The construct of content validity.
Social Indicators Research.
*Sireci, S.G. (2005). The most frequently unasked questions about
testing. In R. Phelps (Ed.), Defending
standardized testing (pp. 111-121).
Mahwah, NJ: Lawrence Erlbaum.
Sireci,
S. G. (2005). Unlabeling the
disabled: A perspective on flagging
scores from accommodated test administrations.
Educational Researcher, 34(1), 3-12.
Sireci, S.G. (2003). Content
validity. Encyclopedia of psychological assessment (pp. 1075-1077).
London: Sage.
*Sireci, S. G. (2003). Validity.
Encyclopedia of psychological assessment (pp.
1067-1069). London: Sage.
*Sireci, S. G. (2004). Computerized-adaptive testing: An introduction. In J. Wall & G. Walz (Eds.),
Measuring up: Assessment
issues for teachers, counselors, and administrators (pp. 685-694).
Greensboro, NC: CAPS Press.
Sireci, S. G., DeLeon, B.,
& Washington, E. (2002, Spring).
Improving teachers of minority students’ attitudes towards and knowledge
of standardized tests. Academic
Exchange Quarterly, 162-167.
Sireci, S.G., & Geisinger,
K.F. (1998). Equity issues in
employment testing. In J.H. Sandoval, C. Frisby, K.F. Geisinger, J.
Scheuneman, & J. Ramos-Grenier (Eds.).
Test interpretation and diversity (pp. 105-140). Washington, DC: American Psychological Association.
Sireci, S.G., & Green,
P.C. (2000). Legal and psychometric
criteria for evaluating teacher certification tests. Educational Measurement: Issues and Practice, 19(1),
22-31, 34.
Sireci, S. G., Hambleton,
R. K., & Pitoniak, M. J. (2004).
Setting passing scores on licensure exams using direct consensus. CLEAR
Exam Review 15(1), 21-25.
*Sireci, S. G., & Mullane, L. A. (1994). Evaluating test fairness in licensure
testing: the sensitivity review
process. CLEAR Exam Review, 5(2), 22-28.
Sireci, S. G., & Parker,
P. (2006). Validity on trial: Psychometric and legal conceptualizations of
validity. Educational
Measurement: Issues and Practice, 25(3),
27-34.
Sireci, S. G., &
Pitoniak, M. J. (2007). Assessment
accommodations: What have we learned
from research? In C. C. Laitusis & L.
Cook (Eds.), Large scale assessment and
accommodations: What works? (pp. 53-65).
Sireci, S.G., Robin, F., & Patelis, T. (1999). Using cluster analysis to facilitate
standard setting. Applied
Measurement in Education, 12, 301-325.
Sireci,
S. G., Scarpati, S., & Li, S. (2005).
Test accommodations for students with disabilities: An analysis of the interaction hypothesis. Review of Educational Research, 75,
457-490.
Sireci, S. G., Thissen, D., & Wainer, H. (1991). On the reliability of testlet-based
tests. Journal of Educational
Measurement, 28, 237-247.
Sireci, S. G.,
Wainer, H., & Braun, H. (1998).
Psychometrics, overview. In P.
Armitage, & T. Colton (Eds.), Encyclopedia of Biostatistics, London:
John Wiley & Sons.
Sireci,
S. G., Wiley, A., & Keller, L. A. (2002). An empirical evaluation of selected multiple-choice
item writing guidelines. CLEAR Exam Review, 13(2),
20-26.
Sireci, S. G., &
Zenisky, A. L. (2006). Innovative item
formats in computer-based testing: In
pursuit of improved construct representation.
In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development
(pp. 329-347). Mahwah, NJ: Lawrence Erlbaum.
Smith, I. L., & Hambleton, R. K. (1990). Content validity studies of licensing examinations. Educational Measurement: Issues and Practice, 9(4), 7-10.
Stiggins, R. J. (1997). Student-centered classroom assessment. New York:
Merrill.
Thissen, D., Wainer, H., & Wang, X-B. (1994).
Are tests comprising both multiple-choice and free-response items
necessarily less unidimensional than multiple-choice tests? An analysis of two tests. Journal of Educational Measurement,
31, 113-123.
*Thompson, S., & Thurlow,
M. (2002, June). Universally designed
assessments: Better tests for everyone! Policy Directions, Number 14. Minneapolis, MN: National Center on Educational Outcomes.
Thorndike, R. L. (1982). Applied psychometrics.
Boston: Houghton Mifflin.
*Wainer, H. (1989).
The future of item analysis. Journal of Educational Measurement, 26,
191-208.
*Wainer, H. (1993).
Some practical considerations when converting a linearly administered
test to an adaptive format. Educational
Measurement: Issues and Practice, 12, 15-20.
Wainer, H. (Ed.).
(2000). Computerized adaptive
testing: A primer (2nd edition).
Hillsdale, NJ: Lawrence Erlbaum.
Wainer, H., & Braun, H. (1988). Test validity. Hillsdale, NJ: Lawrence Erlbaum.
Wainer, H., & Kiley, G. L. (1987). Item clusters and computerized adaptive
testing: A case for testlets. Journal of Educational Measurement,
24, 185-201.
Wall, J. E., & Walz, G.
R. (Eds.) (2004). Measuring up: Assessment Issues for teachers, counselors,
and administrators. Greensboro,
NC: CAPS Press.
Williamson, D. M., Mislevy,
R. J., & Almond, R. G. (2004).
Evidence-centered design for certification and licensure. CLEAR Exam Review, 15(2), 14-18.
*Zenisky, A.L., & Sireci,
S.G. (2002). Technological innovations
in large-scale assessment. Applied
Measurement in Education, 15, 337-362.
Plagiarism policy:
Direct copying of someone else's work is not allowed. Printing out someone
else's computer output, and handing it in as your own work, is also not
allowed. Passing off someone else's work as your own will result in failing
this course. Please see me if you have questions about this policy, or if you
have trouble completing any assignments.
TENTATIVE CLASS SCHEDULE FOR FALL 2007
Listed below are the topics that will be covered as well as a list of suggested
readings. The dates listed for each topic are tentative.
Class | Topics | Readings
9/4 | Purposes of Educational Tests; Standards for Teacher Competence in Assessment; Norm- and Criterion-referenced testing | Cizek (2001); Text Chs. 1-2 & Appendix D
9/11 | Reliability and Validity Fundamentals; Planning the Development of a Test | Sireci (2005); Text Chs. 3-5; Hambleton & Zenisky (2003)
9/18 | Defining Test Content and Developing Test Specifications | Text Ch. 6
9/25 | Assessment Format Options; Writing Multiple-Choice (MC) Items | Text Ch. 8; Haladyna & Downing (1989)
10/2 | Writing MC Items (continued); Writing Other Objectively Scored Items | Haladyna (1992); Text Ch. 7
10/9 | No Class (Monday Schedule at UMass) |
10/16 | Developing Performance Assessments | Text Chs. 10-11; Dunbar et al. (1991); Linn & Burton (1994)
10/23 | Scoring Performance Assessments | Text Ch. 11; Handouts
10/30 | Evaluating Tests for Content Validity; Sensitivity Review | Crocker et al. (1989); Sireci (1998); Sireci & Mullane (1994)
11/6 | Field Testing and Item Analysis | Text Ch. 14; Handouts; Wainer (1989)
11/13 | Incorporating Meaning Into the Test Score Scale; Setting Standards on Educational Tests | Text Ch. 19; Cizek (1996)
11/20 | Evidence-Centered Test Design | Huff & Goodman (2007)
11/27 | Understanding Test Results; Developing a Technical Manual | AERA, APA, & NCME (1999)
12/4 | Computer-based testing | Sireci (2004); Sireci & Zenisky (2006); Wainer (1993); Zenisky & Sireci (2002)
12/11 | Test Accommodations and Universal Test Design; Portfolio Assessment | Geisinger (1994); Phillips (1994); Thompson & Thurlow (2002); Text Ch. 12; Koretz et al. (1994); Reckase (1995)
12/21 | Final Test Form and Technical Manual Due (no class) |