EDUC591J:  FUNDAMENTALS OF TEST CONSTRUCTION

 

Stephen G. Sireci, Ph.D.

156 Hills South

http://www-unix.oit.umass.edu/~sireci

(413)545-0564 (voice)

 

Office Hours:

Tuesday:  Noon—12:30, Wednesday:  Noon—2:30

Other times by appointment

 

Course Syllabus and Schedule for Fall 2007

 

Course Objectives: This course will provide information on how to build and evaluate educational tests and how to effectively and appropriately interpret test results.  Students will learn about the advantages and disadvantages of different assessment formats such as selected response items, performance assessments, and computer-based testing.  Specifically, students will learn how to:

 

o             describe fundamental aspects of test quality such as reliability and validity

o             operationally define testing purposes

o             develop a variety of item formats including multiple-choice and constructed response items

o             develop answer keys and scoring rubrics for different item formats

o             evaluate tests and items using statistical and qualitative methods

o             incorporate meaning into test score scales using both norm-referenced and criterion-referenced procedures

o             use standard setting techniques to set “passing scores” and other standards on tests

o             develop appropriate documentation to properly communicate the quality of an assessment

o             understand the utility of educational assessments within the broader context of educational policy and decision making

 

The common theme unifying these knowledge and skill areas is the promotion of equity and fairness in testing.  In addition, the course stresses the role of educational testing in improving student learning.  In this course, students will learn how to build quality tests aimed towards promoting valid score interpretation, and will learn how to evaluate the use of a specific test for a specific purpose.  Measuring psychological phenomena such as what a student “knows and is able to do” is a complex endeavor.  Test construction is both art and science; both aspects will be stressed in this course.  Upon successful completion of this course, students will know how to (a) develop tests, (b) choose among already existing tests for a specific purpose, (c) use the results of standardized tests to help make decisions about students and educational systems, and (d) identify flaws in educational assessments.

 

Some specific topics covered in the course are:

o       Purposes of Educational Tests  

o       Standards for Teacher Competence in Educational Assessment

o       Standards for Educational and Psychological Testing

o       Fundamental Elements of Test Quality (e.g., reliability, validity)

o       Developing Multiple-Choice Items

o       Developing and Scoring Performance Assessments

o       Developing Portfolio Assessments

o       Item Analysis

 

o       Evaluating the Validity of Score-Based Inferences

o       Standard Setting (e.g., setting passing scores)

o       Innovative Item Formats and Computer-Based Testing

o       Test Accommodations for individuals with disabilities and for English language learners

o       Sensitivity Review

o       Ethical Issues in Test Construction, Selection, Administration, and Interpretation

 

Course Requirements

 

A. Attendance and Participation:  Students expecting to receive course credit will need to attend all (or nearly all) classes, work their way through the suggested readings, and complete several assignments.  In addition, students are expected to actively participate in class.

B. Assignments: In addition to weekly homework assignments, there are two major assignments for the course:

 

     1) Test Development: Each student is required to develop an educational achievement test.  This semester, we will be developing math and reading tests for adult basic education students in Massachusetts (see http://www.doe.mass.edu/acls/mailings/2004/0709/mathtestspecs.pdf and http://www.doe.mass.edu/acls/mailings/2004/0709/readtestspecs.pdf for test specifications).  Throughout the semester, students will progressively work on the design and development of their test.  This process will involve writing and field-testing items (using a very small pilot sample).  Students will be required to develop a final version of the test and administer it to a small sample.  I will help students throughout the test development process.

 

     2) Technical Manual Development: Each student is required to develop a technical manual that describes the test development process and provides important psychometric data for the test they develop.  Instructions for developing this manual will be provided in class.  Chapter 6 of the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999) should be used to guide development of the manual. 

 

Grading: Students' final grades are determined by their attendance and participation in class, and by their performance on their weekly assignments, draft test, final test, and technical manual.  Late assignments will be reduced by one letter grade for each day late (e.g., a maximum grade of "C" will be given to an exceptional draft test submitted two days late).  Unforeseen emergencies, as determined by the professor, are exceptions to this policy.  The table below shows the weights used in calculating grades.

 

Activity                     Weight
Attendance/Participation     .15
Weekly assignments           .30
First Test Draft             .05
Final Test Form              .25
Technical Manual             .25

 

Attendance/participation and all assignments are graded on a 0-100 scale.  Final grades of 94-100 receive an A, 90-93 receive an A-, 87-89 receive a B+, 81-86 receive a B, 78-80 receive a C+, 70-77 receive a C, and below 70 receive an F.
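As a rough illustration of how the weights and letter-grade ranges above combine, here is a minimal Python sketch.  The names (WEIGHTS, GRADE_FLOORS, composite_score, letter_grade) and the example scores are hypothetical, not part of the course materials.  The syllabus lists whole-number ranges only, so treating each range as "at least this score" for fractional composites is an assumption; the professor's actual rounding practice may differ.

```python
# Weights taken from the grading table (they sum to 1.00).
WEIGHTS = {
    "attendance_participation": 0.15,
    "weekly_assignments": 0.30,
    "first_test_draft": 0.05,
    "final_test_form": 0.25,
    "technical_manual": 0.25,
}

# Lower bound of each letter-grade range, highest first.
# ASSUMPTION: a fractional composite falls in the highest range whose
# lower bound it meets; the syllabus does not state a rounding rule.
GRADE_FLOORS = [
    (94, "A"),
    (90, "A-"),
    (87, "B+"),
    (81, "B"),
    (78, "C+"),
    (70, "C"),
]


def composite_score(scores):
    """Return the weighted average of the 0-100 component scores."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a 0-100 composite score to the syllabus letter grade."""
    for floor, letter in GRADE_FLOORS:
        if score >= floor:
            return letter
    return "F"


# Hypothetical student, for illustration only.
example = {
    "attendance_participation": 100,
    "weekly_assignments": 90,
    "first_test_draft": 80,
    "final_test_form": 85,
    "technical_manual": 88,
}

total = composite_score(example)
print(f"{total:.2f} -> {letter_grade(total)}")
```

For this hypothetical student the composite is 15 + 27 + 4 + 21.25 + 22 = 89.25, which falls in the B+ range under the assumption stated above.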


Suggested Textbook

 

Linn, R. L., & Gronlund, N. E. (2005).  Measurement and assessment in teaching (9th edition).  Upper Saddle River, NJ:  Prentice-Hall. 

 

This book is available in the Textbook Annex.  It is a terrific book.  If this is your first course in educational measurement, you should get it.  If you have other books that cover test construction, you may not need it because I will provide numerous handouts on all topics covered in the course.  Feel free to use an earlier edition of this book, if you can find it.

 

Recommended Text

 

American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999).  Standards for educational and psychological testing.  Washington, DC: American Educational Research Association. 

[Available for purchase at http://www.aera.net/publications/?id=313#standards.]

 

Required Readings

 

Each week you will have one or more reading assignments.  Most of these readings are listed in the bibliography that appears next.  I will distribute copies of all reading assignments free of charge.  Articles that are likely to appear as reading assignments are denoted with an asterisk (*).

 

Fundamentals of Test Construction Bibliography

           

Aiken, L. R. (1980).  Content validity and reliability of single items or questionnaires.  Educational and Psychological Measurement, 40, 955-959.

 

Almond, R. G., Steinberg, L. S., & Mislevy, R. J. (2002).  Enhancing the design and delivery of assessment systems:  A four-process architecture.  Journal of Technology, Learning, and Assessment, 1(5).  Available at http://www.bc.edu/research/intasc/jtla/journal/v1n5.shtml.

           

*American Educational Research Association, American Psychological Association, & National Council on Measurement in Education (1999).  Standards for educational and psychological testing.  Washington, DC: American Educational Research Association.

 

American Psychological Association (2001).  Publication manual of the American Psychological Association (5th edition). Washington, DC:  Author.

 

Anastasi, A. (1988). Psychological testing (6th edition). New York: Macmillan.

 

Angoff, W. H. (1984).  Scales, norms, and equivalent scores.  Princeton, NJ:  Educational Testing Service.  (Reprint of chapter in R. L. Thorndike (Ed.), Educational measurement (2nd edition).  Washington, DC:  American Council on Education, 1971.)

 

Baron, J. B. (1991).  Strategies for the development of effective performance exercises.    Applied Measurement in Education, 4, 305‑318.

 

Bennett, R., & Ward, W. (1993).  Construction versus choice in cognitive measurement.  Hillsdale, NJ: Lawrence Erlbaum Associates.

 


Berk, R. A. (Ed.). (1984).  A guide to criterion-referenced test construction.  Baltimore:  Johns Hopkins University Press.

 

 Brennan, R. L. (2001).  Some problems, pitfalls, and paradoxes in educational measurement.  Educational Measurement:  Issues and Practice, 20(4), 6-18.

 

Burger, S. E., & Burger, D. L. (1994).  Determining the validity of performance-based assessment.  Educational Measurement:  Issues and Practice, 13(1), 9-15.

 

Chakwera, E., Khembo, D., & Sireci, S. G. (2004).  High-stakes testing in the warm heart of Africa:  The challenges and successes of the Malawi National Examinations Board.  Education Policy Analysis Archives, 12(29) (see http://epaa.asu.edu/epaa/v12n29/).

 

*Cizek, G. J. (1996).  Setting passing scores.  [An NCME instructional module].  Educational Measurement:  Issues and Practice, 15 (2), 20-31.

 

*Cizek, G. J. (2001).  More unintended consequences of high-stakes testing.  Educational Measurement:  Issues and Practice, 20 (4), 19-27.

 

Cizek, G. J. (2001).  Standard setting: Concepts, methods, and perspectives.  Mahwah, NJ:  Lawrence  Erlbaum.

 

Cizek, G. J., Bunch, M. B., & Koons, H. (2004). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31-50.

 

Clauser, B. E., Subhiyah, R. G., Nungester, R. J., Ripkey, D. R., Clyman, S. G., & McKinley, D. (1995).  Scoring a performance-based assessment by modeling the judgments of experts.  Journal of Educational Measurement, 32, 397-415.

 

*Crocker, L. M., Miller, D., & Franks, E. A. (1989).  Quantitative methods for assessing the fit between test and curriculum.  Applied Measurement in Education, 2, 179-194.

 

Council of Chief State School Officers (1992).  Recommendations for improving the assessment and monitoring of students with limited English proficiency.  Washington, DC:  Author.

 

Cronbach, L. J. (1946).  Response sets in objective tests.  Educational and Psychological Measurement, 6, 475-494.

 

Cronbach, L. J. (1988).  Five perspectives on the validity argument.  In H. Wainer & H.I. Braun (Eds.), Test validity (pp. 3-17).  Hillsdale, New Jersey: Lawrence Erlbaum.

 

Downing, S. M. (1990, April).  True-false and alternative choice formats:  A review of the research.  Paper presented at the annual meeting of the National Council on Measurement in Education, Boston, MA.

 

Downing, S. M., & Haladyna, T. M. (1997).  Test item development:  Validity evidence from quality assurance procedures.  Applied Measurement in Education, 10, 61-82.

 

Downing, S. M., & Haladyna, T. M. (Eds.). (2006).  Handbook of test development.  Mahwah, NJ:  Lawrence Erlbaum.

 

Drasgow, F., & Olson-Buchanan, J. B. (Eds.)  (1999).  Innovations in Computerized Assessment.  Mahwah, NJ: Lawrence Erlbaum.

 

*Dunbar, S. B., Koretz, D. M., & Hoover, H. D. (1991).  Quality control in the development and use of performance assessments.    Applied Measurement in Education, 4, 289‑303.

 

Dwyer, C. A. (1996).  Cut scores and testing:  Statistics, judgment, truth, and error.  Psychological Assessment, 8, 360-362.

 

Fisher, R. J. (1994).  The Americans With Disabilities Act:  Implications for measurement.  Educational Measurement:  Issues and Practice, 13(3), 17-26, 37.

 

Gallagher, J. D. (1998).  Classroom assessment for teachers.  Upper Saddle River, NJ: Merrill. 

 

Geisinger, K. F. (1994).  Cross-cultural normative assessment: Translation and adaptation issues influencing the normative interpretation of assessment instruments.  Psychological Assessment, 6, 304-312.

 

Geisinger, K. F. (1994).  Psychometric issues in testing students with disabilities.  Applied Measurement in Education, 7, 121-140.

 

Glaser, R. (1963).    Instructional technology and the measurement of learning outcomes:  Some questions.  American Psychologist, 18, 519-521.

 

Glaser, R. (1994).  Criterion-referenced tests:  Part I Origins.  Educational Measurement:  Issues and Practice, 13(4), 9-11.

 

Goldstein, H. (1994).  Recontextualizing mental measurement.  Educational Measurement: Issues and Practice, 12, 16-19, 43.

 

Haladyna, T. M. (1992).  The effectiveness of several multiple-choice item formats.  Applied Measurement in Education, 5, 73-88.

 

Haladyna, T. M. (1994).    Developing and validating multiple-choice test items.  Hillsdale:  Lawrence Erlbaum.

 

*Haladyna, T. M., & Downing, S. M. (1989).  A taxonomy of multiple-choice item writing rules.  Applied Measurement in Education, 2, 37-50.

 

Haladyna, T. M., & Shindoll, R. R. (1989).  Item shells:  A method for writing effective multiple-choice items.  Evaluation & The Health Professions, 12, 97-106.

 

Hambleton, R. K. (1984).  Validating the test score.  In R. A. Berk (Ed.), A guide to criterion-referenced test construction (pp. 199-230).  Baltimore:  Johns Hopkins University Press.

 

Hambleton, R. K. (1994).  Guidelines for adapting educational and psychological tests:  A progress report.  European Journal of Psychological Assessment, 10, 229-244.

           

Hambleton, R. K., & Sireci, S. G. (1997).  Future directions for norm-referenced and criterion-referenced achievement testing.  International Journal of Educational Research, 27 (5), 379-393.

 

*Hambleton, R. K., & Zenisky, A. (2003).  Advances in criterion-referenced testing methods and practices.  In C. R. Reynolds & R. W. Kamphaus (Eds.).  Handbook of psychological and educational assessment of children (2nd Ed., pp. 377-404).

 

*Huff, K., & Goodman, D. P. (2007).  The demand for cognitive diagnostic assessment.  In J. Leighton & M. Gierl (Eds.), Cognitive diagnostic assessment for education.  Cambridge:  Cambridge University Press.

 

Huff, K. L., & Sireci, S. G. (2001).  Validity issues in computer-based testing.  Educational Measurement:  Issues and Practice, 20 (3), 16-25.

 


*Joint Committee on Testing Practices (2004).  Code of Fair Testing Practices in Education.  Washington, DC:  American Psychological Association.  Available for download at http://www.apa.org/science/fairtestcode.html.

 

Koretz, D., Stecher, B., Klein, S., & McCaffrey,  D. (1994).  The Vermont portfolio assessment program:  Findings and implications.  Educational Measurement:  Issues and Practice, 13(3), 5-16.

 

Kreiter, C. D., & Frisbie, D. A. (1989).  Effectiveness of multiple true-false items.  Applied Measurement in Education, 2, 207‑216.

 

Kuehn, P. A., Stallings, W. M., & Holland, C. L. (1990).  Court-defined job analysis requirements for validation of teacher certification tests.  Educational Measurement:  Issues and Practice, 9(4), 21-24.

 

Lane, S. (1993).  The conceptual framework for the development of a mathematics performance assessment instrument.  Educational Measurement:  Issues and Practice, 12(2), 16-23.

 

Linn, R. L. (1994).  Criterion-referenced measurement:  A valuable perspective clouded by surplus meaning.  Educational Measurement:  Issues and Practice, 13, 12-15.

 

Linn, R. L. (2000). Assessments and accountability.  Educational Researcher, 29(2), 4-16.

 

Linn, R. L. (2003, September 1).  Performance standards:  Utility for different uses of assessments.  Education Policy Analysis Archives, 11(31).  Retrieved September 1, 2003 from http://epaa.asu.edu/epaa/v11n31.

 

*Linn, R. L., & Burton, E. (1994).  Performance-based assessment:  Implications of task specificity.  Educational Measurement:  Issues and Practice, 13(1), 5-8, 15.

 

Livingston, S.A. (1982).  Estimation of the conditional standard error of measurement for stratified tests.   Journal of Educational Measurement, 19, 135-138.

           

Livingston, S. A., & Zieky, M. J. (1982).  Passing scores:  A manual for setting standards of performance on educational and occupational tests.  Princeton, NJ:  Educational Testing Service.

 

Lukhele, R., Thissen, D., & Wainer, H. (1994).  On the relative value of multiple-choice, constructed response, and examinee-selected items on two achievement tests.  Journal of Educational Measurement, 31, 234-250.

 

Meara, K. P., Hambleton, R. K., & Sireci, S. G. (2001).  Setting and validating standards on professional licensure and certification exams:  A survey of current practices.  CLEAR Exam Review, 12 (2), 17-23.

 

Mentzer, T. L. (1982).  Response biases in multiple-choice test item files.  Educational and Psychological Measurement, 42, 437-448.

 

Messick, S. (1989).  Validity.  In R. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103).  Washington, DC:  American Council on Education.

 

Millman, J., & Greene, J. (1989).  The specification and development of tests of achievement and abilities.  In R. Linn (Ed.), Educational measurement (3rd ed., pp. 335-366).  Washington, DC:  American Council on Education.

 

Nelson, D. S. (1994).  Job analysis for licensure and certification exams:  Science or politics?  Educational Measurement:  Issues and Practice, 13(3), 29-35.

 


Nolen, S. B., Haladyna, T. M., & Haas, N. S. (1992).  Uses and abuses of achievement test scores.  Educational Measurement:  Issues and Practice, 11(2), 9-15.           

 

Nunnally, J. C. (1978).  Psychometric theory.  New York:  McGraw-Hill.

 

O’Neil, T., Sireci, S. G., & Huff, K. F. (2004).  Evaluating the consistency of test content across two successive administrations of a state-mandated science assessment.  Educational Assessment, 9, 129-151.

 

Osterlind, S. J. (1989).  Constructing test items. Hingham, MA:  Kluwer.

 

Pearson, P. D. & Garavaglia, D. R. (1997).  Improving the information value of performance items in large scale assessments.  Paper commissioned by the NAEP Validity Studies Panel.  Palo Alto, CA: American Institutes for Research.

 

Phillips, S. E. (1994).  High-stakes testing accommodations:  Validity versus disabled rights.  Applied Measurement in Education, 7, 93-120.

 

Phye, G. D. (1997).  Handbook of classroom assessment.  San Diego, CA:  Academic Press.

 

Popham, W. J. (1992).  A tale of two test-specification strategies.  Educational Measurement:  Issues and Practice, 11(2), 16-17, 22.

 

*Popham, W. J., Baker, E. L., Berliner, D. C., Yeakey, C. C., Pellegrino, J. W., Quenemoen, R. F., Rodriguez-Brown, F. V., Sandifer, P. D., Sireci, S. G., & Thurlow, M. L. (2001, October).  Building tests to support instruction and accountability:  A guide for policymakers.  Commission on Instructionally Supportive Assessment.  Available at http://www.nea.org/accountability/buildingtests.html.

 Quellmalz, E. S. (1991).  Developing criteria for performance assessments:  The missing link.   Applied Measurement in Education, 4, 319-331.

 

Reckase, M. D. (1995).  Portfolio assessment:  A theoretical estimate of score reliability.  Educational Measurement:  Issues and Practice, 14(1), 12-14, 31.

 

Sands, W. A., Waters, B. K. & McBride, J. R. (Eds.). (1997).  Computerized adaptive testing: From inquiry to operation.  Washington, DC: American Psychological Association.

 

Shavelson, R. J., Baxter, G., & Pine, J. (1991).  Performance assessment in science.  Applied Measurement in Education, 4, 347‑362.

 

*Sireci, S. G. (1998).  Gathering and analyzing content validity data.  Educational Assessment, 5, 299-321.

 

Sireci, S. G. (1998).  The construct of content validity.  Social Indicators Research.

 

*Sireci, S.G. (2005).  The most frequently unasked questions about testing.  In R. Phelps (Ed.), Defending standardized testing (pp. 111-121).  Mahwah, NJ:  Lawrence Erlbaum.

 

Sireci, S. G. (2005).  Unlabeling the disabled:  A perspective on flagging scores from accommodated test administrations.  Educational Researcher, 34(1), 3-12.

 

Sireci, S.G. (2003). Content validity. Encyclopedia of psychological assessment (pp. 1075-1077). London: Sage.

 

*Sireci, S. G. (2003).  Validity.  Encyclopedia of psychological assessment (pp. 1067-1069).  London:  Sage.

 


*Sireci, S. G. (2004).  Computerized-adaptive testing:  An introduction.  In J. Wall & G. Walz (Eds.), Measuring up:  Assessment issues for teachers, counselors, and administrators (pp. 685-694).  Greensboro, NC:  CAPS Press.

 

Sireci, S. G., DeLeon, B., & Washington, E. (2002, Spring).  Improving teachers of minority students’ attitudes towards and knowledge of standardized tests.  Academic Exchange Quarterly, 162-167.

 

Sireci, S. G., & Geisinger, K. F. (1998).  Equity issues in employment testing.  In J. H. Sandoval, C. Frisby, K. F. Geisinger, J. Scheuneman, & J. Ramos-Grenier (Eds.), Test interpretation and diversity (pp. 105-140).  Washington, DC:  American Psychological Association.

 

Sireci, S.G., & Green, P.C. (2000).  Legal and psychometric criteria for evaluating teacher certification tests.  Educational Measurement: Issues and Practice, 19(1), 22-31, 34.  

 

Sireci, S. G., Hambleton, R. K., & Pitoniak, M. J. (2004).  Setting passing scores on licensure exams using direct consensus.  CLEAR Exam Review, 15(1), 21-25.

 

*Sireci, S. G., & Mullane, L. A. (1994).  Evaluating test fairness in licensure testing:  The sensitivity review process.  CLEAR Exam Review, 5(2), 22-28.

 

Sireci, S. G., & Parker, P. (2006).  Validity on trial:  Psychometric and legal conceptualizations of validity.  Educational Measurement:  Issues and Practice, 25(3), 27-34.

 

Sireci, S. G., & Pitoniak, M. J. (2007).  Assessment accommodations:  What have we learned from research?  In C. C. Laitusis & L. Cook (Eds.), Large scale assessment and accommodations:  What works? (pp. 53-65).

 

 Sireci, S.G., Robin, F., & Patelis, T. (1999).  Using cluster analysis to facilitate standard setting.  Applied Measurement in Education, 12, 301-325.

 

Sireci, S. G., Scarpati, S., & Li, S. (2005).  Test accommodations for students with disabilities:  An analysis of the interaction hypothesis.  Review of Educational Research, 75, 457-490.


 

Sireci, S. G., Thissen, D., & Wainer, H. (1991).  On the reliability of testlet-based tests.  Journal of Educational Measurement, 28, 237-247.

 

 Sireci, S. G., Wainer, H., & Braun, H. (1998).  Psychometrics, overview.  In P. Armitage, & T. Colton (Eds.), Encyclopedia of Biostatistics,  London:  John Wiley & Sons.

 

Sireci, S. G., Wiley, A., & Keller, L. A. (2002).  An empirical evaluation of selected multiple-choice item writing guidelines.  CLEAR Exam Review, 13(2), 20-26.

 

Sireci, S. G., & Zenisky, A. L. (2006).  Innovative item formats in computer-based testing:  In pursuit of improved construct representation.  In S. M. Downing & T. M. Haladyna (Eds.), Handbook of test development (pp. 329-347).  Mahwah, NJ:  Lawrence Erlbaum.

 

Smith, I. L., & Hambleton, R. K. (1990).  Content validity studies of licensing examinations.   Educational Measurement:  Issues and Practice, 9(4), 7-10.

 

Stiggins, R. J. (1997).  Student-centered classroom assessment.  New York:  Merrill.

 

Thissen, D., Wainer, H., & Wang, X-B.  (1994).  Are tests comprising both multiple-choice and free-response items necessarily less unidimensional than multiple-choice tests?  An analysis of two tests.  Journal of Educational Measurement, 31, 113-123.

 

*Thompson, S., & Thurlow, M. (2002, June).  Universally designed assessments:  Better tests for everyone!  Policy Directions, Number 14.  Minneapolis, MN:  National Center on Educational Outcomes.

 

Thorndike, R. L. (1982).  Applied psychometrics.  Boston: Houghton Mifflin.

           

*Wainer, H. (1989).  The future of item analysis. Journal of Educational Measurement, 26, 191-208.

 

*Wainer, H.  (1993).  Some practical considerations when converting a linearly administered test to an adaptive format.  Educational Measurement: Issues and Practice, 12, 15-20.

 

Wainer, H. (Ed.).  (2000).  Computerized adaptive testing: A primer (2nd edition).  Hillsdale, NJ: Lawrence Erlbaum.

 

Wainer, H., & Braun, H. I. (Eds.). (1988).  Test validity.  Hillsdale, NJ:  Lawrence Erlbaum.

 

Wainer, H., & Kiley, G. L. (1987).  Item clusters and computerized adaptive testing:  A case for testlets.   Journal of Educational Measurement, 24, 185-201.

 

Wainer, H., & Sireci, S. G. (2005).  Item and test bias.  Encyclopedia of social measurement volume 2, 365-371.  San Diego:  Elsevier.

 

Wall, J. E., & Walz, G. R. (Eds.) (2004).  Measuring up:  Assessment Issues for teachers, counselors, and administrators.  Greensboro, NC:  CAPS Press.

 

Williamson, D. M., Mislevy, R. J., & Almond, R. G. (2004).  Evidence-centered design for certification and licensure.  CLEAR Exam Review, 15(2), 14-18.

 

*Zenisky, A.L., & Sireci, S.G. (2002).  Technological innovations in large-scale assessment.  Applied Measurement in Education, 15, 337-362.

 

Plagiarism policy:

Direct copying of someone else's work is not allowed.  Printing out someone else's computer output, and handing it in as your own work, is also not allowed.  Passing off someone else's work as your own will result in failing this course.  Please see me if you have questions about this policy, or if you have trouble completing any assignments.

 

Accommodation policy:

 

I strive to provide an equal educational opportunity for all students.  If you have a physical, psychological, or learning disability, you may be eligible for academic accommodations to help you succeed in this course.  If you have a documented disability that requires an accommodation, please notify me as soon as possible, but no later than the third class, so that we may make appropriate arrangements to provide any needed accommodations for class or assignments.

 

 

 


TENTATIVE CLASS SCHEDULE FOR FALL 2007

 

Listed below are the topics that will be covered as well as a list of suggested readings. 

The dates listed for each topic are tentative. 

 

Class: 9/4
Topics: Purposes of Educational Tests; Standards for Teacher Competence in Assessment; Norm- and Criterion-referenced testing
Readings: Cizek (2001); Text Chs. 1-2 & Appendix D

Class: 9/11
Topics: Reliability and Validity Fundamentals; Planning the Development of a Test
Readings: Sireci (2005); Text Chs. 3-5; Hambleton & Zenisky (2003)

Class: 9/18
Topics: Defining Test Content and Developing Test Specifications
Readings: Text Ch. 6

Class: 9/25
Topics: Assessment Format Options; Writing Multiple-Choice (MC) Items
Readings: Text Ch. 8; Haladyna & Downing (1989)

Class: 10/2
Topics: Writing MC Items (continued); Writing Other Objectively Scored Items
Readings: Haladyna (1992); Text Ch. 7

Class: 10/9
Topics: No Class (Monday Schedule @ UMASS)

Class: 10/16
Topics: Developing Performance Assessments
Readings: Text Chs. 10-11; Dunbar et al. (1991); Linn & Burton (1994)

Class: 10/23
Topics: Scoring Performance Assessments
Readings: Ch. 11; Handouts

Class: 10/30
Topics: Evaluating Tests for Content Validity; Sensitivity Review
Readings: Crocker et al. (1989); Sireci (1998); Sireci & Mullane (1994)

Class: 11/6
Topics: Field Testing and Item Analysis
Readings: Text Ch. 14; Handouts; Wainer (1989)

Class: 11/13
Topics: Incorporating Meaning Into the Test Score Scale; Setting Standards on Educational Tests
Readings: Text Ch. 19; Cizek (1996)

Class: 11/20
Topics: Evidence-Centered Test Design
Readings: Huff & Goodman (2007)

Class: 11/27
Topics: Understanding Test Results; Developing a Technical Manual
Readings: AERA, APA, & NCME (1999)

Class: 12/4
Topics: Computer-based testing
Readings: Sireci (2004); Sireci & Zenisky (2006); Wainer (1993); Zenisky & Sireci (2002)

Class: 12/11
Topics: Test Accommodations and Universal Test Design; Portfolio Assessment
Readings: Geisinger (1994); Phillips (1994); Thompson & Thurlow (2002); Text Ch. 12; Koretz et al. (1994); Reckase (1995)

Class: 12/21
Topics: Final Test Form and Technical Manual Due (no class)