Wednesday, February 4, 2009

Chapter 2: The new rules of measurement


Rule 1: The standard error of measurement (SEM)

  • Old Rule 1: The standard error of measurement applies to all scores in a particular population
  • New Rule 1: The standard error of measurement differs across scores but generalizes across populations.

In CTT the SEM is a single constant applied to every score, whereas in IRT it varies with the score (trait) level.

CTT

Standard error of measurement for CTT:

$\mathrm{SEM} = \sigma (1 - r_{tt})^{1/2}$

The estimated true score is a linear transformation of the raw score, so the confidence interval bounds are also straight lines, with the same width at every score.
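As a rough illustration of the constant-width interval, here is a minimal Python sketch. The function names, the regressed (Kelley-style) true score estimate, and the values (mean 100, SD 15, reliability 0.91) are assumptions chosen for the example, not taken from the chapter.

```python
import math


def ctt_sem(sd: float, reliability: float) -> float:
    """Classical SEM: sd * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def ctt_interval(raw: float, mean: float, sd: float,
                 reliability: float, z: float = 1.96):
    """Return (lower, estimate, upper) for one raw score.

    Assumption: the true score estimate is the raw score regressed
    toward the mean, and the band is +/- z * SEM around it.
    """
    estimate = mean + reliability * (raw - mean)  # linear in raw
    half_width = z * ctt_sem(sd, reliability)     # same for every raw score
    return estimate - half_width, estimate, estimate + half_width


if __name__ == "__main__":
    # Hypothetical scale: mean 100, SD 15, reliability 0.91.
    for raw in (70, 100, 130):
        lower, est, upper = ctt_interval(raw, 100, 15, 0.91)
        print(f"raw={raw:3d}  estimate={est:6.2f}  width={upper - lower:.2f}")
```

Every raw score gets the same interval width, which is the point of Old Rule 1.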

IRT

The relationship between raw score and transformed score is non-linear. In addition, the confidence interval becomes wider at the extreme values.
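To see why the interval widens at the extremes, here is a small sketch under the Rasch (one-parameter logistic) model, where the standard error of the trait estimate is the inverse square root of the test information. The item difficulties are hypothetical, and the Rasch model is used only as the simplest IRT case.

```python
import math


def rasch_p(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def rasch_se(theta: float, difficulties) -> float:
    """Standard error of theta: 1 / sqrt(test information).

    Under the Rasch model each item contributes p * (1 - p)
    to the information at the given theta.
    """
    info = 0.0
    for b in difficulties:
        p = rasch_p(theta, b)
        info += p * (1.0 - p)
    return 1.0 / math.sqrt(info)


if __name__ == "__main__":
    # Hypothetical item difficulties centered on zero.
    difficulties = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
    for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
        print(f"theta={theta:+.1f}  SE={rasch_se(theta, difficulties):.3f}")
```

The standard error is smallest where the items are targeted and grows as theta moves away from the item difficulties.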

Rule 2: Test length and reliability

  • Old Rule 2: Longer tests are more reliable than shorter tests
  • New Rule 2: Shorter tests can be more reliable than longer tests

CTT

Spearman-Brown prophecy formula:

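Given that $r_{tt}$ is the reliability of the original test and $n$ is the number of parallel parts, the predicted reliability of the lengthened test is

$r_{nn} = \dfrac{n\, r_{tt}}{1 + (n - 1)\, r_{tt}}$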

An adaptive test, by nature, fails to meet the parallel-parts assumption because the test difficulties vary substantially.
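For a fixed-content test, the Spearman-Brown prediction, n * r_tt / (1 + (n - 1) * r_tt), can be sketched in a few lines of Python. The function name and the example reliabilities are assumptions for illustration.

```python
def spearman_brown(r_tt: float, n: float) -> float:
    """Predicted reliability when a test is lengthened by a factor n.

    r_tt is the reliability of the original test; n > 1 lengthens the
    test with parallel parts, n < 1 shortens it.
    """
    return n * r_tt / (1.0 + (n - 1.0) * r_tt)


if __name__ == "__main__":
    # Hypothetical starting reliability of 0.50.
    for n in (0.5, 1, 2, 4):
        print(f"n={n}: predicted reliability={spearman_brown(0.50, n):.3f}")
```

Under these assumptions reliability can only increase with length, which is Old Rule 2; the formula says nothing about adaptive tests, where the parts are not parallel.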

Rule 3: Interchangeable Test Forms

  • Old Rule 3: Comparing test scores across multiple forms is optimal when test forms are parallel
  • New Rule 3: Comparing test scores across multiple forms is optimal when test difficulty levels vary between persons.

Gulliksen (1950) defined strict conditions for test parallelism in his exposition of CTT:

  1. Equality of means and variances across test forms
  2. Equality of covariance with external variables

Rule 4: Unbiased Assessment of Item Properties

  • Old Rule 4: Unbiased assessment of item properties depends on having representative samples
  • New Rule 4: Unbiased estimates of item properties may be obtained from unrepresentative samples

CTT

  • Item difficulty is the p-value, i.e., the proportion of examinees passing the item
  • Item discrimination is item-total correlation (e.g., biserial correlation)
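Both CTT item statistics are sample-dependent because they are computed directly from the observed responses. The sketch below uses a small made-up response matrix and, for simplicity, the point-biserial (a plain Pearson correlation between a binary item and the total score) instead of the biserial correlation mentioned above; all names and data are hypothetical.

```python
from statistics import mean, pstdev


def item_difficulty(responses, j: int) -> float:
    """CTT difficulty (p-value): proportion of examinees passing item j."""
    return mean(row[j] for row in responses)


def pearson(x, y) -> float:
    """Pearson correlation using population (n-denominator) moments."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (pstdev(x) * pstdev(y))


def item_total_correlation(responses, j: int) -> float:
    """Point-biserial correlation between item j and the total score."""
    item = [row[j] for row in responses]
    total = [sum(row) for row in responses]
    return pearson(item, total)


if __name__ == "__main__":
    # Hypothetical data: 6 examinees (rows) by 3 binary items (columns).
    responses = [
        [1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [1, 1, 0],
        [0, 0, 0],
        [1, 0, 1],
    ]
    for j in range(3):
        p = item_difficulty(responses, j)
        r = item_total_correlation(responses, j)
        print(f"item {j}: p={p:.2f}  item-total r={r:.2f}")
```

Dropping the low scorers from this sample would raise every p-value, which is why Old Rule 4 asks for representative samples.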

Rule 5: Establishing Meaningful Scale Scores

  • Old Rule 5: Test scores obtain meaning by comparing their position in a norm group
  • New Rule 5: Test scores obtain meaning by comparing their distance from items.

Rule 6: Establishing Scale Properties

  • Old Rule 6: Interval scale properties are achieved by obtaining normal score distributions
  • New Rule 6: Interval scale properties are achieved by applying justifiable measurement models

Rule 7: Mixing Item Formats

  • Old Rule 7: Mixed item formats lead to unbalanced impact on test total scores.
  • New Rule 7: Mixed item formats can yield optimal test scores.

CTT

Z score

Rule 8: The Meaning of Change Scores

  • Old Rule 8: Change scores cannot be meaningfully compared when initial score levels differ.
  • New Rule 8: Change scores can be meaningfully compared when initial score levels differ.

$X_{j,\mathrm{change}} = X_{j2} - X_{j1}$

Bereiter (1963) identified three fundamental problems with change scores:

  1. Paradoxical reliabilities, such that the lower the pretest to posttest correlations, the higher the change score reliability
  2. Spurious negative correlations between initial status and change (due to the subtraction)
  3. Different meaning from different initial levels
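The first problem, paradoxical reliabilities, can be illustrated with the standard CTT formula for the reliability of a difference score. The sketch assumes equal pretest and posttest variances (a simplifying assumption, not stated in these notes), and the reliabilities of 0.80 are made up for the example.

```python
def change_score_reliability(r_pre: float, r_post: float,
                             r_prepost: float) -> float:
    """Reliability of a simple difference score (posttest - pretest).

    Assumes equal pretest and posttest variances:
        r_change = ((r_pre + r_post) / 2 - r_prepost) / (1 - r_prepost)
    """
    if r_prepost >= 1.0:
        raise ValueError("pretest-posttest correlation must be below 1")
    return ((r_pre + r_post) / 2.0 - r_prepost) / (1.0 - r_prepost)


if __name__ == "__main__":
    # Hypothetical: both occasions have reliability 0.80.
    for r12 in (0.0, 0.2, 0.5, 0.7):
        rel = change_score_reliability(0.80, 0.80, r12)
        print(f"pre-post r={r12:.1f}  change reliability={rel:.3f}")
```

The lower the pretest-posttest correlation, the higher the computed reliability of the change score, even though a low correlation suggests the two occasions are not measuring the same thing.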

Rule 9: Factor Analysis of Binary Items

  • Old Rule 9: Factor analysis on binary items produces artifacts rather than factors
  • New Rule 9: Factor analysis on raw item data yields a full information factor analysis.


  1. Phi correlation
  2. Tetrachoric correlation
  3. Full information factor analysis
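One source of the artifacts in Old Rule 9 is that the phi correlation between two binary items is capped below 1 whenever the items differ in difficulty. The sketch below uses hypothetical response vectors for ten examinees in which everyone who passes the hard item also passes the easy one, so the items are as consistent as their difficulties allow.

```python
import math


def phi(x, y) -> float:
    """Phi coefficient for two binary (0/1) response vectors."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)  # pass both
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)  # pass x only
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)  # pass y only
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)  # fail both
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom


if __name__ == "__main__":
    # Hypothetical items: the easy one is passed by 8 of 10 examinees,
    # the hard one by 2 of 10, with no inconsistent response patterns.
    easy = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
    hard = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
    print(phi(easy, easy))  # 1.0
    print(phi(easy, hard))  # 0.25
```

Factoring a matrix of such phi coefficients tends to group items by difficulty, not by content, which is the motivation for tetrachoric correlations and full information methods.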

Rule 10: Importance of Item Stimulus Features

  • Old Rule 10: Item stimulus features are unimportant compared to psychometric properties
  • New Rule 10: Item stimulus features can be directly related to psychometric properties.


This chapter should be at the end of the book.
