Rule 1: The standard error of measurement (SEM)
- Old Rule 1: The standard error of measurement applies to all scores in a particular population
- New Rule 1: The standard error of measurement differs across scores but generalizes across populations.
SEM for CTT is constant while that of IRT is variable.
CTT
Standard error of measurement for CTT:
$SEM = \sigma \sqrt{1 - r_{tt}}$
The estimated true score is a linear transformation of the raw score, so the confidence interval is also bounded by straight lines and has the same width at every score.
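The constant width of the CTT band can be checked numerically. The sketch below is only illustrative: the function names, the reliability of 0.84, the standard deviation of 10, and the choice to centre the band on a regressed true score estimate are all assumptions, not values from the text.

```python
import math


def ctt_sem(reliability: float, sd: float) -> float:
    """Classical SEM: sd * sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def true_score_band(raw, mean, reliability, sd, z=1.96):
    """Regressed true score estimate with a symmetric band of +/- z * SEM.

    Assumption: the band is built from the SEM; other conventions exist.
    """
    estimate = mean + reliability * (raw - mean)  # linear in the raw score
    half_width = z * ctt_sem(reliability, sd)     # does not depend on raw
    return estimate - half_width, estimate, estimate + half_width


# Hypothetical test: mean 50, SD 10, reliability 0.84 (SEM is about 4).
for raw in (30, 50, 70):
    low, est, high = true_score_band(raw, mean=50, reliability=0.84, sd=10)
    print(f"raw={raw}  estimate={est:.1f}  band width={high - low:.2f}")
```

Every raw score gets the same band width, which is what "SEM is constant" means in practice.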
IRT
The relationship between raw score and transformed score is non-linear. In addition, the confidence interval becomes wider at the extreme values.
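To see why the IRT interval widens at the extremes, here is a minimal sketch that assumes a Rasch (one-parameter logistic) model and a made-up set of item difficulties. Under that assumption the standard error at a trait level is the inverse square root of the test information at that level.

```python
import math


def rasch_prob(theta: float, b: float) -> float:
    """Probability of passing an item of difficulty b at trait level theta."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def irt_se(theta: float, difficulties) -> float:
    """Standard error at theta: 1 / sqrt(test information).

    For the Rasch model each item contributes p * (1 - p) to the information.
    """
    info = 0.0
    for b in difficulties:
        p = rasch_prob(theta, b)
        info += p * (1.0 - p)
    return 1.0 / math.sqrt(info)


# Hypothetical item difficulties, centred on zero.
difficulties = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]

for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    print(f"theta={theta:+.1f}  SE={irt_se(theta, difficulties):.2f}")
```

With these items the standard error is smallest near the middle of the scale and grows toward both ends, because few items are informative for very low or very high trait levels.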
Rule 2: Test length and reliability
- Old Rule 2: Longer tests are more reliable than shorter tests
- New Rule 2: Shorter tests can be more reliable than longer tests
CTT
Spearman-Brown prophecy formula:
Given $r_{tt}$, the reliability of the original test, and $n$, the number of parallel parts, the reliability of the lengthened test is
$r_{nn} = \frac{n \, r_{tt}}{1 + (n - 1) \, r_{tt}}$
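The Spearman-Brown prophecy formula, r_nn = n * r_tt / (1 + (n - 1) * r_tt), is easy to try out. The function name and the example reliabilities below are illustrative assumptions.

```python
def spearman_brown(r_tt: float, n: float) -> float:
    """Predicted reliability when a test is lengthened by a factor of n.

    r_tt is the reliability of the original test; the added parts are
    assumed to be parallel to the original.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return n * r_tt / (1.0 + (n - 1.0) * r_tt)


# Hypothetical original reliability of 0.50.
for n in (0.5, 1, 2, 4):
    print(f"n={n}: predicted reliability = {spearman_brown(0.50, n):.3f}")
```

Values of n above 1 lengthen the test and raise the predicted reliability; values below 1 shorten it and lower the prediction, which is the basis of Old Rule 2.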
An adaptive test, by its nature, fails to meet the parallel-parts assumption of this formula because test difficulties vary substantially from one examinee to the next.
Rule 3: Interchangeable Test Forms
- Old Rule 3: Comparing test scores across multiple forms is optimal when test forms are parallel
- New Rule 3: Comparing test scores across multiple forms is optimal when test difficulty levels vary between persons.
Gulliksen (1950) defined strict conditions for test parallelism in his exposition of CTT:
- Equality of means and variances across test forms
- Equality of covariance with external variables
Rule 4: Unbiased Assessment of Item Properties
- Old Rule 4: Unbiased assessment of item properties depends on having representative samples
- New Rule 4: Unbiased estimates of item properties may be obtained from unrepresentative samples
CTT
- Item difficulty is the p-value, the proportion of examinees passing the item
- Item discrimination is item-total correlation (e.g., biserial correlation)
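Both CTT item statistics can be computed directly from a response matrix. The sketch below uses a small made-up data set, and for simplicity it uses the point-biserial correlation (a plain Pearson correlation between a 0/1 item and the total score) rather than the biserial correlation mentioned above. The total score here includes the item itself.

```python
from statistics import mean, pstdev


def item_difficulty(responses):
    """p-value of each item: proportion of examinees passing it."""
    n_items = len(responses[0])
    return [mean(row[j] for row in responses) for j in range(n_items)]


def pearson(x, y):
    """Pearson correlation using population moments."""
    mx, my = mean(x), mean(y)
    cov = mean((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (pstdev(x) * pstdev(y))


def item_total_correlation(responses):
    """Point-biserial correlation of each item with the total score."""
    totals = [sum(row) for row in responses]
    n_items = len(responses[0])
    return [pearson([row[j] for row in responses], totals)
            for j in range(n_items)]


# Hypothetical data: rows are examinees, columns are items (1 = pass).
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

print("p-values:", item_difficulty(responses))
print("item-total r:", [round(r, 2) for r in item_total_correlation(responses)])
```

Because both statistics are computed from the sample at hand, they change when the sample changes, which is why Old Rule 4 asks for representative samples.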
Rule 5: Establishing Meaningful Scale Scores
- Old Rule 5: Test scores obtain meaning by comparing their position in a norm group
- New Rule 5: Test scores obtain meaning by comparing their distance from items.
Rule 6: Establishing Scale Properties
- Old Rule 6: Interval scale properties are achieved by obtaining normal score distributions
- New Rule 6: Interval scale properties are achieved by applying justifiable measurement models
Rule 7: Mixing Item Formats
- Old Rule 7: Mixed item formats lead to unbalanced impact on test total scores.
- New Rule 7: Mixed item formats can yield optimal test scores.
CTT
Z score
Rule 8: The Meaning of Change Scores
- Old Rule 8: Change scores cannot be meaningfully compared when initial score levels differ.
- New Rule 8: Change scores can be meaningfully compared when initial score levels differ.
$X_{J,\mathrm{Change}} = X_{J2} - X_{J1}$
Bereiter (1963) identified three fundamental problems with change scores:
- Paradoxical reliabilities, such that the lower the pretest to posttest correlations, the higher the change score reliability
- Spurious negative correlations between initial status and change (due to the subtraction)
- Different meaning from different initial levels
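The first problem, paradoxical reliabilities, can be illustrated with the classical formula for the reliability of a difference score. The sketch assumes equal pretest and posttest variances, in which case the reliability of the change score is ((r_pre + r_post) / 2 - r_prepost) / (1 - r_prepost). The function name and the numbers are illustrative assumptions.

```python
def change_score_reliability(r_pre: float, r_post: float, r_prepost: float) -> float:
    """Reliability of posttest minus pretest, assuming equal variances.

    r_pre and r_post are the reliabilities of the two measures and
    r_prepost is the correlation between them.
    """
    if r_prepost >= 1.0:
        raise ValueError("pretest-posttest correlation must be below 1")
    return ((r_pre + r_post) / 2.0 - r_prepost) / (1.0 - r_prepost)


# Hypothetical case: both measures have reliability 0.80.
for r_prepost in (0.7, 0.5, 0.3):
    rel = change_score_reliability(0.80, 0.80, r_prepost)
    print(f"pre-post r={r_prepost:.1f}  change score reliability={rel:.2f}")
```

As the pretest-to-posttest correlation drops, the computed reliability of the change score rises, even though a low correlation suggests the two occasions are not measuring the same thing.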
Rule 9: Factor Analysis of Binary Items
- Old Rule 9: Factor analysis on binary items produces artifacts rather than factors
- New Rule 9: Factor analysis on raw item data yields a full information factor analysis.
- Phi correlation
- Tetrachoric correlation
- Full information factor analysis
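One reason phi correlations can produce artifacts is that phi is capped below 1 whenever two items differ in difficulty, so a factor analysis of phi coefficients can group items by difficulty rather than by content. The sketch below uses made-up responses in which everyone who passes the hard item also passes the easy one. The function names are illustrative.

```python
import math


def phi(x, y) -> float:
    """Phi coefficient between two binary (0/1) items."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = math.sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return (n11 * n00 - n10 * n01) / denom


def phi_max(p1: float, p2: float) -> float:
    """Largest phi attainable for two items with p-values p1 and p2."""
    lo, hi = sorted((p1, p2))
    return math.sqrt(lo * (1.0 - hi) / (hi * (1.0 - lo)))


# Hypothetical items: 8 of 10 pass the easy item, 2 of 10 pass the hard one,
# and the two who pass the hard item are among those who pass the easy item.
easy = [1] * 8 + [0] * 2
hard = [1] * 2 + [0] * 8

print(f"observed phi = {phi(easy, hard):.2f}")
print(f"maximum phi  = {phi_max(0.8, 0.2):.2f}")
```

The observed phi equals the ceiling for these p-values, so the items look weakly related even though their response pattern is perfectly consistent.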
Rule 10: Importance of Item Stimulus Features
- Old Rule 10: Item stimulus features are unimportant compared to psychometric properties
- New Rule 10: Item stimulus features can be directly related to psychometric properties.
This chapter should be at the end of the book.