Rehabilitation Measures Database
The Rehabilitation Clinician's Place to Find the Best Instruments to Screen Patients and Monitor Their Progress

The following is a basic review of statistical terms used on the Rehabilitation Measures website.

Educational Resources: View our 2012 RRF webinar: Fundamentals of Measurement in Older Adults
International Classification of Functioning, Disability, and Health (ICF Domain)

Categorizes assessments into:

  • Body Function
  • Body Structure
  • Activity
  • Participation
  • Environmental Factor
  • Personal Factor
For more information consult the WHO's ICF Framework
Cut-Off Scores
A cut-off score designates a positive or negative test outcome. This information can be used to classify individuals into groups such as minimal, moderate, or severe impairment.
For example, a cut-off score could mark the threshold below which an individual is classified as being at risk of falls (e.g., a score < 48 indicates the patient is at risk for falls).
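As a minimal sketch of how a cut-off score turns a raw score into a classification, the Python snippet below applies the falls example. The function name and the default cut-off of 48 are illustrative assumptions taken from that example, not values tied to any particular instrument.

```python
def at_risk_for_falls(score: float, cutoff: float = 48) -> bool:
    """Return True when the score is below the cut-off (a positive screen).

    The default cut-off of 48 mirrors the illustrative example in the text;
    a real instrument would supply its own published value.
    """
    return score < cutoff


print(at_risk_for_falls(45))  # True: 45 is below the cut-off
print(at_risk_for_falls(48))  # False: 48 is not below the cut-off
```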
Normative Data
Normative data represent scores pulled from published literature. Normative data provide "normal" values for specific variables within a population. This type of data typically appears in validation studies and therefore may not represent the full range of outcomes clinicians may encounter; however, these data can provide approximate guidelines. Whenever possible, normative data are presented with data collected from other measures researchers or clinicians have used in the course of their work.
Face Validity

An assumption that an instrument is valid based on its appearance (i.e., it appears to be a reasonable measure of the variable being assessed).



Considerations

These reflect potential considerations users should keep in mind when using an instrument. Measurements within particular diagnostic populations often come with unique assumptions, which should be kept in mind, particularly if the measure has seen only limited use in the population of interest.

Psychometrics Review:

Standard Error of Measurement (SEM)

The Standard Error of Measurement (SEM) is a reliability measure that assesses response stability.  The SEM estimates the standard error in a set of repeated scores.

Clinical Bottom Line: The SEM is the amount of variation in a score that you can attribute to measurement error.

In the Rehabilitation Measures Database, the SEM was frequently pulled directly from peer-reviewed journal articles. However, whenever the necessary statistics were available in the published articles, the following equation was used to calculate the SEM:

SEM = SD × √(1 − ICC), where SD is the standard deviation from the first test and ICC is the intraclass correlation coefficient

The SEM is expressed in the same units as the measurement itself.
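The SEM equation can be sketched in a few lines of Python. The function name and the sample values (a standard deviation of 5.0 and an ICC of 0.91) are hypothetical and chosen only to illustrate the arithmetic.

```python
import math


def standard_error_of_measurement(sd_first_test: float, icc: float) -> float:
    """SEM = SD from the first test x sqrt(1 - ICC)."""
    if icc > 1.0:
        raise ValueError("ICC cannot exceed 1")
    return sd_first_test * math.sqrt(1.0 - icc)


# Hypothetical values: SD = 5.0 points, ICC = 0.91
sem = standard_error_of_measurement(sd_first_test=5.0, icc=0.91)
print(round(sem, 2))  # 1.5 (in the same units as the measure)
```

A higher ICC shrinks the SEM, which is why more reliable instruments can detect smaller changes.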
Minimal Detectable Change (MDC)

A statistical estimate of the smallest amount of change that can be detected by a measure that corresponds to a noticeable change in ability. 

Clinical Bottom Line: The MDC is the minimum amount of change in a patient's score that ensures the change isn't the result of measurement error.

In the Rehabilitation Measures Database, the MDC was frequently pulled directly from peer-reviewed journal articles. However, whenever the necessary statistics were available in the published articles, the following equation was used to calculate the MDC:

MDC = 1.96 × SEM × √2

The MDC is calculated in terms of confidence of prediction. For example, the MDC95 is based on a 95% confidence interval, while the MDC90 is based on a 90% confidence interval. Whenever an MDC was calculated for the Rehabilitation Measures Database, the MDC95 was used.
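The MDC equation can be sketched the same way. The function name and the SEM of 1.5 are hypothetical. The multiplier 1.96 comes from the equation above; the 1.645 used for the MDC90 is the standard normal value for a 90% interval and is an assumption added here, since the text does not state it.

```python
import math

# z-values by confidence level. 1.96 is from the MDC equation in the text;
# 1.645 (for MDC90) is the usual standard normal value, assumed here.
Z_VALUES = {95: 1.96, 90: 1.645}


def minimal_detectable_change(sem: float, confidence: int = 95) -> float:
    """MDC = z x SEM x sqrt(2)."""
    return Z_VALUES[confidence] * sem * math.sqrt(2)


# Hypothetical SEM of 1.5 points
mdc95 = minimal_detectable_change(1.5)
mdc90 = minimal_detectable_change(1.5, confidence=90)
print(round(mdc95, 2))  # 4.16
print(round(mdc90, 2))  # 3.49
```

With these numbers, a patient's score would need to change by more than about 4.16 points before the change could be distinguished from measurement error at the 95% level.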
Minimal Clinically Important Difference (MCID)

MCID represents the smallest amount of change in an outcome that might be considered important by the patient or clinician. 


Clinical Bottom Line: The MCID is a published value of change in an instrument that indicates the minimum amount of change required for your patient to feel a difference in the variable you are measuring.

The MCID is typically quantified in the units used in the measurement.
Test-retest Reliability
Establishes that an instrument is capable of measuring a variable with consistency.

Clinical Bottom Line: If you are planning to use an instrument for individual decision-making, it is recommended that you use an instrument with an ICC > 0.9.

If you are planning to use the instrument to measure progress of a large group (as in research), an instrument with an ICC > 0.7 is acceptable. 

Interrater Reliability
Determines variation between two or more raters who measure the same group of subjects.

Excellent Reliability: ICC > 0.75

Adequate Reliability: ICC 0.40 to 0.74

Poor Reliability: ICC < 0.40
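The reliability bands above can be applied with a small helper. This is a sketch only: the function name is hypothetical, and the handling of values that fall exactly on a boundary (such as an ICC of 0.75) is an assumption, since the criteria do not assign them.

```python
def classify_icc(icc: float) -> str:
    """Map an ICC to the Excellent / Adequate / Poor reliability bands."""
    if icc >= 0.75:  # assumption: 0.75 itself is treated as Excellent
        return "Excellent"
    if icc >= 0.40:
        return "Adequate"
    return "Poor"


for value in (0.82, 0.55, 0.31):
    print(value, classify_icc(value))
```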

Intrarater Reliability

Determines stability of data recorded by one individual across two or more trials.


See Interrater Reliability Criteria


Internal Consistency

The extent to which items in the same instrument all measure the same trait. Typically measured using Cronbach's alpha.


Excellent: Cronbach's alpha > .8

Adequate: Cronbach's alpha > .7 and < .8

Poor: Cronbach’s alpha <.7

*Scores higher than .9 may indicate redundancy in the scale questions.
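For readers who want to see how Cronbach's alpha is obtained, here is a minimal sketch using the standard formula alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The formula is not given in the text above, and the function name and response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for rows of respondents and columns of items.

    Uses sample variances (n - 1 denominator) for items and total scores.
    """
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


# Hypothetical data: 4 respondents answering a 3-item scale
responses = [
    [3, 4, 3],
    [4, 5, 5],
    [2, 2, 3],
    [5, 5, 4],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))  # 0.91
```

Under the criteria above, this hypothetical scale would rate as Excellent, although a value above .9 may also point to redundant items.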

Predictive Validity
Indicates that the outcomes of an instrument predict a future state or outcome.

Excellent:  correlation coefficient > 0.6
Adequate: correlation coefficient 0.31 - 0.59
Poor:  correlation coefficient < 0.30

Receiver Operating Characteristic (ROC) analysis - area under the curve

Excellent: > 0.9
Adequate: 0.7 - 0.89
Poor: < 0.7
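The correlation and area-under-the-curve criteria can be sketched as two small helpers. The function names are hypothetical. Three choices here are assumptions rather than part of the criteria: the correlation is judged by its absolute value, values falling exactly on a boundary are assigned as noted in the comments, and an AUC below 0.7 is labeled Poor as the lowest band.

```python
def classify_correlation(r: float) -> str:
    """Rate a validity correlation coefficient by its strength."""
    strength = abs(r)      # assumption: only the magnitude matters
    if strength >= 0.6:    # assumption: 0.6 itself is treated as Excellent
        return "Excellent"
    if strength >= 0.31:
        return "Adequate"
    return "Poor"


def classify_auc(auc: float) -> str:
    """Rate an ROC area under the curve."""
    if auc > 0.9:
        return "Excellent"
    if auc >= 0.7:         # assumption: 0.9 itself falls in Adequate
        return "Adequate"
    return "Poor"          # assumption: lowest band labeled Poor


print(classify_correlation(-0.45))  # Adequate
print(classify_auc(0.93))           # Excellent
```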

Concurrent Validity

Establishes validity when two measures are taken at approximately the same time. It often indicates that the test could be used in place of a gold standard.


See Predictive Validity Criteria
Convergent Validity

Convergent validity refers to the degree to which two measures demonstrate similar results.  For example, a new measure may assess gait speed using a new technique.  Validation of this new measure would include outcomes obtained from established measures of gait speed.  The degree to which these two assessments of gait speed converge provides evidence of the new measure's validity.

See Predictive Validity Criteria
Discriminant Validity

Discriminant validity is the degree to which two or more measures, assessing theoretically different constructs, demonstrate a difference in outcomes. Discriminant validity evidence is commonly gathered during test validation to ensure that two or more measures are NOT assessing the same underlying trait or dimension.

Clinical Bottom Line: High correlations between measures (greater than .90) indicate the measures are assessing the same domain and may be redundant.

See Predictive Validity Criteria
Content Validity
Indicates that the items that make up an instrument adequately sample the universe of possible items that compose the construct being measured.

Typically assessed by measuring agreement between Subject Matter Experts (SMEs), although several other techniques can also be used.

Construct Validity

Establishes the ability of an instrument to measure an abstract concept and the degree to which the instrument reflects the theoretical components of that concept. Includes convergent and discriminant validity.


Construct validity is assessed using several lines of evidence, including content, construct, and criterion-related validity. Construct validity is a property of the inferences regarding the use of a measure as opposed to a property of the measure itself.

Floor Effects

Floor effects occur when a measure’s lowest score is unable to assess a patient’s level of ability. For example, a measure that assesses caregiver depression may not be sensitive enough to assess low or intermittent levels of depression among caregivers.

Excellent:  No floor effects

Adequate: Floor effects < 20%

Poor: Floor effects > 20%

Ceiling Effects

Ceiling effects occur when a measure’s highest score is unable to assess a patient’s level of ability. This might be particularly common for measures used over multiple occasions. For example, a patient’s pre-rehab score may be in range at the initial evaluation, but the patient’s ability may come to exceed the measure's highest score over time. The measure is then unable to accurately assess progress as the patient improves.

Excellent:  No ceiling effects

Adequate: Ceiling effects < 20%

Poor: Ceiling effects > 20%
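A floor or ceiling effect can be quantified as a percentage and compared against these criteria. The sketch below assumes the percentage refers to the share of respondents scoring at the instrument's minimum or maximum, which is a common convention but is not spelled out above. The function names and the scores are hypothetical, and treating exactly 20% as Poor is also an assumption.

```python
def floor_and_ceiling_percentages(scores, min_score, max_score):
    """Percentage of respondents at the lowest and highest possible score."""
    n = len(scores)
    floor_pct = 100 * sum(s == min_score for s in scores) / n
    ceiling_pct = 100 * sum(s == max_score for s in scores) / n
    return floor_pct, ceiling_pct


def classify_effect(percentage: float) -> str:
    """Rate a floor or ceiling percentage against the criteria."""
    if percentage == 0:
        return "Excellent"
    if percentage < 20:
        return "Adequate"
    return "Poor"  # assumption: exactly 20% is treated as Poor


# Hypothetical scores on an instrument ranging from 0 to 56
scores = [0, 12, 30, 45, 56, 56, 56, 40, 22, 18]
floor_pct, ceiling_pct = floor_and_ceiling_percentages(scores, 0, 56)
print(floor_pct, classify_effect(floor_pct))      # 10.0 Adequate
print(ceiling_pct, classify_effect(ceiling_pct))  # 30.0 Poor
```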


Andresen, E. M. (2000). "Criteria for assessing the tools of disability outcomes research." Arch Phys Med Rehabil 81(12 Suppl 2): S15-20.

Fitzpatrick, R., Davey, C., et al. (1998). "Evaluating patient-based outcome measures for use in clinical trials." Health Technol Assess 2(14): i-iv, 1-74.

Portney, L., Watkins, M., et al. (2000). Foundations of clinical research: applications to practice. Upper Saddle River, NJ, Prentice Hall.

Messick, S. (1995). "Standards of validity and the validity of standards in performance assessment." Educational Measurement: Issues and Practice 14(4).

The contents of this database were developed under a grant from the Department of Education, NIDRR grant number H133B090024 (PI: Allen Heinemann, PhD).  However, the content does not necessarily represent the policy of the Department of Education, and you should not assume endorsement by the Federal Government.  





© 2010 Rehabilitation Institute of Chicago