# Reliability and validity

Using what you learned in the Reliability and Validity Exercise, as well as in the Evaluating Assessments for Reliability and Validity media, discuss the significance of reliability and validity in test creation.

What is Reliability?

Reliability is the degree to which a given tool produces consistent outcomes or scores over time. If an assessment is reliable, we would get the same (or nearly the same) results each time we administer it under the same conditions.

For example, when we measure a variable, we want to know how stable that measurement is. If the measurement is stable, the tool should produce the same results (or nearly the same) when used with the same individual under the same conditions.

There are different types of reliability. Here are a few:

· Test-retest Reliability: As the name suggests, this measure is obtained by administering the same test twice, separated by a set period of time, to the same group of subjects. The scores from the two administrations are correlated in order to evaluate the instrument’s stability over time.

· Parallel forms Reliability: This measure of reliability is obtained by administering two different versions of the test, both of which contain items addressing the same issue or construct. The scores are correlated in order to evaluate consistency between versions.

· Inter-Rater Reliability: This measure explores the degree to which different raters agree in their assessment using the tool.
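As a rough illustration of how two of these measures can be computed, here is a minimal sketch in Python. It uses the Pearson correlation for test-retest reliability and Cohen's kappa for inter-rater agreement; kappa is one common choice for inter-rater reliability, not the only one. All scores and ratings below are hypothetical and made up for the example.

```python
from collections import Counter
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    n = len(rater_a)
    # Proportion of cases where the two raters gave the same category
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical scores for the same five people tested on two occasions
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 12, 14, 19, 21]
print("test-retest r:", round(pearson_r(time_1, time_2), 2))

# Hypothetical ratings of the same six cases by two raters
rater_a = ["high", "low", "low", "high", "low", "high"]
rater_b = ["high", "low", "high", "high", "low", "high"]
print("Cohen's kappa:", round(cohen_kappa(rater_a, rater_b), 2))
```

Parallel forms reliability would use the same correlation function, with scores from form A and form B in place of the two testing occasions.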

A reliability coefficient quantifies the consistency of scores across administrations of a test. Reliability coefficients range from 0 to 1.00. The closer the coefficient is to 1.00, the higher the reliability. A reliability of .90 indicates that 90% of the variance in observed scores is true-score variance, with the remaining 10% attributable to measurement error. Since .90 is close to 1.00, the coefficient suggests good reliability or consistency.
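The "true variance" reading of the coefficient comes from classical test theory. The notation below is the standard textbook notation and is an assumption on my part, since the course materials do not spell it out: X is an observed score, T the true score, and E the measurement error.

```latex
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2

r_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}

% Example: r_{XX'} = .90 means \sigma_T^2 / \sigma_X^2 = .90,
% so \sigma_E^2 / \sigma_X^2 = 1 - .90 = .10
```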

An assessment must be reliable for results to be valid.

What is Validity?

Validity means that an assessment accurately measures what it says it measures. An assessment that does so is considered valid.

For example, a motivation assessment must measure motivation and not other variables. Based on this, validity provides information on when it is appropriate to use an assessment.

Results must be aligned with the objectives of the test for the assessment to be considered valid.

There are different types of validity. Here are a few:

· Concurrent Validity: The findings of the instrument in question are compared with findings from another instrument that has already been determined to be valid, to see how well the two instruments correlate.

· Content Validity: The degree to which the instrument measures all facets of a given construct.

· Convergent Validity: The degree to which two measures of constructs that are expected to be related are in fact related.

· Criterion Validity: The degree to which the measured construct is related to a given outcome.

· Face Validity: The degree to which an instrument appears to measure what it professes to measure.

· Predictive Validity: The degree to which the provided measure compares to an outcome that is measured at a later point.

Review

Let’s look at a review of the Beck Scale for Suicide Ideation and examine what it reports about reliability and validity.

Beck A.T., & Steer R. A. (1991). Beck Scale for Suicide Ideation. Hanes KR, Stewart JR, eds. January 1991. Retrieved from http://library.capella.edu/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=mmt&AN=test.1706&site=ehost-live&scope=site. Accessed October 11, 2018.

Review of the Beck Scale for Suicide Ideation by KARL R. HANES, Consultant Psychologist, and Director, Vangard Publishing, Carlton, Victoria, Australia:

The Beck Scale for Suicide Ideation (BSS) is a 21-item self-report instrument that was developed as a self-administered variation of the earlier Scale for Suicidal Ideation (SSI; Beck, Kovacs, & Weissman, 1979). The BSS is well constructed; easy to administer; has high face, convergent, and construct validity; and scores from this scale have acceptable levels of internal consistency. The test is appropriate for use with English-speaking populations of average intelligence. The manual does provide clear instructions on administration and scoring of this instrument, although the significance of specific responses or of actual scores on this measure is not explored in any great detail.

Despite this, there are a few specific points of concern. The manual provides scant details regarding the development of this measure, appearing to rely on the longevity and status of its parent instrument. This is not very helpful for those who are not familiar with the earlier measure. Some further information regarding the specific items on this test and background material on test-relevant previous research in this field would be helpful. Moreover, the requirement of a self-administered variation of the SSI is not convincingly demonstrated, with the statement that the authors ‘saw a need for a self-report version of the SSI’ (manual, p. 3) not being terribly informative. Providing such a rationale would seem to be of particular importance in this case, given the strong requirement for clinical evaluation and the strong possibility of concealment and provision of misleading information in those at high risk for suicide.

STANDARDIZATION. The BSS was standardized on 178 adults receiving psychiatric services and identified as suicide ideators (individuals who have current plans and wishes to commit suicide, but no recent suicide attempts); 126 were inpatients of a suburban general hospital and 52 were receiving outpatient psychiatric services. The inpatient sample, mean age 37.4 years, consisted of 50% females, 81% Caucasians, 15% African Americans, and 4% Asian Americans. This sample had a variety of severe mental disorders with 40% having a mood disorder diagnosis. The outpatient sample, mean age 34.1 years, consisted of 60% females, 88% Caucasians, and 12% African Americans.

RELIABILITY. Reliability data are limited to the first 19 items on the BSS. The inpatient sample produced a coefficient alpha reliability estimate of .90, and the outpatient sample produced an alpha of .87, indicating high internal consistency for both samples. Test-retest stability was performed on 60 inpatients and a correlation of .54 (p<.01) was found between tests administered one week apart, indicating moderate test-retest reliability. For a sample of 108 adolescent inpatients, ages 12 to 17, a coefficient alpha of .95 was estimated, indicating a high level of internal consistency (Steer, Kumar, & Beck, 1993).

VALIDITY. The BSS correlated with the SSI at .90 (p<.01) for an inpatient sample and .94 (p<.001) for an outpatient sample, supporting the concurrent validity of the BSS. Additional concurrent validity evidence came from an inpatient sample of suicide ideators, whose BSS scores correlated with the Beck Depression Inventory (BDI) at .48 (p<.01), the Beck Hopelessness Scale (BHS) at .48 (p<.001), and previous suicide attempts at .32 (p<.01). A sample of outpatient suicide ideators produced similar results.

Construct validity was investigated through the significant correlation with assessment scores on the BDI and BHS. Depression and hopelessness were theorized to be factors in suicide risk. Through principal component analysis of the BSS scores of 126 suicide ideators, Beck, Kovacs, and Weissman (1979) identified and labeled three significant factors: (a) Active Suicide Desire, (b) Preparation, and (c) Passive Suicide Desire.

A prospective study by Beck, Steer, Kovacs, and Garrison (1985) found that the BHS did, but the SSI did not, predict eventual suicide. In another study (Beck & Steer, 1991), outpatients were asked to complete the SSI as if they were in the worst period of their mental disorder. SSI scores for those who eventually committed suicide were significantly higher than those who did not.
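The review reports coefficient alpha values (.90, .87, .95) as evidence of internal consistency. As a minimal sketch of how such a value is calculated, the Python below implements the usual Cronbach's alpha formula. The response data are hypothetical, invented only for illustration, and are not BSS data; the function name `cronbach_alpha` is my own.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # one tuple of scores per item
    item_variances = [variance(col) for col in items]
    totals = [sum(row) for row in responses]   # each respondent's total score
    total_variance = variance(totals)
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers from five respondents to four items scored 0 to 2
responses = [
    [0, 0, 1, 0],
    [1, 1, 1, 0],
    [1, 2, 1, 1],
    [2, 2, 2, 1],
    [2, 2, 2, 2],
]
alpha = cronbach_alpha(responses)
print("coefficient alpha:", round(alpha, 2))
```

Alpha rises when items vary together, so it speaks to internal consistency and not to stability over time. That is why the same instrument can show a high alpha (.90) alongside only a moderate test-retest correlation (.54).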