Reviewing assessment tools practice

After you have read Arbisi’s and Farmer’s reviews of the Beck Depression Inventory-II (BDI-II), found in the Week 2: Reviewing Assessment Tools Practice reading list, compare each author’s evaluation of assessment applicability to specific populations. What weaknesses are revealed about the development of the test? To what degree do these issues warrant cautions regarding the use of the BDI-II with diverse populations? Refer to the code of ethics for your profession (ACA Code of Ethics, AAMFT Code of Ethics, or ASCA Code of Ethics) in your response.

Beck Depression Inventory–Second Edition

Review of the Beck Depression Inventory-II by PAUL A. ARBISI, Minneapolis VA Medical Center, Assistant Professor Department of Psychiatry and Assistant Clinical Professor Department of Psychology, University of Minnesota, Minneapolis, MN:

After over 35 years of nearly universal use, the Beck Depression Inventory (BDI) has undergone a major revision. The revised version of the Beck, the BDI-II, represents a significant improvement over the original instrument across all aspects of the instrument including content, psychometric validity, and external validity. The BDI was an effective measure of depressed mood that repeatedly demonstrated utility as evidenced by its widespread use in the clinic as well as by the frequent use of the BDI as a dependent measure in outcome studies of psychotherapy and antidepressant treatment (Piotrowski & Keller, 1989; Piotrowski & Lubin, 1990). The BDI-II should supplant the BDI and readily gain acceptance by surpassing its predecessor in use.

Despite the demonstrated utility of the Beck, times had changed and the diagnostic context within which the instrument was developed had altered considerably over the years (Beck, Ward, Mendelson, Mock, & Erbaugh, 1961). Further, psychometrically, the BDI had some problems with certain items failing to discriminate adequately across the range of depression and other items showing gender bias (Santor, Ramsay, & Zuroff, 1994). Hence the time had come for a conceptual reassessment and psychometrically informed revision of the instrument. Indeed, a mid-course correction had occurred in 1987 as evidenced by the BDI-IA, a version that included rewording of 15 out of the 21 items (Beck & Steer, 1987). This version did not address the limited scope of depressive symptoms of the BDI nor the failure of the BDI to adhere to contemporary diagnostic criteria for depression as codified in the DSM-III. Further, consumers appeared to vote with their feet because, since the publication of the BDI-IA, the original Beck had been cited far more frequently in the literature than the BDI-IA. Therefore, the time had arrived for a major overhaul of the classic BDI and a retooling of the content to reflect diagnostic sensibilities of the 1990s.

In the main, the BDI-II accomplishes these goals and represents a highly successful revamping of a reliable standard. The BDI-II retains the 21-item format with four options under each item, ranging from not present (0) to severe (3). Relative to the BDI-IA, all but three items were altered in some way on the BDI-II. Items dropped from the BDI include body image change, work difficulty, weight loss, and somatic preoccupation. To replace the four lost items, the BDI-II includes the following new items: agitation, worthlessness, loss of energy, and concentration difficulty. The current item content includes: (a) sadness, (b) pessimism, (c) past failure, (d) loss of pleasure, (e) guilty feelings, (f) punishment feelings, (g) self-dislike, (h) self-criticalness, (i) suicidal thoughts or wishes, (j) crying, (k) agitation, (l) loss of interest, (m) indecisiveness, (n) worthlessness, (o) loss of energy, (p) changes in sleeping pattern, (q) irritability, (r) changes in appetite, (s) concentration difficulty, (t) tiredness or fatigue, and (u) loss of interest in sex. To further reflect DSM-IV diagnostic criteria for depression, both increases and decreases in appetite are assessed in the same item and both hypersomnia and hyposomnia are assessed in another item. And rather than the 1-week time period rated on the BDI, the BDI-II, consistent with DSM-IV, asks for ratings over the past 2 weeks.

The BDI-II retains the advantage of the BDI in its ease of administration (5-10 minutes) and the rather straightforward interpretive guidelines presented in the manual. At the same time, the advantage of a self-report instrument such as the BDI-II may also be a disadvantage. That is, there are no validity indicators contained on the BDI or the BDI-II and the ease of administration of a self-report lends itself to the deliberate tailoring of self-report and distortion of the results. Those of us engaged in clinical practice are often faced with clients who alter their presentation to forward a personal agenda that may not be shared with the clinician. The manual obliquely mentions this problem in an ambivalent and somewhat avoidant fashion. Under the heading, “Memory and Response Sets,” the manual blithely discounts the potential problem of a distorted response set by attributing extreme elevation on the BDI-II to “extreme negative thinking” which “may be a central cognitive symptom of severe depression rather than a response set per se because patients with milder depression should show variation in their response ratings” (manual, p. 9). On the other hand, later in the manual, we are told that, “In evaluating BDI-II scores, practitioners should keep in mind that all self-report inventories are subject to response bias” (p. 12). The latter is sound advice and should be highlighted under the heading of response bias.

The manual is well written and provides the reader with significant information regarding norms, factor structure, and notably, nonparametric item-option characteristic curves for each item. Indeed the latter inclusion incorporates the latest in item response theory, which appears to have guided the retention and deletion of items from the BDI (Santor et al., 1994).

Generally the psychometric properties of the BDI-II are quite sound. Coefficient alpha estimates of reliability for the BDI-II were .92 for the outpatient sample and .93 for the nonclinical sample. Corrected item-total correlations for the outpatient sample ranged from .39 (loss of interest in sex) to .70 (loss of pleasure); for the nonclinical college sample, the lowest item-total correlation was .27 (loss of interest in sex) and the highest was .74 (self-dislike). The test-retest reliability coefficient across the period of a week was quite high at .93. The inclusion in the manual of item-option characteristic curves for each BDI-II item is of noted significance. Examination of these curves reveals that, for the most part, the ordinal position of the item options is appropriately assigned for 17 of the 21 items. However, the items addressing punishment feelings, suicidal thoughts or wishes, agitation, and loss of interest in sex did not display the anticipated rank order indicating ordinal increase in severity of depression across item options. Additionally, although improved over the BDI, Item 10 (crying) Option 3 does not clearly express a more severe level of depression than Option 2 (see Santor et al., 1994). Overall, however, the option choices within each item appear to function as intended across the severity dimension of depression.

The suggested guidelines and cut scores for the interpretation of the BDI-II and placement of individual scores into a range of depression severity are purported to have good sensitivity and moderate specificity, but test parameters such as positive and negative predictive power are not reported (i.e., given score X on the BDI-II, what is the probability that the individual meets criteria for a Major Depressive Disorder, of moderate severity?). According to the manual, the BDI-II was developed as a screening instrument for major depression and, accordingly, cut scores were derived through the use of receiver operating characteristic curves to maximize sensitivity. Of the 127 outpatients used to derive the cut scores, 57 met criteria for either single-episode or recurrent major depression. The relatively high base rate (45%) for major depression is a bit unrealistic for nonpsychiatric settings and will likely serve to inflate the test parameters. Cross validation of the cut scores on different samples with lower base rates of major depression is warranted due to the fact that a different base rate of major depression may result in a significant change in the proportion of correct decisions based on the suggested cut score (Meehl & Rosen, 1955). Consequently, until the suggested cut scores are cross validated in those populations, caution should be exercised when using the BDI-II as a screen in nonpsychiatric populations where the base rate for major depression may be substantially lower.
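The base-rate argument above can be made concrete with Bayes' theorem. The sketch below is illustrative only: the sensitivity and specificity values are hypothetical placeholders (the review describes them only as "good" and "moderate"), and the function name is an assumption. The one figure taken from the review is the derivation-sample base rate of 57 out of 127 outpatients.

```python
def predictive_values(sensitivity, specificity, base_rate):
    """Return (PPV, NPV) for a screening cut score via Bayes' theorem."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    true_neg = specificity * (1.0 - base_rate)
    false_neg = (1.0 - sensitivity) * base_rate
    ppv = true_pos / (true_pos + false_pos)  # P(disorder | score at or above cut)
    npv = true_neg / (true_neg + false_neg)  # P(no disorder | score below cut)
    return ppv, npv


# HYPOTHETICAL operating characteristics, not values from the BDI-II manual.
SENS = 0.90
SPEC = 0.70

# Base rate in the cut-score derivation sample, as reported in the review.
derivation_base_rate = 57 / 127

# Compare the derivation sample with two assumed lower-base-rate settings.
for base_rate in (derivation_base_rate, 0.10, 0.05):
    ppv, npv = predictive_values(SENS, SPEC, base_rate)
    print(f"base rate {base_rate:.2f}: PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

With these assumed values, the same cut score that performs acceptably at a 45% base rate yields a positive predictive value of only .25 when the base rate is 10%, which is the kind of shift in correct decisions that Meehl and Rosen (1955) describe.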

Concurrent validity evidence appears solid with the BDI-II demonstrating a moderately high correlation with the Hamilton Psychiatric Rating Scale for Depression-Revised (r = .71) in psychiatric outpatients. Of importance to the discriminative validity of the instrument was the relatively moderate correlation between the BDI-II and the Hamilton Rating Scale for Anxiety-Revised (r = .47). The manual reports mean BDI-II scores for various groups of psychiatric outpatients by diagnosis. As expected, outpatients had higher scores than college students. Further, individuals with mood disorders had higher scores than those individuals diagnosed with anxiety and adjustment disorders.

The BDI-II is a stronger instrument than the BDI with respect to its factor structure. A two-factor (Somatic-Affective and Cognitive) solution accounted for the majority of the common variance in both an outpatient psychiatric sample and a much smaller nonclinical college sample. Factor analysis of the BDI-II in a larger nonclinical sample of college students resulted in Cognitive-Affective and Somatic-Vegetative main factors, essentially replicating the findings presented in the manual and providing strong evidence for the overall stability of the factor structure across samples (Dozois, Dobson, & Ahnberg, 1998). Unfortunately, several of the items, such as sadness and crying, shifted factor loadings depending upon the type of sample (clinical vs. nonclinical).

SUMMARY. The BDI-II represents a highly successful revision of an acknowledged standard in the measurement of depressed mood. The revision has improved upon the original by updating the items to reflect contemporary diagnostic criteria for depression and utilizing state-of-the-art psychometric techniques to improve the discriminative properties of the instrument. This degree of improvement is no small feat and the BDI-II deserves to replace the BDI as the single most widely used clinically administered instrument for the assessment of depression.


Meehl, P. E., & Rosen, A. (1955). Antecedent probability and the efficiency of psychometric signs, patterns, or cutting scores. Psychological Bulletin, 52, 194-216.

Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.

Beck, A. T., & Steer, R. A. (1987). Beck Depression Inventory manual. San Antonio, TX: The Psychological Corporation.

Piotrowski, C., & Keller, J. W. (1989). Psychological testing in outpatient mental health facilities: A national study. Professional Psychology: Research and Practice, 20, 423-425.

Piotrowski, C., & Lubin, B. (1990). Assessment practices of health psychologists: Survey of APA Division 38 clinicians. Professional Psychology: Research and Practice, 21, 99-106.

Santor, D. A., Ramsay, J. O., & Zuroff, D. C. (1994). Nonparametric item analyses of the Beck Depression Inventory: Evaluating gender item bias and response option weights. Psychological Assessment, 6, 255-270.

Dozois, D. J. A., Dobson, K. S., & Ahnberg, J. L. (1998). A psychometric evaluation of the Beck Depression Inventory-II. Psychological Assessment, 10, 83-89.

Review of the Beck Depression Inventory-II by RICHARD F. FARMER, Associate Professor of Psychology, Idaho State University, Pocatello, ID:

The Beck Depression Inventory-II (BDI-II) is the most recent version of a widely used self-report measure of depression severity. Designed for persons 13 years of age and older, the BDI-II represents a significant revision of the original instrument published almost 40 years ago (BDI-I; Beck, Ward, Mendelson, Mock, & Erbaugh, 1961) as well as the subsequent amended version copyrighted in 1978 (BDI-IA; Beck, Rush, Shaw, & Emery, 1979; Beck & Steer, 1987, 1993). Previous editions of the BDI have considerable support for their effectiveness as measures of depression (for reviews, see Beck & Beamesderfer, 1974; Beck, Steer & Garbin, 1988; and Steer, Beck, & Garrison, 1986).

Items found in these earlier versions, many of which were retained in modified form for the BDI-II, were clinically derived and neutral with respect to a particular theory of depression. Like previous versions, the BDI-II contains 21 items, each of which assesses a different symptom or attitude by asking the examinee to consider a group of graded statements that are weighted from 0 to 3 based on intuitively derived levels of severity. If the examinee feels that more than one statement within a group applies, he or she is instructed to circle the highest weighting among the applicable statements. A total score is derived by summing weights corresponding to the statements endorsed over the 21 items. The test authors provide empirically informed cut scores (derived from receiver operating characteristic [ROC] curve methodology) for indexing the severity of depression based on responses from outpatients with a diagnosed episode of major depression (cutoff scores to index the severity of dysphoria for college samples are suggested by Dozois, Dobson, & Ahnberg, 1998).
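The scoring rule described above (take the highest weight when several statements in a group are endorsed, then sum over the 21 items) can be sketched as follows. This is a minimal illustration, not the publisher's scoring procedure: the function names and the example responses are hypothetical, and rejecting unanswered items is an assumption, since the review does not say how missing responses are handled.

```python
N_ITEMS = 21                 # number of items, as stated in the review
VALID_WEIGHTS = {0, 1, 2, 3}  # statement weights, as stated in the review


def item_score(endorsed):
    """Score one item: the highest weight among the endorsed statements."""
    if not endorsed:
        # ASSUMPTION: an unanswered item is treated as an error here.
        raise ValueError("item has no endorsed statement")
    if not set(endorsed) <= VALID_WEIGHTS:
        raise ValueError(f"weights must be in {sorted(VALID_WEIGHTS)}")
    return max(endorsed)


def total_score(responses):
    """Sum the item scores over all 21 items (possible range 0 to 63)."""
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} items, got {len(responses)}")
    return sum(item_score(endorsed) for endorsed in responses)


# Hypothetical answer sheet: 18 items endorsed at 0, one item with two
# statements circled (1 and 2, so it counts as 2), one at 3, and one at 1.
example_responses = [{0}] * 18 + [{1, 2}, {3}, {1}]
print(total_score(example_responses))  # 0 + 2 + 3 + 1 = 6
```

The 0 to 63 range follows directly from 21 items with a maximum weight of 3 each; the severity cut scores themselves are not given in the review and are deliberately left out of the sketch.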

The BDI-II can usually be completed within 5 to 10 minutes. In addition to providing guidelines for the oral administration of the test, the manual cautions the user against using the BDI-II as a diagnostic instrument and appropriately recommends that interpretations of test scores should only be undertaken by qualified professionals. Although the manual does not report the reading level associated with the test items, previous research on the BDI-IA suggested that items were written at about the sixth-grade level (Berndt, Schwartz, & Kaiser, 1983).

A number of changes appear in the BDI-II, perhaps the most significant of which is the modification of test directions and item content to be more consistent with the major depressive episode concept as defined in the Diagnostic and Statistical Manual of Mental Disorders-Fourth Edition (DSM-IV; American Psychiatric Association, 1994). Whereas the BDI-I and BDI-IA assessed symptoms experienced at the present time and during the past week, respectively, the BDI-II instructs the examinee to respond in terms of how he or she has “been feeling during the past two weeks, including today” (manual, p. 8, emphasis in original) so as to be consistent with the DSM-IV time period for the assessment of major depression. Similarly, new items included in the BDI-II address psychomotor agitation, concentration difficulties, sense of worthlessness, and loss of energy so as to make the BDI-II item set more consistent with DSM-IV criteria. Items that appeared in the BDI-I and BDI-IA that were dropped in the second edition were those that assessed weight loss, body image change, somatic preoccupation, and work difficulty. All but three of the items from the BDI-IA retained for inclusion in the BDI-II were reworded in some way. Items that assess changes in sleep patterns and appetite now address both increases and decreases in these areas.

Two samples were retained to evaluate the psychometric characteristics of the BDI-II: (a) a clinical sample (n = 500; 63% female; 91% White) who sought outpatient therapy at one of four outpatient clinics on the U.S. east coast (two of which were located in urban areas, two in suburban areas), and (b) a convenience sample of Canadian college students (n = 120; 56% women; described as “predominantly White”). The average ages of the clinical and student samples were, respectively, 37.2 (SD = 15.91; range = 13-86) and 19.58 (SD = 1.84).

Reliability of the BDI was evaluated with multiple methods. Internal consistency was assessed using corrected item-total correlations (ranges: .39 to .70 for outpatients; .27 to .74 for students) and coefficient alpha (.92 for outpatients; .93 for students). Test-retest reliability was assessed over a 1-week interval among a small subsample of 26 outpatients from one clinic site (r = .93). There was no significant change in scores noted among this outpatient sample between the two testing occasions, a finding that is different from those often obtained with college students who, when tested repeatedly with earlier versions of the BDI, were often observed to have lower scores on subsequent testing occasions (e.g., Hatzenbuehler, Parpal, & Matthews, 1983).
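For readers less familiar with the two internal consistency statistics reported above, the following sketch shows how coefficient alpha and corrected item-total correlations are conventionally computed. It assumes a small made-up data set (three items, five respondents) purely for illustration; the function names are hypothetical and nothing here reproduces the manual's data or software.

```python
from statistics import mean, variance


def cronbach_alpha(rows):
    """Coefficient alpha: k/(k-1) * (1 - sum of item variances / total variance)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows):
    """Correlate each item with the total of the remaining items.

    The item is removed from the total so that it does not inflate its
    own correlation, which is what "corrected" refers to.
    """
    k = len(rows[0])
    result = []
    for i in range(k):
        item = [row[i] for row in rows]
        rest = [sum(row) - row[i] for row in rows]
        result.append(pearson(item, rest))
    return result


# MADE-UP toy data: each row is one respondent, each column one 0-3 item.
toy_rows = [
    [0, 1, 0],
    [1, 1, 2],
    [2, 3, 2],
    [3, 2, 3],
    [1, 0, 1],
]
print(round(cronbach_alpha(toy_rows), 2))
print([round(r, 2) for r in corrected_item_total(toy_rows)])
```

Applied to real BDI-II data, each row would hold one examinee's 21 item scores; an item with a low corrected correlation, such as the .27 reported for loss of interest in sex among students, is one that tracks the rest of the scale comparatively weakly.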

Following the method of Santor, Ramsay, and Zuroff (1994), the test authors also examined the item-option characteristic curves for each of the 21 BDI-II items as endorsed by the 500 outpatients. As noted in a previous review of the BDI (1993 Revised) by Waller (1998), the use of this method to evaluate item performance represents a new standard in test revision. Consistent with findings for depressed outpatients obtained by Santor et al. (1994) on the BDI-IA, most of the BDI-II items performed well as evidenced by the individual item-option curves. All items were reported to display monotonic relationships with the underlying dimension of depression severity. A minority of items were somewhat problematic, however, when the degree of correspondence between estimated and a priori weights associated with item response options was evaluated. For example, on Item 11 (agitation), the response option weighted a value of 1 was more likely to be endorsed than the option weighted 3 across all levels of depression, including depression in the moderate and severe ranges. In general, though, response option weights of the BDI-II items did a good job of discriminating across estimated levels of depression severity. Unfortunately, the manual does not provide detailed discussion of item-option characteristic curves and their interpretation.

The validity of the BDI-II was evaluated with outpatient subsamples of various sizes. When administered on the same occasion, the correlation between the BDI-II and BDI-IA was quite high (n = 101, r = .93), suggesting that these measures yield similar patterns of scores, even though the BDI-II, on average, produced equated scores that were about 3 points higher. In support of its convergent validity, the BDI-II displayed moderately high correlations with the Beck Hopelessness Scale (n = 158, r = .68) and the Revised Hamilton Psychiatric Rating Scale for Depression (HRSD-R; n = 87, r = .71). The correlation between the BDI-II and the Revised Hamilton Anxiety Rating Scale (n = 87, r = .47) was significantly less than that for the BDI-II and HRSD-R, which was cited as evidence of the BDI-II’s discriminant validity. The BDI-II, however, did share a moderately high correlation with the Beck Anxiety Inventory (n = 297; r = .60), a finding consistent with past research on the strong association between self-reported anxiety and depression (e.g., Kendall & Watson, 1989). Additional research published since the manual’s release (Steer, Ball, Ranieri, & Beck, 1997) also indicates that the BDI-II shares higher correlations with the SCL-90-R Depression subscale (r = .89) than with the SCL-90-R Anxiety subscale (r = .71), although the latter correlation is still substantial. Other data presented in the test manual indicated that of the 500 outpatients, those diagnosed with mood disorders (n = 264) had higher BDI-II scores than those diagnosed with anxiety (n = 88), adjustment (n = 80), or other (n = 68) disorders. The test authors also cite evidence of validity by separate factor analyses performed on the BDI-II item set for outpatients and students. However, findings from these analyses, which were different in some significant respects, are questionable evidence of the measure’s validity as the test was apparently not developed to assess specific dimensions of depression. 
Factor analytic studies of the BDI have historically produced inconsistent findings (Beck et al., 1988), and preliminary research on the BDI-II suggests some variations in factor structure within both clinical and student samples (Dozois et al., 1998; Steer & Clark, 1997; Steer, Kumar, Ranieri, & Beck, 1998). Furthermore, one of the authors of the BDI-II (Steer & Clark, 1997) has recently advised that the measure not be scored as separate subscales.

SUMMARY. The BDI-II is presented as a user-friendly self-report measure of depression severity. Strengths of the BDI-II include the very strong empirical foundation on which it was built, namely almost 40 years of research that demonstrates the effectiveness of earlier versions. In the development of the BDI-II, innovative methods were employed to determine optimum cut scores (ROC curves) and evaluate item performance and weighting (item-option curves). The present edition demonstrates very good reliability and impressive test item characteristics. Preliminary evidence of the BDI-II’s validity in clinical samples is also encouraging. Despite the many impressive features of this measure, one may wonder why the test developers were not even more thorough in their presentation of the development of the BDI-II and more rigorous in the evaluation of its effectiveness. The test manual is too concise, and often omits important details involving the test development process. The clinical sample used to generate cut scores and evaluate the psychometric properties of the measure seems unrepresentative in many respects (e.g., racial make-up, patient setting, geographic distribution), and other aspects of this sample (e.g., education level, family income) go unmentioned. The student sample is relatively small and, unfortunately, drawn from a single university. Opportunities to address important questions regarding the measure were also missed, such as whether the BDI-II effectively assesses or screens the DSM-IV concept of major depression, and the extent to which it may accomplish this better than earlier versions. This seems to be a particularly important question given that the BDI was originally developed as a measure of the depressive syndrome, not as a screening measure for a nosologic category (Kendall, Hollon, Beck, Hammen, & Ingram, 1987), a distinction that appears to have become somewhat blurred in this most recent edition. 
Also, not reported in the manual are analyses to examine possible sex biases among the BDI-II item set. Santor et al. (1994) reported that the BDI-IA items were relatively free of sex bias, and given the omission of the most sex-biased item in the BDI-IA (body image change) from the BDI-II, it is possible that this most recent edition may contain even less bias. Similarly absent in the manual is any report on the item-option characteristic curves for nonclinical samples. Santor et al. (1994) reported that for most of the BDI-IA items, response option weights were less discriminating across the range of depression severity among their college sample relative to their clinical sample, an anticipated finding given that students would be less likely to endorse response options hypothesized to be consistent with more severe forms of depression. Also, given that previous editions of the BDI have shown inconsistent associations with social undesirability (e.g., Tanaka-Matsumi & Kameoka, 1986), an opportunity was missed to evaluate the extent to which the BDI-II measures something different than this response set. Despite these relative weaknesses in the development and presentation of the BDI-II, existent evidence suggests that the BDI-II is just as sound if not more so than its earlier versions.


Beck, A. T., Ward, C. H., Mendelson, M., Mock, J., & Erbaugh, J. (1961). An inventory for measuring depression. Archives of General Psychiatry, 4, 561-571.

Beck, A. T., & Beamesderfer, A. (1974). Assessment of depression: The Depression Inventory. In P. Pichot & R. Olivier-Martin (Eds.), Psychological measurements in psychopharmacology: Modern problems in pharmacopsychiatry (Vol. 7, pp. 151-169). Basel: Karger.

Beck, A. T., Rush, A. J., Shaw, B. F., & Emery, G. (1979). Cognitive therapy of depression. New York: Guilford.

Berndt, D. J., Schwartz, S., & Kaiser, C. F. (1983). Readability of self-report depression inventories. Journal of Consulting and Clinical Psychology, 51, 627-628.

Hatzenbuehler, L. C., Parpal, M., & Matthews, L. (1983). Classifying college students as depressed or nondepressed using the Beck Depression Inventory: An empirical analysis. Journal of Consulting and Clinical Psychology, 51, 360-366.

Steer, R. A., Beck, A. T., & Garrison, B. (1986). Applications of the Beck Depression Inventory. In N. Sartorius & T. A. Ban (Eds.), Assessment of depression (pp. 123-142). New York: Springer-Verlag.

Tanaka-Matsumi, J., & Kameoka, V. A. (1986). Reliabilities and concurrent validities of popular self-report measures of depression, anxiety, and social desirability. Journal of Consulting and Clinical Psychology, 54, 328-333.

Beck, A. T., & Steer, R. A. (1987). Beck Depression Inventory manual. San Antonio, TX: The Psychological Corporation.

Kendall, P. C., Hollon, S. D., Beck, A. T., Hammen, C. L., & Ingram, R. E. (1987). Issues and recommendations regarding the use of the Beck Depression Inventory. Cognitive Therapy and Research, 11, 289-299.

Beck, A. T., Steer, R. A., & Garbin, M. G. (1988). Psychometric properties of the Beck Depression Inventory: Twenty-five years of evaluation. Clinical Psychology Review, 8, 77-100.

Kendall, P. C., & Watson, D. (Eds.). (1989). Anxiety and depression: Distinctive and overlapping features. San Diego, CA: Academic Press.

Beck, A. T., & Steer, R. A. (1993). Beck Depression Inventory manual. San Antonio, TX: Psychological Corporation.

American Psychiatric Association. (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: Author.

Santor, D. A., Ramsay, J. O., & Zuroff, D. C. (1994). Nonparametric item analyses of the Beck Depression Inventory: Evaluating gender item bias and response option weights. Psychological Assessment, 6, 255-270.

Steer, R. A., Ball, R., Ranieri, W. F., & Beck, A. T. (1997). Further evidence for the construct validity of the Beck Depression Inventory-II with psychiatric outpatients. Psychological Reports, 80, 443-446.

Steer, R. A., & Clark, D. A. (1997). Psychometric characteristics of the Beck Depression Inventory-II with college students. Measurement and Evaluation in Counseling and Development, 30, 128-136.

Dozois, D. J. A., Dobson, K. S., & Ahnberg, J. L. (1998). A psychometric evaluation of the Beck Depression Inventory-II. Psychological Assessment, 10, 83-89.

Steer, R. A., Kumar, G., Ranieri, W. F., & Beck, A. T. (1998). Use of the Beck Depression Inventory-II with adolescent psychiatric outpatients. Journal of Psychopathology and Behavioral Assessment, 20, 127-137.

Waller, N. G. (1998). [Review of the Beck Depression Inventory-1993 Revised]. In J. C. Impara & B. S. Plake (Eds.), The thirteenth mental measurements yearbook (pp. 120-121). Lincoln, NE: The Buros Institute of Mental Measurements.

*** Copyright © 2023. The Board of Regents of the University of Nebraska and the Buros Center for Testing. All rights reserved. Any unauthorized use is strictly prohibited. Buros Center for Testing, Buros Institute, Mental Measurements Yearbook, and Tests in Print are all trademarks of the Board of Regents of the University of Nebraska and may not be used without express written consent.



Citation for the article: Arbisi, P. A., & Farmer, R. F. (2001). Beck Depression Inventory–Second Edition. The Fourteenth Mental Measurements Yearbook.

