Rorschach – Reliability and Validity

Thomas F. Collura

The Rorschach is a psychometric instrument in which a subject is shown a series of inkblots and asked to describe what he or she sees in each image.  It is administered by an examiner who asks questions and records the responses.  From these responses, a set of variables is derived and used to characterize the subject's personality along several dimensions.  The test is intended to yield accurate information of diagnostic value in clinical work.  Despite an enormous literature, the Rorschach remains a controversial instrument after nearly 100 years of use.  This report reviews the relevant literature on the reliability and validity of the Rorschach.

Based upon published reports, the Rorschach can be regarded as a reliable and valid psychometric instrument, provided that certain conditions are met.  First, it must be administered by an experienced, competent, and trained examiner.  Second, a known and structured method of scoring and interpretation must be used.  Several structured systems exist; the most widely recognized is the Comprehensive System (CS) described by Exner (1993).  Finally, the test must be applied to a suitable population and for an appropriate diagnostic or assessment purpose for which its validity has been demonstrated.

Reliability refers to the consistency with which a test yields the same measurement (Weiner & Greene, 2008).  Viglione and Taylor (2003) examined this issue specifically for the Comprehensive System.  In their own study of 84 raters coding 70 Rorschach variables, they reported strong inter-rater reliability, particularly for the base-rate variables.  They also reviewed 24 previously published papers reporting inter-rater reliabilities, most of which fell in the range of 85% to 99%.  Test-retest reliability is also an important consideration.  Exner (as cited in Groth-Marnat, 2009, pp. 389-390) reported test-retest reliabilities ranging from .26 to .92 over a 1-year interval for 41 variables; four were above .90, 25 were between .81 and .89, and 10 were below .75.  However, the lowest reliabilities were attributed to variables that reflect transient states rather than stable traits.  It was further noted that the variables most relied upon in interpretation, the ratios and percentages, were among the most reliable.  It can therefore be concluded that the Comprehensive System can yield high reliability when used under the conditions applied in these studies.

Validity refers to the extent to which a test measures the constructs it purports to measure (Weiner & Greene, 2008).  The validity of the Rorschach can be evaluated by comparing it with clinical data or with other established tests of personality.  Weiner (2001, p. 423), for example, stated that the Rorschach has a validity effect size "almost identical" to that of the MMPI.  Groth-Marnat (2009, p. 391) has pointed out that results of validity studies on the Rorschach have been mixed, but that they are confounded by various factors including the "type of scoring system, experience of the scorer, and type of population."  Early studies produced validity coefficients of .40 to .50, but later studies found coefficients as low as .29.  However, these studies were further confounded by uncontrolled variables such as age, number of responses, verbal aptitude, and education.

More recent validity studies have also produced mixed results.  Smith et al. (2010) evaluated the validity of the Rorschach in assessing the effects of trauma using a different system, the Logical Rorschach (LR) developed by Wagner (2001, as cited in Smith et al., 2010).  They described their findings as "equivocal" but indicated that the LR "may have some validity in the assessment of trauma-related phenomena."  Wood et al. (2010) conducted a meta-analysis of 22 studies including 780 forensic subjects to determine whether Rorschach scores can discriminate psychopaths from nonpsychopaths.  They reported a mean validity coefficient of .062 across all variables and a coefficient of .232 for the Aggressive Potential index.  They concluded that their findings "contradict the view that the Rorschach is a clinically sensitive instrument for discriminating psychopaths from nonpsychopaths" (Wood et al., 2010, p. 336).  Similarly, Lindgren, Carlsson, and Lundbäck (2007) found no agreement between the Rorschach and self-assessed personality as measured by the MMPI-2.

This raises a question: if the Rorschach is relatively reliable, what is it measuring, if not the same dimensions as, for example, the MMPI or forensic assessments of psychopathy?  Hilsenroth, Eudell-Simmons, DeFife, and Charnas (2007) found, for example, that the Rorschach Perceptual-Thinking Index (PTI) effectively differentiated patients with psychotic disorders both from non-patients and from patients with personality disorders.  They concluded that the test had clinical meaningfulness for diagnosis and assessment in this population.  In another study, Liebman, Porcerelli, and Abell (2005) reported a validity coefficient of .71 in a sample of 150 adjudicated adolescents when comparing the Rorschach aggression variables with the Violence Rating Scale – Revised.  Porcelli and Mihura (2010) evaluated the Rorschach Alexithymia Scale (RAS) as a specific index for identifying alexithymia in a psychiatric population.  In a sample of 219 patients, they reported a hit rate of 92%, a sensitivity of 88%, and a specificity of 94%.  Taken together, these findings support the validity of the Rorschach for specific purposes, but they also highlight the importance of identifying the scoring system and population when evaluating its validity.

The Rorschach has certainly had its detractors.  Grove, Barden, Garb, and Lilienfeld (2002) presented a particularly negative summary view of the Rorschach and concluded that testimony based on it should generally not be admitted in court.  They based this conclusion largely on a meta-analysis of a large number of studies, and cited observations such as that the test is "engulfed in intense scientific controversy," that there have been "heated exchanges between advocates and critics," and that "a majority (indeed, in all likelihood, a substantial majority) of the relevant scientific community does not view the RCS [Rorschach Comprehensive System] as a reliable system."  They cite weaknesses such as inadequate norms, overestimation of psychopathology and maladjustment, and unacceptable reliabilities in the .45 to .56 range.  However, these global indictments do not necessarily apply to a particular practitioner or group that uses the Rorschach consistently, with a particular population, and with adequate control of variables.  Their analysis would also likely have included earlier studies that, as Rose, Kaser-Boyd, and Maloney (2001) point out, used nonstandard administration and differing scoring systems.  These conclusions therefore speak more to inadequacies in the published literature and to possibly widespread inconsistency in Rorschach research and practice than to any inherent lack of reliability and validity when the test is properly applied.

In summary, the Rorschach can be reliable when it is scored with a defined coding system by an appropriately trained set of examiners.  Under these conditions, both inter-rater and test-retest reliability can be acceptable.  Validity can also be demonstrated, but it depends on further factors, chiefly the population and the intended use.  For some purposes, such as identifying psychopathy in forensic populations or serving as an alternative to the MMPI, studies have found its validity questionable.  For others, such as the assessment of trauma, there is some evidence of validity.  For still others, such as identifying psychosis and disturbances of perception and thinking in psychiatric patients, or assessing aggression in adjudicated adolescents, the evidence for validity is stronger.  My overall conclusion is that the Rorschach, when properly used, can be reliable, while its validity depends on the specific population and intended use and can range from relatively poor to quite good.  Weiner (2001, p. 423) therefore makes a fair and applicable statement when he concludes that the Rorschach "works very well for its intended purposes."


References

Exner, J. E. (1993). The Rorschach: A comprehensive system: Vol. 1. Basic foundations (3rd ed.). New York, NY: Wiley.

Groth-Marnat, G. (2009). Handbook of psychological assessment (5th ed.). Hoboken, NJ: John Wiley & Sons.

Hilsenroth, M. J., Eudell-Simmons, E. M., DeFife, J. A., & Charnas, J. W. (2007). The Rorschach Perceptual-Thinking Index (PTI): An examination of reliability, validity, and diagnostic efficiency. International Journal of Testing, 7(3), 269-291. doi:10.1080/15305050701438033

Liebman, S. J., Porcerelli, J., & Abell, S. C. (2005). Reliability and validity of Rorschach aggression variables with a sample of adjudicated adolescents. Journal of Personality Assessment, 85(1), 33-39. doi:10.1207/s15327752jpa8501_03

Lindgren, T., Carlsson, A., & Lundbäck, E. (2007). No agreement between the Rorschach and self-assessed personality traits derived from the Comprehensive System. Scandinavian Journal of Psychology, 48(5), 399-408. doi:10.1111/j.1467-9450.2007.00590.x

Musewicz, J., Marczyk, G., Knauss, L., & York, D. (2009). Current assessment practice, personality measurement, and Rorschach usage by psychologists. Journal of Personality Assessment, 91(5), 453-461. doi:10.1080/00223890903087976

Porcelli, P., & Mihura, J. L. (2010). Assessment of alexithymia with the Rorschach Comprehensive System: The Rorschach Alexithymia Scale (RAS). Journal of Personality Assessment, 92(2), 128-136. doi:10.1080/00223890903508146

Rose, T., Kaser-Boyd, N., & Maloney, M. P. (2001). Essentials of Rorschach assessment. New York, NY: John Wiley & Sons.

Smith, S. R., Chang, J., Kochinski, S., Patz, S., & Nowinski, L. A. (2010). Initial validity of the Logical Rorschach in the assessment of trauma. Journal of Personality Assessment, 92(3), 222-231. doi:10.1080/00223891003670174

Viglione, D. J., & Taylor, N. (2003). Empirical support for interrater reliability of Rorschach Comprehensive System coding. Journal of Clinical Psychology, 59(1), 112-121.

Weiner, I. B. (2001). Advancing the science of psychological assessment: The Rorschach Inkblot Method as exemplar. Psychological Assessment, 13(4), 423-432.

Weiner, I. B., & Greene, R. L. (2008). Psychometric foundations of assessment. In Handbook of personality assessment (pp. 49-75). Hoboken, NJ: John Wiley & Sons.

Wood, J. M., Nezworski, M., Allen, K., Lilienfeld, S. O., Garb, H. N., & Wildermuth, J. L. (2010). Validity of Rorschach Inkblot scores for discriminating psychopaths from nonpsychopaths in forensic populations: A meta-analysis. Psychological Assessment, 22(2), 336-349. doi:10.1037/a0018998