Science in Christian Perspective

Validity of Existing Controlled Studies Examining the Psychological Effects of Abortion

JAMES L. ROGERS
Department of Psychology
Wheaton College
Wheaton, Illinois 60187

JAMES F. PHIFER
Graduate School of Psychology
University of Louisville
Louisville, KY 40292

JULIE A. NELSON
Graduate School of Psychology
Wheaton College Graduate School
Wheaton, Illinois 60187

From: PSCF 39 (March 1987): 20-30.

Numerous studies have been concerned with the potential psychological sequelae (potential psychological risks) of abortion, but the conclusions reached are inconsistent. This paper is based on a comprehensive review of studies addressing the question of post-abortion psychological sequelae. Controlled studies were categorized according to research design and then systematically examined for experimental validity. Poor use of methodology and research design surfaced as an explanation for differing conclusions across the literature. As a further means of examining the integrity of comparisons in the literature made between women having and not having abortions, the maximum likely statistical power was calculated for each controlled study. As a whole, the literature exhibited grossly substandard power characteristics. An effort to isolate the best study to date was made, and a summary of the conclusions from this study is presented. We conclude that the question of psychological sequelae to abortion is not closed.

Since the United States Supreme Court made the decision in 1973 to legalize abortion on demand, the number of abortions performed per year has risen dramatically. In the United States there are more than one and one-half million abortions performed yearly, and of every 100 women of childbearing age, about five obtain an abortion (Henshaw, Forrest and Blaine, 1984). In addition to concerns over medical safety, numerous questions have been raised about the potential psychological risk (the technical term is psychological sequelae) that may accompany elective abortion. There is a large scientific literature that attempts to determine what, if any, psychological risks are involved in having an abortion.

Discovering the truth about the emotional impact of abortion should be of great interest to all. Unfortunately, representatives of both sides of the abortion debate often exercise a high degree of selectivity in their review of the psychological sequelae literature, publicizing only findings that support their position on the matter. This is unfortunate because when thoughtfully approached, it becomes evident that the question of possible sequelae to abortion exists apart from the ethics of the action. This is true for two reasons. First, doing what is "right" or "wrong" may or may not result in changes in the emotional state. For example, evangelical Christians base the correctness of an action on their interpretation of the Scripture. Relative to a directive or principle found in Scripture, an emotional reaction or the absence of the same in women who have had abortions is of little consequence in providing moral guidance. Second, one key issue in determining the morality of abortion is the question of the "rights of the unborn." A woman's psychological reaction to abortion offers little direction concerning whether or not these rights have been violated.

In this paper we do not wish to address questions surrounding the morality of abortion. Rather, we want to provide a review of the psychological sequelae literature aimed at determining the scientific merit of existing studies. Certainly it would be reprehensible to overstate or understate a scientifically validated finding for a "higher" moral cause. Likewise, it would be reprehensible to pass on as "scientific" the claims of studies that exhibit little experimental validity.

To determine the level of rigor that exists in the psychological sequelae literature, we have undertaken a review of this literature from a methodological and statistical perspective. To locate articles we have relied upon computer searches of Index Medicus, Psychological Abstracts, Science Citation Index and the National Institute of Mental Health data base in addition to examination of the bibliographies of all articles located. This search yielded over 300 studies; seventy-six were either clinical case studies or experimental research. In turn, these studies were organized into four categories according to research design: case studies (17), controlled studies (14), retrospective-uncontrolled studies (20) and prospective-uncontrolled studies (23). Each of these four types of research design has strengths and weaknesses, some of which will be described below.

Unfortunately, there are many inconsistencies in the conclusions drawn by the authors of the studies we located. For example, Wallerstein, Kurtz and Bar-Din (1972) found adverse reactions in fifty percent of the cases studied, while Osofsky and Osofsky (1972), in a study published the same year, concluded that there were few, if any, adverse psychological reactions. When results are this varied, both pro-life and pro-choice camps are able to find "evidence" to support their position. Under such circumstances, the need to consider the methodological and statistical practices underpinning each study becomes self-evident. A review of the foundations on which the literature rests sometimes can differentiate between studies that can be trusted and those that come to unwarranted conclusions. If severe methodological flaws in the current literature do exist, these inadequacies, not the conclusions reached, should be the focus of attention. Thus, it is the experimental validity rather than the conclusions of existing studies that provides the focus of this paper.

Conceptualizing Validity

We elected to adopt a conceptualization of experimental validity proposed by Cook and Campbell (1979) to help systematically determine how seriously the conclusions of a given study should be taken. Experimental validity can be categorized into four different types: statistical conclusion validity, internal validity, construct validity and external validity. Statistical conclusion validity is concerned with the extent to which a study permits valid inference about covariation between the independent variable (the presence or absence of abortion) and the dependent variable (some measure of psychological sequelae). Internal validity refers to the extent to which the observed effects of the outcome variable (psychological sequelae) may be attributed to the treatment (abortion) rather than alternative causes (age, marital status, religious background, etc.). Construct validity pertains to the extent that the outcome measures, treatments, samples and settings utilized in the research represent the theoretical constructs of interest. In the present context, high construct validity would imply (among other things) that the measuring device used to assess risk was reliable and accurate. Finally, external validity refers to the validity with which conclusions can be generalized to and across populations of persons, settings and time. Having high external validity would mean that the conclusions about abortion and psychological risk found in a given study could be safely applied to women other than those actually involved in the study.

James L. Rogers received his Ph.D. in psychology from Northwestern University, Evanston, Illinois. He presently holds appointments as associate professor at Wheaton College, Wheaton, IL and research associate at the Northwestern University Medical School. James F. Phifer is currently pursuing his doctoral degree in clinical psychology at the University of Louisville, Louisville, KY. Julie A. Nelson received her M.A. in psychological studies from the Wheaton College Graduate School, Wheaton, IL and currently is in private practice.

Statistical Conclusion Validity

We will first discuss statistical conclusion validity as it relates to the post-abortion sequelae literature. There are two types of errors one can make when using a statistical hypothesis test to decide whether an experimental group differs from a control group (i.e., an abortion group differs from a non-abortion control group). Type I error refers to concluding from sample data that there is a difference on the outcome variable (i.e., incidence of psychological trauma) when such is not really the case for the two comparison populations. In effect, you have drawn random samples that look different, but both samples have come from the same population (with regard to the outcome parameter of interest). On the other hand, a Type II error occurs when, on the basis of sample data, it is decided that the samples have come from the same population when really each is from a different population.

Ideally we want to carry out hypothesis tests with a low probability of Type I error (e.g., set alpha, the probability of Type I error, at .05 or lower) as well as a low probability of a Type II error (e.g., we want power, the probability of correctly accepting the alternative hypothesis, to be .95 or higher). Indeed, both types of error can simultaneously be held to a low probability of occurrence if there are sufficient resources to collect adequately large comparison samples.
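To give a sense of what "adequately large" can mean, the following is a minimal sketch of a sample size calculation. It assumes a one-tailed z test on arcsine-transformed proportions and uses the common normal approximation; the function name `n_per_group` and the example rates (25 percent versus 20 percent, the rates assumed in the notes to Table I) are illustrative choices, not a procedure described by the studies reviewed.

```python
from math import asin, ceil, sqrt
from statistics import NormalDist


def n_per_group(p1, p2, alpha=0.05, power=0.95):
    """Approximate number of subjects needed in EACH group.

    Assumes a one-tailed z test on arcsine-transformed proportions
    (normal approximation); an illustrative sketch only.
    """
    # Effect size h: difference between the transformed proportions
    h = abs(2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2)))
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # guards against Type I error
    z_beta = NormalDist().inv_cdf(power)       # guards against Type II error
    return ceil(2 * ((z_alpha + z_beta) / h) ** 2)


# Hypothetical scenario: detect 25% vs. 20% with alpha = .05 and power = .95
print(n_per_group(0.25, 0.20))
```

Under these assumptions the required size comes to roughly 1,500 women per group, which suggests why holding both error rates low can be costly when the expected difference is small.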

In reality it is often too expensive, time consuming or otherwise difficult to collect sample sizes that will allow one to sufficiently protect against both types of errors. Also, investigators without adequate statistical background and/or access to statistical consultation may not understand how crucial adequate sample size is, particularly as it relates to the possibility of making a Type II error. In such instances, investigator motivation may be insufficient to overcome barriers that work against securing adequate sample sizes.

When resources or motivation are insufficient to protect against both a Type I and Type II error, the research should not be carried out. But often it is. The very typical course of action is to maintain protection against a Type I error while tolerating a high risk of a Type II error. In other words, common practice would have us, in the face of limited resources, defend the null hypothesis at the expense of possibly missing a true alternative hypothesis.

An example from the pharmaceutical industry will clarify the usual practice and why it occurs. Suppose it is considered desirable to take a new drug to market but it is too expensive to test the drug against a control product using a large sample size. Most would argue that it would be better for the pharmaceutical firm to err in the direction of not introducing a new drug (that really is better) than to introduce a new drug (thought to be better but that really is not). The implication would be that alpha be kept small at the cost of decreasing power (i.e., increasing Type II error probability). After all, if we falsely conclude that the new drug is better and thus commit a Type I error, society must bear the considerable cost of producing and distributing the new drug only to ultimately discover that it is no better or even worse than the old drug. Protecting from Type I error at the expense of increasing the risk of a Type II error may mean that no one gets a new and better drug, but at least we will not replace a time-tested solution with a solution that does not work. As it turns out, Type I errors are usually more costly to society than Type II errors. Avoiding a Type I error will usually guard the status quo and therefore protect traditional practices and thinking.

Table I
Statistical Power for Fourteen Comparative Studies That Examine the Psychological Sequelae of Abortion

                       Sample Size         Harmonic  Relative                 Date
Researcher            N       na       nb      Mean     Power   Country       (Data Collected)

David, et al.    98,612   27,234   71,378    39,426      .99+   Denmark       1974-75
Brewer            7,660    3,550    4,110     3,809      .99+   England       1975-76
Jansson, et al.  30,329    1,773   28,556     3,338      .99+   Sweden        1952-56
Meyerowitz          111       93       18        30       .12   U.S.A.        1963-69
Sclare, et al.       42       21       21        21       .10   Scotland      1960-68
Hamill, et al.      128       81       47        59       .17   Scotland      1971-72
Greenglass          126       63       63        63       .17   Canada        1972-73
Niswander            68       49       19        27       .12   U.S.A.        1971-72
Athanasiou          114       76       38        51       .16   U.S.A.        1970-72
McCance             300      192      108       138       .27   Scotland      1967-68
Drower              157       88       69        77       .19   South Africa  1974-75
Brody               152       94       58        72       .19   Canada        1968-70
Simon                78       32       46        38       .13   U.S.A.        1955-63
Todd                102       81       22        35       .13   Scotland      1968-70

Notes: For the purpose of power estimation, we have assumed the following: 1) that "ideal" experimental arrangements exist throughout the literature; namely, that all existing studies perfectly measure an identical outcome parameter that reflects level of depression and that perfect subject equivalency exists at baseline across the two conditions; 2) that post-event depression will be five percent greater in women experiencing abortion than women who carry to term, i.e., that there will be 25% postabortion depression vs. the 20% postpartum depression rate reported by Hopkins, et al. (1984); and 3) that a one-tailed z test with alpha set at .05 on transformed proportions is used as the test statistic.

Power values were determined as outlined by Cohen (1977). In accordance with Cohen's guidelines for unequal sample sizes, the abortion and control group sample sizes (na and nb, respectively) were converted to a single harmonic mean which was used to enter the power tables.

It can be argued that under certain circumstances traditional wisdom is on the side of the alternative hypothesis, and to guard it, one must (if resources are limited) increase the risk of a Type I error in order to lower the risk of a Type II error. Indeed, it might be argued that this is the case regarding the question of psychological risk and abortion. For example, the traditional or "status quo" view, many would maintain, is that women who undergo abortions evidence greater emotional risk than those who do not. According to these individuals, the "usual" expectation historically has been that if contrasted to a non-abortion control group, women electing abortion should evidence greater emotional stress. It would therefore follow that under conditions of limited resources, studies that compare an abortion to a non-abortion control group should raise the risk of a Type I error (i.e., falsely concluding there is a "difference") in order to lower the risk of a Type II error (i.e., falsely concluding there is not a "difference") on the grounds that doing so would protect the prevailing, traditional view. To do this, however, would mean using an alpha of greater than .05, which we all know is never done!

We do not claim the foregoing argument, but we do maintain that when a large number of individuals believe strongly that a difference between experimental and control groups exists, as is the case in this country regarding the sequelae to abortion question, a statistical decision procedure with good power characteristics (i.e., a low risk of a Type II error) must be utilized out of respect for these individuals. In a word, those who are against abortion and believe it to increase psychological sequelae deserve quality studies with good statistical power characteristics. This is true, if for no other reason than that the popular press will label published studies with low statistical power that claim abortion has no psychological effect as "scientific," and in so doing give them a prestigious status. However, the popular press will not bother to explain, because they will not understand, that there was a good chance of arriving at that conclusion due to limited statistical power, quite aside from whether the conclusion is really true. As scientists who understand these concepts, we have a moral responsibility to make sure that the public is not misled by the absence of "statistically significant" differences in studies with low statistical power.

We have just completed an examination of the existing studies that compare a post-abortion group with a control. After making certain assumptions, we have calculated the level of statistical power present in each study. Our conclusion (see Table I) is that 11 of the 14 existing studies exhibit statistical power that is not likely to exceed, but could be less than, .27. We hold that the majority of currently available comparative studies exhibit grossly substandard power characteristics even under assumptions that, if anything, overestimate power levels.
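The calculation behind Table I can be approximated with a short sketch. It follows the assumptions stated in the table notes (25 percent versus 20 percent depression, a one-tailed z test with alpha at .05 on arcsine-transformed proportions, harmonic mean of the two group sizes), but it uses a normal approximation in place of Cohen's printed power tables, so results may differ from the tabled values by about .01. The function names are illustrative.

```python
from math import asin, sqrt
from statistics import NormalDist


def harmonic_mean_n(n_a, n_b):
    """Harmonic mean of two unequal group sizes."""
    return 2 * n_a * n_b / (n_a + n_b)


def effect_size_h(p1, p2):
    """Difference between arcsine-transformed proportions."""
    return 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))


def power_one_tailed(n_a, n_b, p1=0.25, p2=0.20, alpha=0.05):
    """Approximate power of a one-tailed z test (normal approximation)."""
    n = harmonic_mean_n(n_a, n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha)
    return NormalDist().cdf(effect_size_h(p1, p2) * sqrt(n / 2) - z_crit)


# Group sizes (na, nb) copied from three rows of Table I
for name, n_a, n_b in [("Meyerowitz", 93, 18),
                       ("McCance", 192, 108),
                       ("Brewer", 3550, 4110)]:
    print(name, round(harmonic_mean_n(n_a, n_b)),
          round(power_one_tailed(n_a, n_b), 2))
```

For the smaller studies the approximation gives power in the neighborhood of .1 to .3, in line with the table, while the three registry-scale studies exceed .99.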

Internal Validity

Internal validity, as noted earlier, refers to the extent to which effects on outcome variables are due to the independent variable (i.e., abortion) rather than to other competing causes. Existing controlled studies addressing the issue of psychological sequelae invariably utilize a quasi-experimental design, termed the "static-group comparison" design in Campbell and Stanley's (1963) classic text, Experimental and Quasi-Experimental Designs for Research. (This design is termed the "nonequivalent control group" design if pre-treatment measures are available.) Because it is not possible to randomly assign women to conditions, the abortion and control groups cannot be equated at baseline by chance. Two threats that are endemic to these designs, mortality and selection, will serve to illustrate the serious problems that can plague a study if such threats are not countered.

Mortality becomes a threat when subjects who exhibit certain characteristics of potential importance to the conclusions of a study drop out of one treatment group but not the other. Differential dropout can lead to discrepancies between treatment groups on critical background variables, thus making comparisons at the end of the study impossible to interpret. This is a particularly serious problem in the sequelae to abortion literature due to certain findings reported by Adler (1976). She reviewed 17 studies dealing, to varying degrees, with psychological sequelae. She found sample attrition ranging from 13 percent (Barnes, Cohen, Stockle and McGuire, 1971) to 86 percent (Evans and Gusdon, 1973). In her own study, Adler followed up non-responders and found them most likely to be young, Catholic, and unmarried. Each of these characteristics has been associated with a greater likelihood of negative sequelae (Adler, 1975; Payne, Kravitz, Notman, and Anderson, 1976; Osofsky and Osofsky, 1972). Adler concluded that experimental mortality may result in the underestimation of the incidence of adverse responses to abortion.
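The mechanism Adler describes can be illustrated with a small arithmetic sketch. All numbers below are hypothetical and chosen only to show the direction of the bias: a higher-risk subgroup that is harder to follow up contributes fewer responders, so the rate observed among responders falls below the rate in the full sample.

```python
def true_rate(strata):
    """Adverse-reaction rate in the full enrolled sample."""
    total = sum(n for n, _, _ in strata)
    return sum(n * rate for n, rate, _ in strata) / total


def observed_rate(strata):
    """Adverse-reaction rate among subjects who completed follow-up."""
    responders = sum(n * follow_up for n, _, follow_up in strata)
    adverse = sum(n * rate * follow_up for n, rate, follow_up in strata)
    return adverse / responders


# HYPOTHETICAL strata: (number enrolled, adverse rate, follow-up rate)
strata = [
    (100, 0.40, 0.30),  # higher-risk subgroup, heavy attrition
    (100, 0.10, 0.90),  # lower-risk subgroup, light attrition
]

print(true_rate(strata))      # rate if everyone had been followed
print(observed_rate(strata))  # rate the study would actually report
```

In this invented example the full-sample rate is 25 percent, while the responders-only rate is 17.5 percent, an underestimate produced entirely by who dropped out.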

Selection is a threat when, at the outset of the study, subjects assigned to the experimental condition differ from control subjects on baseline characteristics. In this event, differences or similarities between the experimental and control groups found at the end of the study may be due to the treatment (presence or absence of abortion), one or more baseline differences, or the interaction of the treatment with one or more baseline differences. The threat of selection is usually countered by randomly assigning subjects to conditions, but this, as noted earlier, is impossible for abortion sequelae research. If random assignment cannot be used to equate groups at baseline by chance, then one should at least compare baseline characteristics on selected variables to rule out possible important differences.

Selection was indisputably a potential threat to the internal validity of more than 50 percent of the studies we reviewed because baseline measures simply were not collected. Without carefully establishing the baseline comparability of women who receive an abortion to those who do not on at least such rudimentary characteristics as age, number of children, education, socioeconomic status, social support, marital status and physical health, the meaning of differences or similarities in the incidence of sequelae will remain speculative.

This short discussion should suffice to make the central point that as long as the static-group comparison and the nonequivalent control group designs, without adjustment to compensate for sources of invalidity, remain the standard designs used in abortion sequelae research, then numerous threats to internal validity will cloud our understanding of the psychological significance of abortion.

Construct Validity

Construct validity refers to the extent to which the outcome measures, treatments, samples and settings utilized in the research represent the theoretical constructs of interest. In the abortion sequelae literature, the main concern relates to the construct validity of the dependent variable (some measure of psychological sequelae). In other words, are sequelae being accurately measured? Standardized assessment measures such as the MMPI, the Center for Epidemiologic Studies Depression Scale (Radloff, 1977) or the Symptom Checklist-90 (Derogatis, 1977) have rarely been implemented in psychological sequelae research. Results have typically been derived from a variety of self-report questionnaires, interview schedules, rating scales and clinical opinions. These are almost always of undetermined psychometric adequacy.

We would like to illustrate some of the difficulties in the way psychological sequelae have been assessed with some examples. Niswander and Patterson (1967), Ewing and Rouse (1973), Kretzschmar and Norris (1967) and Bracken, Hachamovitch and Grossman (1974) devised their own self-report questionnaires to assess the psychological reaction to abortion. However, in virtually all instances no attempt was made to validate these instruments or even assess their reliability (i.e., consistency and preciseness). A variety of relatively simple methods have been devised for determining reliability (test-retest, parallel forms and split half techniques), but none of these were conducted. Clearly, the use of measuring devices with unknown reliability can potentially distort the conclusions one makes about the psychological impact of abortion.

Other studies have implemented structured or unstructured interviews as the assessment measure (Patt, Rappaport and Barglow, 1969; Wallerstein, Kurtz and Bar-Din, 1972; Osofsky and Osofsky, 1972; Ford, Castelnuovo-Tedesco and Long, 1971; Peck and Marcus, 1966). It is common knowledge that psychiatric interviews can be highly unreliable and are subject to the specific orientation, level of expertise, biases and expectations of the interviewer. In virtually all cases reviewed, no attempt was made to assess inter-rater reliability (the degree to which two interviewers come to similar conclusions about the same subject), or to control for interviewer bias and expectancies. For example, Osofsky and Osofsky (1972) attempted to quantify such behaviors as crying and smiling during an unstructured interview. These behaviors could easily be influenced by characteristics of the interviewer, but no attempt was made to control for such factors.
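As an illustration of what an inter-rater reliability check involves, the sketch below computes Cohen's kappa, one commonly used chance-corrected agreement statistic. The choice of kappa is ours for illustration; the studies reviewed do not specify a statistic. The ratings are hypothetical.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on the same subjects."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of subjects")
    n = len(rater_a)
    # Proportion of subjects on which the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's category frequencies
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# HYPOTHETICAL judgments by two interviewers on the same ten women
interviewer_1 = ["adverse", "none", "none", "adverse", "none",
                 "none", "adverse", "none", "none", "none"]
interviewer_2 = ["adverse", "none", "adverse", "adverse", "none",
                 "none", "none", "none", "none", "none"]

print(round(cohen_kappa(interviewer_1, interviewer_2), 2))
```

Here the two interviewers agree on eight of ten cases, yet kappa is only about .52 once chance agreement is removed, which shows why raw agreement alone can overstate reliability.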

Psychological diagnoses were used as an outcome criterion by some researchers. However, none of these studies utilized psychodiagnostic classification schemes with established psychometric adequacy such as the Research Diagnostic Criteria (Spitzer, Endicott and Robins, 1978) or DSM-II/III (American Psychiatric Association, 1968, 1980). Further, even these diagnostic instruments must be correctly applied by the practitioner if their inherent reliability is to be realized, but rarely was inter-rater reliability assessed. Without such reliability coefficients, the degree of confidence that one can have in the specific raters used in a given study is unknown.

In general, we found little evidence to suggest that construct validity for the dependent measures used to assess sequelae was at an acceptable level. The list of potential threats to construct validity we found is too great to enumerate in this presentation. However, it includes, in addition to the above problems, such practices as obtaining information concerning the level of emotional adjustment from sources other than the patient (Meyerowitz, Satloff and Romano, 1971; Jacobs, Garcia, Rickels and Preucel, 1974; Pare and Raven, 1970; Lask, 1974); conducting follow-up assessment immediately after the abortion in the recovery room (Bracken, Hachamovitch and Grossman, 1974; Osofsky and Osofsky, 1972; Moseley, Follinstad, Harley and Heckel, 1981); interviewing patients at unsystematized follow-up intervals ranging from one to five years (Kretzschmar and Norris, 1967) or several months to seven years (Meyerowitz, Satloff and Romano, 1971); and including patients who not only received an abortion but were also sterilized concomitantly, thus exposing the subject to two treatments simultaneously and rendering any form of causal interpretation impossible.

External Validity

External validity refers to the ability to generalize findings across populations, settings and time, and is critical if the information is going to be useful apart from its experimental setting. However, the majority of existing studies utilize small, self-selected samples of women who had their abortion at one specific hospital. Such selection bias would likely limit the generalizability of any conclusions reached, even if the conclusions were made under conditions of high internal validity. For example, Niswander and Patterson (1967) asked the attending physician to approve or disapprove the mailing of a questionnaire to each of the patients, thus eliminating those patients of whom it was thought that the recollection of the abortion experience would be too painful. Abrams, DiBiase and Sturgis (1979) sent questionnaires only to those patients whom they felt were likely to respond. In both of these cases, the subject selection procedure could seriously alter the generalizability (external validity) of results.

Generalizability of results would be greatly enhanced if subject selection were stratified across the various settings in which abortions are performed. Indeed, the distribution of such settings can be approximated. In 1982, 82 percent of abortions in America were performed in non-hospital facilities: 56 percent in abortion clinics, 21 percent in other clinics, and 5 percent in physicians' offices (Henshaw, Forrest and Blaine, 1984). Eighteen percent of abortions were performed in hospitals. Unfortunately, no study of which we are aware has attempted to make the research sample utilized in the study representative of these known demographic characteristics. The distribution of settings for the research sample being used is often not even specified.

A second obstacle to external validity is highlighted by the widely varying definitions of psychological sequelae that are used across the various studies in the area. In one respect, the search for abortion-related sequelae of many different kinds enhances generalizability. However, to the degree that our confidence in findings is lessened because results of studies that use different definitions of sequelae are difficult to pool, generalizability is retarded. This may contribute to the inconsistencies found among results in the literature. Some studies define negative psychological reactions to abortion in terms of psychological symptomatology such as depression, anxiety or guilt. Another may attach importance to the number of symptoms, while others rely on the subjective experience of the woman as she reports it in a self-report questionnaire. The resulting ambiguities make the literature difficult to summarize as there are no subgroups of studies that consistently measure the same dependent variable defined in the same way. It therefore goes without saying that the literature contains few replications of procedures or findings. Given small sample sizes and virtually no replication across investigators, the potential for non-generalizable (not to mention unreliable) conclusions is substantial.

Lastly, generalizability across time is a crucial issue. Approximately half of the studies we reviewed were conducted from 1967 to 1973 when abortion laws were being liberalized. During this period, therapeutic abortions were granted on medical and/or psychiatric grounds. The remaining studies were conducted in the mid-to-late 1970's under abortion-on-demand. (Note that some of these studies were not published until the early 1980's.) It is highly questionable as to whether conclusions drawn from studies utilizing women granted abortions on therapeutic grounds only, as was the case until 1973 in the United States, are generalizable to the current social milieu characterized by abortion-on-demand. Furthermore, as no new studies have been conducted in the current decade,

Which Studies are Best?

It now should be clear that considerable ambiguity surrounds the question of post-abortion sequelae because numerous methodological problems exist in the literature. In the midst of the confusion arising from generally poor methodology, it is only natural to ask whether some of the existing studies are more trustworthy than others. Certainly when studies of relatively high and low validities conflict, the conclusions of the higher quality studies should be given the most weight. As Mintz (1983) has stated, "literally no number of anecdotal reports, uncontrolled trials or poorly designed experiments can outweigh one carefully planned and executed controlled experiment if it results in clear and divergent findings" (p. 74). On this same issue, Smith, Glass and Miller (1980) write: "The important question in surveying a body of literature is to determine whether the best designed studies yield evidence different from more poorly designed studies. If the answer is yes, then one is compelled to believe the best ones" (p. 64).

Pursuing this line of thought, we would like to critique what we consider to be the "best" study done in this area to date. Danish researchers David, Rasmussen and Holst (1981) have carried out the only study we located that exhibited the minimum criteria of a control group, pretest measures, adequate sample size, an attempt to equate non-equivalent groups at baseline, and assessment tools with adequate validity and reliability. It is our hope that the ensuing critique of this study, which in our opinion is one of the few acceptable studies (but certainly not without problems), will highlight in a concrete way the issues that the clinician and/or woman considering abortion must keep in mind when examining the research.

Utilizing the computer linkage of the Danish national case registry, the above authors studied the comparative risk of admission to a psychiatric hospital within three months of an abortion or term delivery for all women under age 50 residing in Denmark. Data on admission to psychiatric hospitals were obtained on 71,378 women carrying pregnancies to term, 27,234 women terminating unwanted pregnancies, and on the total population of 1,169,819 women aged 15 to 49. In determining the incidence rates, only first admissions were recorded; women with an admission during the 15 months prior to the delivery or abortion were excluded.

Figure 1A contrasts women who delivered, women who had abortions, and all women in Denmark aged 15 through 49 on incidence of psychiatric hospitalization. Incidence rates were highest for women who were post-abortion (18.4 per 10,000), next highest for women who were postpartum (12.0 per 10,000), and lowest for all women (7.5 per 10,000). In Figure 1B the incidence rates have been further broken down by age category. Only in women aged 35 through 49 is there a reversal in the direction found in the composite data. Here, women who delivered evidenced a higher rate of psychiatric hospitalization than women who aborted (22.2 per 10,000 vs. 13.4 per 10,000). It appears that the pregnancy event (birth or abortion) interacts with age; women who are post-abortion are at greater risk except in the age category 35 through 49, where the relationship reverses.

Given small sample sizes and virtually no replication across investigators, the potential for nongeneralizable (not to mention unreliable) conclusions is substantial.

Incidence of psychiatric hospitalization among postpartum and post-abortion women in each of three marital status categories is depicted in Figure 1C. Differences across conditions are relatively small for women who were currently married or never were married, but are extreme when considering women who were separated, divorced or widowed (16.9 per 10,000 postpartum vs. 63.8 per 10,000 post-abortion). Apparently, women who have suffered a separation from their husband also have a more difficult time dealing with the termination of the pregnancy. Lack of an emotional support system may be more prevalent for women who are estranged or whose husbands have died.

Finally, Figure 1D compares postpartum and post-abortion women across four levels of parity, or number of prior children. Regardless of the number of prior children, women who were postpartum evidenced a lower rate of psychiatric hospitalization than women who were post-abortion. However, these differences are more extreme for women with zero or one prior child (13.8 per 10,000 vs. 22.4 per 10,000 with parity of zero; 9.7 per 10,000 vs. 23.3 per 10,000 with parity of one). This may suggest that women who have one or no offspring are a greater post-abortion psychological risk than those with several children.
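
As a rough aid to reading these figures, the reported incidence rates can be restated as post-abortion to postpartum rate ratios. The sketch below is illustrative only: the rates are those quoted above from David, Rasmussen and Holst (1981), while the dictionary layout and the function name `rate_ratio` are assumptions introduced here and are not part of the Danish analysis.

```python
# Incidence of first psychiatric admission per 10,000 women, as quoted in the
# text from David, Rasmussen and Holst (1981): (postpartum, post-abortion).
RATES_PER_10000 = {
    "all women in study": (12.0, 18.4),
    "aged 35-49": (22.2, 13.4),
    "separated, divorced or widowed": (16.9, 63.8),
    "parity 0": (13.8, 22.4),
    "parity 1": (9.7, 23.3),
}


def rate_ratio(postpartum: float, post_abortion: float) -> float:
    """Return the post-abortion rate divided by the postpartum rate."""
    return post_abortion / postpartum


if __name__ == "__main__":
    for group, (postpartum, post_abortion) in RATES_PER_10000.items():
        ratio = rate_ratio(postpartum, post_abortion)
        print(f"{group:32s} ratio = {ratio:.2f}")
```

A ratio above 1 indicates a higher reported rate after abortion than after delivery; of the groups listed, only women aged 35 through 49 fall below 1. These ratios are purely descriptive, since the subgroup sample sizes needed for interval estimates are not given here.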

Although these findings may seem reasonable to those not acquainted with the post-abortion sequelae literature because they mirror traditional expectations, it is apparent to anyone who has read this literature that these outcomes stand in stark contrast to conclusions reached by the majority of researchers. The majority of researchers conclude that there is no greater occurrence of post-abortion sequelae than postpartum sequelae. A study representative of this literature was done by the English researcher Brewer (1977) and was published in the prestigious British Medical Journal. Brewer places the post-abortion rate at only 3 per 10,000 while the postpartum rate was placed at 17 per 10,000. (See Figure 1E for a comparison to David, Rasmussen and Holst.) Indeed, these findings led Brewer to conclude that ". . . childbirth is more hazardous in psychiatric terms than abortion . . ." (p. 477). However, our analysis indicates that the Danish study by David, Rasmussen and Holst rests upon a much firmer methodological foundation than does the English study by Brewer.

We would like to delineate some of the problems found in the English study authored by Brewer as an illustration of our concern over poor methodology. First, Brewer relied upon a questionnaire that was sent to psychiatrists in a given British catchment area. Thus, his data depended upon each psychiatrist's memory and/or ability (willingness?) to retrieve records. We know of no reliability or validity coefficients for this questionnaire and have no reason to believe that any were computed. Additionally, the questionnaire was sent to only 25% of the psychiatric consultants in the area. There is no guarantee that these consultants are representative, and indeed Sim and Neisser in their analysis "Post-Abortive Psychoses: A Report from Two Centers" (1979) claim that ". . . the psychiatrist with the greatest responsibility and experience in the area of the assessment and treatment of patients with instability associated with pregnancy did not participate." Brewer also reports that some psychiatric consultants had well defined catchment areas while some had catchment areas that overlapped with those of other psychiatrists. In effect, the result of this overlap was that the denominators in the incidence rates were "estimated." All these practices stand in sharp contrast to David, Rasmussen and Holst's use of computer-held data for the entire population of Danish females aged 15 through 49. In addition, the Danish study matches the post-abortion and postpartum conditions on prior incidence of psychiatric admission over the prior 15-month period, age, marital status, and parity. No attempt appears to have been made in the English study to equate comparison groups on these or any other factors.

Conclusion

To summarize, our review of the post-abortion sequelae literature suggests that the majority of studies published in this area are greatly flawed. Rather than rely on the presently published conclusions, it seems prudent to focus attention on the methodological shortcomings in existing studies in order to provide for more reliable studies in the future. We readily agree that no research area is free from inevitable methodological flaws, but not all research is dealing with such grave decisions as whether or not a pregnancy should be terminated. Our point is that when research is dealing with such a crucial issue as possible psychological risks for post-abortion women, we need to be as rigorous as possible in designing and conducting credible research.

At minimum, the findings of David, Rasmussen and Holst, with their differing conclusions from studies evidencing less methodological rigor, should underscore the importance of readdressing the issue of post-abortion psychological sequelae with better experimental design. Findings reported in what we consider to be the most reliable study to date are compatible with the assertion that post-abortion psychological sequelae occur more frequently than postpartum sequelae. Obviously, it is of considerable importance that other well planned studies be conducted in an effort to verify the findings reported by David, Rasmussen and Holst. It is crucial that these studies move beyond psychiatric hospitalization as an endpoint measurement to include other forms of emotional sequelae. At minimum, depression should be measured.

Our review of the literature leads us to conclude that the question of psychological sequelae to abortion is not closed, as many researchers have stated, but remains to be determined. Although such a conclusion fails to satisfy the expectations of either those for or against abortion on demand, it seems to reflect the present state of affairs.

References

Abrams, M., Dibiase, V. and Sturgis, S. (1979). Post-abortion attitudes and patterns of birth control. Journal of Family Practice, 9, 593-599.

Adler, N. E. (1975). Emotional responses of women following therapeutic abortion. American Journal of Orthopsychiatry, 45, 446-454.

Adler, N. E. (1976). Sample attrition in studies of psychosocial sequelae of abortion: How great a problem. Journal of Applied Social Psychology, 6, 240-259.

American Psychiatric Association (1968). Diagnostic and Statistical Manual of Mental Disorders. Washington: APA.

American Psychiatric Association (1980). Diagnostic and Statistical Manual of Mental Disorders. Washington: APA.

Athanasiou, R., Oppel, W., Michelson, L., Unger, T. and Yager, M. (1973). Psychiatric sequelae to term birth and induced early and late abortion: A longitudinal study. Family Planning Perspectives, 5, 227-231.

Barnes, A. B., Cohen, E., Stoeckle, J. D. and McGuire, M. T. (1971). Therapeutic abortion: Medical and social sequelae. Annals of Internal Medicine, 75, 881-886.

Bracken, M. B., Hachamovitch, M. and Grossman, G. (1974). The decision to abort and psychological sequelae. Journal of Nervous and Mental Disease, 158, 154-162.

Brody, H., Meikle, S. and Gerritse, R. (1971). Therapeutic abortion: A prospective study. 1. American Journal of Obstetrics and Gynecology, 109, 347-353.

Brewer, C. (1977). Incidence of post-abortion psychosis: A prospective study. British Medical Journal, 6059, 476-477.

Campbell, D. T. and Stanley, J. C. (1963). Experimental and Quasi-Experimental Designs for Research. Chicago: Rand McNally College Publishing Company.

Cohen, J. (1977). Statistical Power Analysis for the Behavioral Sciences. New York: Academic Press, Inc.

Cook, T. D. and Campbell, D. T. (1979). Quasi-Experimentation: Design and Analysis Issues for Field Settings. Chicago: Rand McNally College Publishing Company.

David, H. P., Rasmussen, N. K. and Holst, E. (1981). Postpartum and postabortion psychotic reactions. Family Planning Perspectives, 13, 88-92.

Derogatis, L. R. (1977). The SCL-90 Manual I: Scoring Administration and Procedures for the SCL-90. Baltimore, Md.: Johns Hopkins School of Medicine, Clinical Psychometrics Unit.

Drower, S. J. and Nash, E. S. (1978). Therapeutic abortion on psychiatric grounds. South African Medical Journal, 54, 604-608.

Evans, D. and Gusdon, J. (1973). Post-abortion attitudes. North Carolina Medical Journal, 34, 271-273.

Ewing, J. A. and Rouse, B. A. (1973). Therapeutic abortion and a prior psychiatric history. American Journal of Psychiatry, 130, 37-40.

Ford, C. V., Castelnuovo-Tedesco, T. P. and Long, K. D. (1971). Abortion: Is it a therapeutic procedure in psychiatry. Journal of the American Medical Association, 218, 1173-1178.

Greenglass, E. R. (1975). Therapeutic abortion and its psychological implications: The Canadian experience. Canadian Medical Association Journal, 113, 754-757.

Hamill, E. and Ingram, I. M. (1974). Psychiatric and social factors in the abortion decision. British Medical Journal, 1, 229-232.

Henshaw, S. K., Forrest, J. D. and Blaine, E. (1984). Abortion services in the United States, 1981-1982. Family Planning Perspectives, 16, 119-127.

Hopkins, J., Marcus, M. and Campbell, S. B. (1984). Postpartum depression: A critical review. Psychological Bulletin, 95(3), 498-515.

Jacobs, D., Garcia, C. R., Rickels, K. and Preucel, R. W. (1974). A prospective study on the psychological effects of therapeutic abortion. Comprehensive Psychiatry, 15, 423-434.

Jansson, B. (1965). Mental disorders after abortion. Acta Psychiatrica Scandinavica, 41, 87-110.

Kretzschmar, R. M. and Norris, A. S. (1967). Psychiatric implications of therapeutic abortion. American Journal of Obstetrics and Gynecology, 98, 368-373.

Lask, B. (1975). Short-term psychiatric sequelae to therapeutic termination of pregnancy. British Journal of Psychiatry, 126, 173-177.

McCance, C., Olley, P. C. and Edward, V. (1973). Long term psychiatric follow-up. In G. Horobin (ed.), Experience with Abortion. Cambridge: Cambridge University Press, pp. 245-300.

Meyerowitz, S., Satloff, A. and Romano, J. (1971). Induced abortion for psychiatric indication. American Journal of Psychiatry, 127, 1153-1160.

Mintz, J. (1983). Integrating research evidence: A commentary on meta-analysis. Journal of Consulting and Clinical Psychology, 51, 71-75.

Mosely, D. T., Follingstad, D. R., Harley, H. and Heckel, R. V. (1981). Psychological factors that predict reaction to abortion. Journal of Clinical Psychology, 37, 276-279.

Niswander, K. and Patterson, R. (1967). Psychological reaction to therapeutic abortion: 1. Subjective patient response. Obstetrics and Gynecology, 29, 702-706.

Osofsky, D. and Osofsky, J. (1972). The psychological reaction of patients to legalized abortion. American Journal of Orthopsychiatry, 42, 48-60.

Pare, C. M. and Raven, H. (1970). Follow-up of patients referred for termination of pregnancy. Lancet, 1, 635-638.

Patt, S. L., Rappaport, R. G. and Barglow, P. (1969). Follow-up of therapeutic abortion. Archives of General Psychiatry, 20, 408-414.

Payne, E. C., Kravitz, A. R., Notman, M. T. and Anderson, J. V. (1976). Outcome following therapeutic abortion. Archives of General Psychiatry, 33, 725-733.

Peck, A. and Marcus, H. (1966). Psychiatric sequelae of the therapeutic interruption of pregnancy. Journal of Nervous and Mental Disease, 143, 417-425.

Radloff, L. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385-401.

Sclare, A. B. and Geraghty, B. P. (1971). Therapeutic abortion: A follow-up study. Scottish Medical Journal, 16, 438-442.

Sim, M. and Neisser, R. (1979). Post-abortive psychoses: A report from two centers. In D. Mall & W. F. Watts (eds.), The Psychological Aspects of Abortion. Washington, D.C.: University Publications of America, Inc., pp. 1-13.

Simon, N. M., Rothman, D., Goff, J. T. and Senturia, A. G. (1969). Psychological factors related to spontaneous and therapeutic abortion. American Journal of Obstetrics and Gynecology, 104, 799-808.

Smith, M., Glass, G. and Miller, T. (1980). The Benefits of Psychotherapy. Baltimore, Md.: Johns Hopkins Press.

Spitzer, R. L., Endicott, J. and Robins, E. (1978). Research diagnostic criteria. Archives of General Psychiatry, 35, 837-844.

Todd, N. A. (1971). Psychiatric experience of the abortion act (1967). British Journal of Psychiatry, 119, 489-495.

Wallerstein, S., Kurtz, P. and Bar-Din, M. (1972). Psychological sequelae of therapeutic abortion in young unmarried women. Archives of General Psychiatry, 27, 828-832.



Discovering the truth about the emotional impact of abortion should be of great interest to all. Unfortunately, representatives of both sides of the abortion debate often exercise a high degree of selectivity in their review of the psychological sequelae literature, publicizing only findings that support their position on the matter. This is unfortunate because when thoughtfully approached, it becomes evident that the question of possible sequelae to abortion exists apart from the ethics of the action. This is true for two reasons. First, doing what is "right" or "wrong" may or may not result in changes in the emotional state. For example, evangelical Christians base the correctness of an action on their interpretation of the Scripture. Relative to a directive or principle found in Scripture, an emotional reaction or the absence of the same in women who have had abortions is of little consequence in providing moral guidance. Second, one key issue in determining the morality of abortion is the question of the "rights of the unborn." A woman's psychological reaction to abortion offers little direction concerning whether or not these rights have been violated.

In this paper we do not wish to address questions surrounding the morality of abortion. Rather, we want to provide a review of the psychological sequelae literature aimed at determining the scientific merit of existing studies. Certainly it would be reprehensible to overstate or understate a scientifically validated finding for a "higher" moral cause. Likewise, it would be reprehensible to pass on as "scientific" the claims of studies that exhibit little experimental validity.

To determine the level of rigor that exists in the psychological sequelae literature, we have undertaken a review of this literature from a methodological and statistical perspective. To locate articles we have relied upon computer searches of Index Medicus, Psychological Abstracts, Science Citation Index and the National Institute of Mental Health data base in addition to examination of the bibliographies of all articles located. This search yielded over 300 studies; seventy-six were either clinical case studies or experimental research. In turn, these studies were organized into four categories according to research design: case studies (17), controlled studies (14), retrospective-uncontrolled studies (20) and prospective-uncontrolled studies (23). Each of these four types of research designs have strengths and weaknesses, some of which will be described below.

Unfortunately, there are many inconsistencies in the conclusions drawn by the authors of the studies we located. For example, Wallerstein, Kurtz and Bar-Din (1972) found adverse reactions in fifty percent of the cases studied, while Osofsky and Osofsky (1972), in a study published the same year, concluded that there were few, if any, adverse psychological reactions. When results are this varied, both pro-life and pro-choice camps are able to find "evidence" to support their position. Under such circumstances, the need to consider the methodological and statistical practices underpinning each study becomes self-evident. A review of the foundations on which the literature rests sometimes can differentiate between studies that can be trusted and those that come to unwarranted conclusions. If severe methodological flaws in the current literature do exist, these inadequacies, not the conclusions reached, should be the focus of attention. Thus, it is the experimental validity rather than the conclusions of existing studies that provide the focus of this paper.

Conceptualizing Validity

We elected to adopt a conceptualization of experimental validity proposed by Cook and Campbell (1979) to help systematically determine how seriously the conclusions of a given study should be taken. Experimental validity can be categorized into four different types: statistical conclusion validity, internal validity, construct validity and external validity. Statistical conclusion validity is concerned with the extent to which a study permits valid inference about covariation between the independent variable (the presence or absence of abortion) and the dependent variable (some measure of psychological sequelae). Internal validity refers to the extent to which the observed effects of the outcome variable (psychological sequelae) may be attributed to the treatment (abortion) rather than alternative causes (age, marital status, religious background, etc.). Construct validity pertains to the extent that the outcome measures, treatments, samples and settings utilized in the research represent the theoretical constructs of interest. In the present context, high construct validity would imply (among other things) that the measuring device used to assess risk was reliable and accurate. Finally, external validity refers to the validity with which conclusions can be generalized to and across populations of persons, settings and time. Having high external validity would mean that the conclusions about abortion and psychological risk found in a given study could be safely applied to women other than those actually involved in the study.

Statistical Conclusion Validity

We will first discuss statistical conclusion validity as it relates to the post-abortion sequelae literature. There are two types of errors one can make when using a statistical hypothesis test to decide whether an experimental group differs from a control group (i.e., an abortion group differs from a non-abortion control group). Type I error refers to concluding from sample data that there is a difference on the outcome variable (i.e., incidence of psychological trauma) when such is not really the case for the two comparison populations. In effect, you have drawn random samples that look different, but both samples have come from the same population (with regard to the outcome parameter of interest). On the other hand, a Type II error occurs when, on the basis of sample data, it is decided that the samples have come from the same population when really each is from a different population.

Ideally we want to carry out hypothesis tests with a low probability of Type I error (e.g., set alpha, the probability of Type I error, at .05 or lower) as well as a low probability of a Type II error (e.g., we want power, the probability of correctly accepting the alternative hypothesis, to be .95 or higher). Indeed, both types of error can simultaneously be held to a low probability of occurrence if there are sufficient resources to collect adequately large comparison samples.

In reality it is often too expensive, time consuming or otherwise difficult to collect sample sizes that will allow one to sufficiently protect against both types of errors. Also, investigators without adequate statistical background and/or access to statistical consultation may not understand how crucial adequate sample size is, particularly as it relates to the possibility of making a Type II error. In such instances, investigator motivation may be insufficient to overcome barriers that work against securing adequate sample sizes.

When resources or motivation are insufficient to protect against both a Type I and Type II error, the research should not be carried out. But often it is. The very typical course of action is to maintain protection against a Type I error while tolerating a high risk of a Type II error. In other words, common practice would have us, in the face of limited resources, defend the null hypothesis at the expense of possibly missing a true alternative hypothesis.

An example from the pharmaceutical industry will clarify the usual practice and why it occurs. Suppose it is considered desirable to take a new drug to market but it is too expensive to test the drug against a control product using a large sample size. Most would argue that it would be better for the pharmaceutical firm to err in the direction of not introducing a new drug (that really is better) than to introduce a new drug (thought to be better but that really is not). The implication would be that alpha be kept small at the cost of decreasing power (i.e., increasing Type II error probability). After all, if we falsely conclude that the new drug is better and thus commit a Type I error, society must bear the considerable cost of producing and distributing the new drug only to ultimately discover that it is no better or even worse than the old drug. Protecting from Type I error at the expense of increasing the risk of a Type II error may mean that no one gets a new and better drug, but at least we will not replace a time-tested solution with a solution that does not work. As it turns out, Type I errors are usually more costly to society than Type II errors. Avoiding a Type I error will usually guard the status quo and therefore protect traditional practices and thinking.

It can be argued that under certain circumstances traditional wisdom is on the side of the alternative hypothesis, and to guard it, one must (if resources are limited) increase the risk of a Type I error in order to lower the risk of a Type II error. Indeed, it might be argued that this is the case regarding the question of

Table 1

Statistical Power for Fourteen Comparative Studies That Examine the Psychological Sequelae of Abortion

     N    Group n   Harmonic mean   Relative power   Date (data collected)
98,612     71,378          39,426             .99+   1974-75
 7,660      4,110           3,809             .99+   1975-76
30,329     28,556           3,338             .99+   1952-56
   111         18              30             .12    1963-69
    42         21              21             .10    1960-68
   128         47              59             .17    1971-72
   126         63              63             .17    1972-73
    68         19              27             .12    1971-72
   114         38              51             .16    1970-72
   300        108             138             .27    1967-68
   157         69              77             .19    1974-75
   152         58              72             .19    1968-70
    78         46              38             .13    1955-63
   102         22              35             .13    1968-70

Notes: For the purpose of power estimation, we have assumed the following: 1) that "ideal" experimental arrangements exist throughout the literature; namely, that all existing studies perfectly measure an identical outcome parameter that reflects level of depression and that perfect subject equivalency exists at baseline across the two conditions; 2) that post-event depression will be five percent greater in women experiencing abortion than women who carry to term, i.e., that there will be 25% postabortion depression vs. the 20% postpartum depression rate reported by Hopkins, et al., (1984), and 3) that a one tailed z test with alpha set at .05 on transformed proportions is used as the test statistic.
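
The assumptions listed in these notes can be sketched in code. The following is a minimal illustration and not necessarily the exact procedure used for the table: it assumes that the transformation is the arcsine transformation described by Cohen (1977), that the two group sizes enter through their harmonic mean, and that power is read from the normal approximation. All function names are hypothetical. Under these assumptions the sketch approximately reproduces the tabled values (for example, about .10 for a harmonic mean of 21).

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def harmonic_mean(n_a: int, n_b: int) -> float:
    """Harmonic mean of two group sizes."""
    return 2.0 * n_a * n_b / (n_a + n_b)


def effect_size_h(p1: float, p2: float) -> float:
    """Difference between arcsine-transformed proportions (Cohen's h)."""
    return 2.0 * math.asin(math.sqrt(p1)) - 2.0 * math.asin(math.sqrt(p2))


def power_one_tailed(n_harmonic: float, p1: float = 0.25, p2: float = 0.20,
                     alpha: float = 0.05) -> float:
    """Approximate power of a one-tailed z test on transformed proportions."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha)
    noncentrality = effect_size_h(p1, p2) * math.sqrt(n_harmonic / 2.0)
    return STD_NORMAL.cdf(noncentrality - z_crit)


if __name__ == "__main__":
    # Group sizes reported in the text for David, Rasmussen and Holst (1981).
    n_prime = harmonic_mean(71378, 27234)
    print(f"harmonic mean = {n_prime:.0f}, power = {power_one_tailed(n_prime):.3f}")
    # Harmonic means taken from three of the smaller studies in the table.
    for n in (21, 63, 138):
        print(f"harmonic mean = {n}, power = {power_one_tailed(n):.2f}")
```

Small discrepancies of about .01 from the tabled figures are to be expected, since printed power tables are typically read with rounding or interpolation.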

psychological risk and abortion. For example, the traditional or "status quo" view, many would maintain, is that women who undergo abortions evidence greater emotional risk than those who do not. According to these individuals, the "usual" expectation historically has been that if contrasted to a non-abortion control group, women electing abortion should evidence greater emotional stress. It would therefore follow that under conditions of limited resources, studies that compare an abortion to a non-abortion control group should raise the risk of a Type I error (i.e., falsely concluding there is a "difference") in order to lower the risk of a Type II error (i.e., falsely concluding there is not a "difference") on the grounds that doing so would protect the prevailing, traditional view. To do this, however, would mean using an alpha of greater than .05, which we all know is never done!

We do not claim the foregoing argument, but we do maintain that when a large number of individuals believe strongly that a difference between experimental and control groups exists, as is the case in this country regarding the sequelae to abortion question, a statistical decision procedure with good power characteristics (i.e., a low risk of a Type II error) must be utilized out of respect for these individuals. In a word, those who are against abortion and believe it to increase psychological sequelae deserve quality studies with good statistical power characteristics. This is true, if for no other reason than that the popular press will label published studies with low statistical power that claim abortion has no psychological effect as "scientific," and in so doing give them a prestigious status. However, the popular press will not bother to explain, because they will not understand, that there was a good chance of arriving at that conclusion due to limited statistical power, quite aside from whether the conclusion is really true. As scientists who understand these concepts, we have a moral responsibility to make sure that the public is not misled by the absence of "statistically significant" differences in studies with low statistical power.

We have just completed an examination of the existing studies that compare a post-abortion group with a control. After making certain assumptions, we have calculated the level of statistical power present in each study. Our conclusion (see Table 1) is that 11 of the 14 existing studies exhibit statistical power that is not likely to exceed, but could be less than, .27. We hold that the majority of currently available comparative studies exhibit grossly substandard power characteristics even under assumptions that, if anything, overestimate power levels.
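
To give a sense of scale for these power levels, one can ask how large a study would have to be to reach a power of .95. The sketch below is an illustration under the same assumptions as the notes to Table 1 (25% versus 20% depression, a one-tailed z test on arcsine-transformed proportions, alpha of .05). The normal-approximation formula and the function name `required_n_per_group` are assumptions introduced here for illustration; the calculation is not one reported in this paper.

```python
import math
from statistics import NormalDist


def required_n_per_group(p1: float = 0.25, p2: float = 0.20,
                         alpha: float = 0.05, power: float = 0.95) -> int:
    """Equal group size needed for a one-tailed z test on arcsine-transformed
    proportions to reach the requested power (normal approximation)."""
    std_normal = NormalDist()
    # Effect size: difference between the arcsine-transformed proportions.
    h = 2.0 * math.asin(math.sqrt(p1)) - 2.0 * math.asin(math.sqrt(p2))
    z_alpha = std_normal.inv_cdf(1.0 - alpha)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2.0 * ((z_alpha + z_power) / h) ** 2)


if __name__ == "__main__":
    print("women needed per group:", required_n_per_group())
```

Under these assumptions the required size is on the order of 1,500 women per group, far above the harmonic means of the eleven low-power studies in Table 1.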

Internal Validity

Internal validity, as noted earlier, refers to the extent to which effects on outcome variables are due to the independent variable (i.e., abortion) rather than to other competing causes. Existing controlled studies addressing the issue of psychological sequelae invariably utilize a quasi-experimental design, termed the "static-group comparison" design in Campbell and Stanley's (1963) classic text, Experimental and Quasi-Experimental Designs for Research. (This design is termed the "nonequivalent control group" design if pre-treatment measures are available.) Because it is not possible to randomly assign women to conditions, the abortion and control groups cannot be equated at baseline by chance. Two threats that are endemic to these designs, mortality and selection, will serve to illustrate the serious problems that can plague a study if such threats are not countered.

Mortality becomes a threat when subjects who exhibit certain characteristics of potential importance to the conclusions of a study drop out of one treatment group but not the other. Differential dropout can lead to discrepancies between treatment groups on critical background variables, thus making comparisons at the end of the study impossible to interpret. This is a particularly serious problem in the sequelae to abortion literature due to certain findings reported by Adler (1976). She reviewed 17 studies dealing, to varying degrees, with psychological sequelae. She found sample attrition ranging from 13 percent (Barnes, Cohen, Stoeckle and McGuire, 1971) to 86 percent (Evans and Gusdon, 1973). In her own study, Adler followed up non-responders and found them most likely to be young, Catholic, and unmarried. Each of these characteristics has been associated with a greater likelihood of negative sequelae (Adler, 1975; Payne, Kravitz, Notman, and Anderson, 1976; Osofsky and Osofsky, 1972). Adler concluded that experimental mortality may result in the underestimation of the incidence of adverse responses to abortion.

Selection is a threat when, at the outset of the study, subjects assigned to the experimental condition differ from control subjects on baseline characteristics. In this event, differences or similarities between the experimental and control groups found at the end of the study may be due to the treatment (presence or absence of abortion), one or more baseline differences, or the interaction of the treatment with one or more baseline differences. The threat of selection is usually countered by randomly assigning subjects to conditions, but this,

Selection was indisputably a potential threat to the internal validity of more than 50 percent of the studies we reviewed because baseline measures simply were not collected. Without carefully establishing the baseline comparability of women who receive an abortion to those who do not on at least such rudimentary characteristics as age, number of children, education, socioeconomic status, social support, marital status and physical health, the meaning of differences or similarities in the incidence of sequelae will remain speculative.

This short discussion should suffice to make the central point that as long as the static-group comparison and the nonequivalent control group designs, without adjustment to compensate for sources of invalidity, remain the standard designs used in abortion sequelae research, then numerous threats to internal validity will cloud our understanding of the psychological significance of abortion.

Construct Validity

Construct validity refers to the extent to which the outcome measures, treatments, samples and settings utilized in the research represent the theoretical constructs of interest. In the abortion sequelae literature, the main concern relates to the construct validity of the dependent variable (some measure of psychological sequelae). In other words, are sequelae being accurately measured? Standardized assessment measures such as the MMPI, the Center for Epidemiologic Studies Depression Scale (Radloff, 1977) or the Symptom Checklist-90 (Derogatis, 1977) have rarely been implemented in psychological sequelae research. Results have typically been derived from a variety of self-report questionnaires, interview schedules, rating scales and clinical opinions. These are almost always of undetermined psychometric adequacy.

We would like to illustrate some of the difficulties in
as noted earlier, is impossible for abortion sequelae the way psychological sequelae have been assessed with
research. If random assignment cannot be used to some examples. Niswander and Patterson (1967),
equate groups at baseline by chance, then one should at Ewing and Rouse (1973), Kretzschmar and Norris
least compare baseline characteristics on selected vari- (1967) and Bracken, Hachamovitch and Grossman
ables to rule out possible important differences. (1974) devised their own self-report questionnaires to
assess the psychological reaction to abortion. However, in virtually all instances no attempt was made to validate these instruments or even assess their reliability (i.e., consistency and preciseness). A variety of relatively simple methods have been devised for determining reliability (test-retest, parallel forms and split-half techniques), but none of these were conducted. Clearly, the use of measuring devices with unknown reliability can potentially distort the conclusions one makes about the psychological impact of abortion.

Other studies have implemented structured or unstructured interviews as the assessment measure (Patt, Rappaport and Barglow, 1969; Wallerstein, Kurtz and Bar-Din, 1972; Osofsky and Osofsky, 1972; Ford, Castelnuovo-Tedesco and Long, 1971; Peck and Marcus, 1966). It is common knowledge that psychiatric interviews can be highly unreliable and are subject to the specific orientation, level of expertise, biases and expectations of the interviewer. In virtually all cases reviewed, no attempt was made to assess inter-rater reliability (the degree to which two interviewers come to similar conclusions about the same subject), or to control for interviewer bias and expectancies. For example, Osofsky and Osofsky (1972) attempted to quantify such behaviors as crying and smiling during an unstructured interview. These behaviors could easily be influenced by characteristics of the interviewer, but no attempt was made to control for such factors.

Psychological diagnoses were used as an outcome criterion by some researchers. However, none of these studies utilized psychodiagnostic classification schemes with established psychometric adequacy such as the Research Diagnostic Criteria (Spitzer, Endicott and Robins, 1978) or DSM-II/III (American Psychiatric Association, 1968, 1980). Further, even these diagnostic instruments must be correctly applied by the practitioner if their inherent reliability is to be realized, but rarely was inter-rater reliability assessed. Without such reliability coefficients, the degree of confidence that one can have in the specific raters used in a given study is unknown.

In general, we found little evidence to suggest that construct validity for the dependent measures used to assess sequelae was at an acceptable level. The list of potential threats to construct validity we found is too great to enumerate in this presentation. However, it includes, in addition to the above problems, such practices as obtaining information concerning the level of emotional adjustment from sources other than the patient (Meyerowitz, Satloff and Romano, 1971; Jacobs, Garcia, Rickels and Preucel, 1974; Pare and Raven, 1970; Lask, 1975); conducting follow-up assessment immediately after the abortion in the recovery room (Bracken, Hachamovitch and Grossman, 1974; Osofsky and Osofsky, 1972; Moseley, Follingstad, Harley and Heckel, 1981); interviewing patients at unsystematized follow-up intervals ranging from one to five years (Kretzschmar and Norris, 1967) or several months to seven years (Meyerowitz, Satloff and Romano, 1971); and including patients who not only received an abortion but were also sterilized concomitantly, thus subjecting the subject to two treatments simultaneously and rendering any form of causal interpretation impossible.

External Validity

External validity refers to the ability to generalize findings across populations, settings and time, and is critical if the information is going to be useful apart from its experimental setting. However, the majority of existing studies utilize small, self-selected samples of women who had their abortion at one specific hospital. Such selection bias would likely limit the generalizability of any conclusions reached, even if the conclusions were made under conditions of high internal validity. For example, Niswander and Patterson (1967) asked the attending physician to approve or disapprove the mailing of a questionnaire to each of the patients, thus eliminating those patients for whom it was thought that the recollection of the abortion experience would be too painful. Abrams, DiBiase and Sturgis (1979) sent questionnaires only to those patients whom they felt were likely to respond. In both of these cases, the subject selection procedure could seriously alter the generalizability (external validity) of results.

Generalizability of results would be greatly enhanced if subject selection were stratified across the various settings in which abortions are performed. Indeed, the distribution of such settings can be approximated. In 1982, 82 percent of abortions in America were performed in non-hospital facilities: 56 percent in abortion clinics, 21 percent in other clinics, and 5 percent in physicians' offices (Henshaw, Forrest and Blaine, 1984). Eighteen percent of abortions were performed in hospitals. Unfortunately, no study of which we are aware has attempted to make the
research sample utilized in the study representative of these known demographic characteristics. The distribution of settings for the research sample being used is often not even specified.

A second obstacle to external validity is highlighted by the widely varying definitions of psychological sequelae that are used across the various studies in the area. In one respect, the search for abortion-related sequelae of many different kinds enhances generalizability. However, to the degree that our confidence in findings is lessened because results of studies that use different definitions of sequelae are difficult to pool, generalizability is retarded. This may contribute to the inconsistencies found among results in the literature. Some studies define negative psychological reactions to abortion in terms of psychological symptomatology such as depression, anxiety or guilt. Another may attach importance to the number of symptoms, while others rely on the subjective experience of the woman as she reports it in a self-report questionnaire. The resulting ambiguities make the literature difficult to summarize as there are no subgroups of studies that consistently measure the same dependent variable defined in the same way. It therefore goes without saying that the literature contains few replications of procedures or findings. Given small sample sizes and virtually no replication across investigators, the potential for nongeneralizable (not to mention unreliable) conclusions is substantial.

Lastly, generalizability across time is a crucial issue. Approximately half of the studies we reviewed were conducted from 1967 to 1973 when abortion laws were being liberalized. During this period, therapeutic abortions were granted on medical and/or psychiatric grounds. The remaining studies were conducted in the mid-to-late 1970's under abortion-on-demand. (Note that some of these studies were not published until the early 1980's.) It is highly questionable whether conclusions drawn from studies utilizing women granted abortions on therapeutic grounds only, as was the case until 1973 in the United States, are generalizable to the current social milieu characterized by abortion-on-demand. Furthermore, as no new studies have been conducted in the current decade, the generalizability of conclusions from the more recent studies to the present is also open to question.

Which Studies are Best?

It now should be clear that considerable ambiguity surrounds the question of post-abortion sequelae because numerous methodological problems exist in the literature. In the midst of the confusion arising from generally poor methodology, it is only natural to ask whether some of the existing studies are more trustworthy than others. Certainly when studies of relatively high and low validities conflict, the conclusions of the higher quality studies should be given the most weight. As Mintz (1983) has stated, "literally no number of anecdotal reports, uncontrolled trials or poorly designed experiments can outweigh one carefully planned and executed controlled experiment if it results in clear and divergent findings" (p. 74). On this same issue, Smith, Glass and Miller (1980) write: "The important question in surveying a body of literature is to determine whether the best designed studies yield evidence different from more poorly designed studies. If the answer is yes, then one is compelled to believe the best ones" (p. 64).

Pursuing this line of thought, we would like to critique what we consider to be the "best" study done in this area to date. Danish researchers David, Rasmussen and Holst (1981) have carried out the only study we located that exhibited the minimum criteria of a control group, pretest measures, adequate sample size, an attempt to equate non-equivalent groups at baseline, and assessment tools with adequate validity and reliability. It is our hope that the ensuing critique of this study, which in our opinion is one of the few acceptable studies (but certainly not without problems), will highlight in a concrete way the issues that the clinician and/or woman considering abortion must keep in mind when examining the research.

Utilizing the computer linkage of the Danish national case registry, the above authors studied the
comparative risk of admission to a psychiatric hospital within three months of an abortion or term delivery for all women under age 50 residing in Denmark. Data on admission to psychiatric hospitals were obtained on 71,378 women carrying pregnancies to term, 27,234 women terminating unwanted pregnancies, and on the total population of 1,169,819 women aged 15 to 49. In determining the incidence rates, only first admissions were recorded; women with an admission during the 15 months prior to the delivery or abortion were excluded.

Figure 1A contrasts women who delivered, women who had abortions, and all women in Denmark aged 15 through 49 on incidence of psychiatric hospitalization. Incidence rates were highest for women who were post-abortion (18.4 per 10,000), next highest for women who were postpartum (12.0 per 10,000), and lowest for all women (7.5 per 10,000). In Figure 1B the incidence rates have been further broken down by age category. Only in women aged 35 through 49 is there a reversal in the direction found in the composite data. Here, women who delivered evidenced a higher rate of psychiatric hospitalization than women who aborted (22.2 per 10,000 vs. 13.4 per 10,000). It appears that the pregnancy event (birth or abortion) interacts with age; women who are post-abortion are at greater risk except in the age category 35 through 49, where the relationship reverses.

Incidence of psychiatric hospitalization between postpartum and post-abortion women in each of three marital status categories is depicted in Figure 1C. Differences across conditions are relatively small for women who were currently married or never were married, but are extreme when considering women who were separated, divorced or widowed (16.9 per 10,000 postpartum vs. 63.8 per 10,000 post-abortion). Apparently, women who have suffered from a separation with their husband also have a more difficult time dealing with the termination of the pregnancy. Lack of an emotional support system may be more prevalent for women who are estranged or whose husbands have died.

Finally, Figure 1D compares postpartum and post-abortion women across four levels of parity, or number of prior children. Regardless of the number of prior children, women who were postpartum evidenced a lower rate of psychiatric hospitalization than women who were post-abortion. However, these differences are more extreme for women with zero or one prior child (13.8 per 10,000 vs. 22.4 per 10,000 with parity of zero; 9.7 per 10,000 vs. 23.3 per 10,000 with parity of one). This may suggest that women who have one or no offspring are a greater post-abortion psychological risk than those with several children.

Although these findings may seem reasonable to those not acquainted with the post-abortion sequelae literature because they mirror traditional expectations, it is apparent to anyone who has read this literature that these outcomes stand in stark contrast to conclusions reached by the majority of researchers. The majority of researchers conclude that there is no greater occurrence of post-abortion sequelae than postpartum sequelae. A study representative of this literature was done by the English researcher Brewer (1977) and was published in the prestigious British Medical Journal. Brewer places the post-abortion rate at only 3 per 10,000 while the postpartum rate was placed at 17 per 10,000. (See Figure 1E for a comparison to David, Rasmussen and Holst.) Indeed, these findings led Brewer to conclude that ". . . childbirth is more hazardous in psychiatric terms than abortion . . ." (p. 477). However, our analysis indicates that the Danish study by David, Rasmussen and Holst rests upon a much firmer methodological foundation than does the English study by Brewer.

We would like to delineate some of the problems found in the English study authored by Brewer as an illustration of our concern over poor methodology. First, Brewer relied upon a questionnaire that was sent to psychiatrists in a given British catchment area. Thus, his data depended upon each psychiatrist's memory and/or ability (willingness?) to retrieve records. We know of no reliability or validity coefficients for this questionnaire and have no reason to believe that any were computed. Additionally, the questionnaire was sent to only 25% of the psychiatric consultants in the
area. There is no guarantee that these consultants are representative, and indeed Sim and Neisser in their analysis "Post-Abortive Psychoses: A Report from Two Centers" (1979) claim that ". . . the psychiatrist with the greatest responsibility and experience in the area of the assessment and treatment of patients with instability associated with pregnancy did not participate." Brewer also reports that some psychiatric consultants had well defined catchment areas while some had catchment areas that overlapped with those of other psychiatrists. In effect, the result of this overlap was that the denominators in the incidence rates were "estimated." All these practices stand in sharp contrast to David, Rasmussen and Holst's use of computer-held data for the entire population of Danish females aged 15 through 49. In addition, the Danish study matches the post-abortion and postpartum conditions on prior incidence of psychiatric admission over the prior 15-month period, age, marital status, and parity. No attempt appears to have been made in the English study to equate comparison groups on these or any other factors.

Conclusion

To summarize, our review of the post-abortion sequelae literature suggests that the majority of studies published in this area are greatly flawed. Rather than rely on the presently published conclusions, it seems prudent to focus attention on the methodological shortcomings in existing studies in order to provide for more reliable studies in the future. We readily agree that no research area is free from inevitable methodological flaws, but not all research is dealing with such grave decisions as whether or not a pregnancy should be terminated. Our point is that when research is dealing with such a crucial issue as possible psychological risks for post-abortion women, we need to be as rigorous as possible in designing and conducting credible research.

At minimum, the findings of David, Rasmussen and Holst, with their differing conclusions from studies evidencing less methodological rigor, should underscore the importance of readdressing the issue of post-abortion psychological sequelae with better experimental design. Findings reported in what we consider to be the most reliable study to date are compatible with the assertion that post-abortion psychological sequelae occur more frequently than postpartum sequelae. Obviously, it is of considerable importance that other well planned studies be conducted in an effort to verify the findings reported by David, Rasmussen and Holst. It is crucial that these studies move beyond psychiatric hospitalization as an endpoint measurement to include other forms of emotional sequelae. At minimum, depression should be measured.

Our review of the literature leads us to conclude that the question of psychological sequelae to abortion is not closed as many researchers have stated, but remains to be determined. Although such a conclusion fails to satisfy the expectations of either those for or against abortion on demand, it seems to reflect the present state of affairs.
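
As a worked illustration of the incidence figures quoted in our critique, the following minimal sketch computes simple rate ratios and rate differences from the per-10,000 admission rates cited from David, Rasmussen and Holst (1981) and from Brewer (1977). It is not part of either study's analysis: the names DANISH_RATES, BREWER_RATES, rate_ratio, rate_difference and summarize are hypothetical, and the results are unadjusted descriptive comparisons that carry no confidence intervals.

```python
# Incidence of first psychiatric admission per 10,000 women, as quoted in the
# text. Each tuple is (postpartum rate, post-abortion rate).
# All names below are illustrative assumptions, not taken from either study.
DANISH_RATES = {
    "overall": (12.0, 18.4),
    "aged 35 through 49": (22.2, 13.4),
    "separated, divorced or widowed": (16.9, 63.8),
    "parity of zero": (13.8, 22.4),
    "parity of one": (9.7, 23.3),
}
BREWER_RATES = (17.0, 3.0)  # Brewer (1977), same ordering


def rate_ratio(postpartum, postabortion):
    """Post-abortion rate divided by postpartum rate (above 1: higher post-abortion)."""
    return postabortion / postpartum


def rate_difference(postpartum, postabortion):
    """Excess post-abortion admissions per 10,000 women (negative: fewer)."""
    return postabortion - postpartum


def summarize(rates):
    """Map each subgroup to (rounded rate ratio, rounded rate difference)."""
    return {
        group: (round(rate_ratio(pp, pa), 2), round(rate_difference(pp, pa), 1))
        for group, (pp, pa) in rates.items()
    }


if __name__ == "__main__":
    for group, (ratio, diff) in summarize(DANISH_RATES).items():
        print(f"Danish study, {group}: ratio {ratio}, difference {diff} per 10,000")
    print(f"Brewer: ratio {round(rate_ratio(*BREWER_RATES), 2)}")
```

A ratio above 1 corresponds to a higher post-abortion rate and a ratio below 1 to a higher postpartum rate, which makes the reversal in the 35 through 49 age category and the opposite direction of Brewer's figures easy to see side by side.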

REFERENCES

Abrams, M., DiBiase, V. and Sturgis, S. (1979). Post-abortion attitudes and patterns of birth control. Journal of Family Practice, 9, 593-599.
Adler, N. E. (1975). Emotional responses of women following therapeutic abortion. American Journal of Orthopsychiatry, 45, 446-454.
Adler, N. E. (1976). Sample attrition in studies of psychosocial sequelae of abortion: How great a problem. Journal of Applied Social Psychology, 6, 240-259.
American Psychiatric Association (1968). Diagnostic and Statistical Manual of Mental Disorders. Washington: APA.
American Psychiatric Association (1980). Diagnostic and Statistical Manual of Mental Disorders. Washington: APA.
Athanisiou, R., Oppel, W., Michelson, L., Unger, T. and Yager, M. (1973). Psychiatric sequelae to term birth and induced early and late abortion: A longitudinal study. Family Planning Perspectives, 5, 227-231.
Barnes, A. B., Cohen, E., Stoeckle, J. D. and McGuire, M. T. (1971). Therapeutic abortion: Medical and social sequelae. Annals of Internal Medicine, 75, 881-886.
Bracken, M. B., Hachamovitch, M. and Grossman, G. (1974). The decision to abort and psychological sequelae. Journal of Nervous and Mental Disease, 158, 154-162.
Brody, H., Meikle, S. and Gerritse, R. (1971). Therapeutic abortion: A prospective study. I. American Journal of Obstetrics and Gynecology, 109, 347-353.
Brewer, C. (1977). Incidence of post-abortion psychosis: A prospective study. British Medical Journal, 6059, 476-477.
Campbell, D. T. and Stanley, J. C. (1963). Experimental and Quasi-Experimental Designs for Research. Chicago: Rand McNally College Publishing Company.
Cohen, J. (1977). Statistical Power Analysis for the Behavioral Sciences. New York: Academic Press, Inc.
Cook, T. D. and Campbell, D. T. (1979). Quasi-Experimentation: Design and Analysis Issues for Field Settings. Chicago: Rand McNally College Publishing Company.
David, H. P., Rasmussen, N. K. and Holst, E. (1981). Postpartum and postabortion psychotic reactions. Family Planning Perspectives, 13, 88-92.
Derogatis, L. R. (1977). The SCL-90 Manual I: Scoring, Administration and Procedures for the SCL-90. Baltimore, Md.: Johns Hopkins School of Medicine, Clinical Psychometrics Unit.
Drower, S. J. and Nash, E. S. (1978). Therapeutic abortion on psychiatric grounds. South African Medical Journal, 54, 604-W8.
Evans, D. and Gusdon, J. (1973). Post-abortion attitudes. North Carolina Medical Journal, 34, 271-273.
Ewing, J. A. and Rouse, B. A. (1973). Therapeutic abortion and a prior psychiatric history. American Journal of Psychiatry, 130, 37-40.
Ford, C. V., Castelnuovo-Tedesco, T. P. and Long, K. D. (1971). Abortion: Is it a therapeutic procedure in psychiatry. Journal of the American Medical Association, 218, 1173-1178.
Greenglass, E. R. (1975). Therapeutic abortion and its psychological implications: The Canadian experience. Canadian Medical Association Journal, 113, 754-757.
Hamill, E. and Ingram, I. M. (1974). Psychiatric and social factors in the abortion decision. British Medical Journal, 1, 229-232.
Henshaw, S. K., Forrest, J. D. and Blaine, E. (1984). Abortion services in the United States, 1981-1982. Family Planning Perspectives, 16, 119-127.
Hopkins, J., Marcus, M. and Campbell, S. B. (1984). Postpartum depression: A critical review. Psychological Bulletin, 95(3), 498-515.
Jacobs, D., Garcia, C. R., Rickels, K. and Preucel, R. W. (1974). A prospective study on the psychological effects of therapeutic abortion. Comprehensive Psychiatry, 15, 423-434.
Jansson, B. (1965). Mental disorders after abortion. Acta Psychiatrica Scandinavica, 41, 87-110.
Kretzschmar, R. M. and Norris, A. S. (1967). Psychiatric implications of therapeutic abortion. American Journal of Obstetrics and Gynecology, 98, 368-373.
Lask, B. (1975). Short-term psychiatric sequelae to therapeutic termination of pregnancy. British Journal of Psychiatry, 126, 173-177.
McCance, C., Olley, P. C. and Edward, V. (1973). Long term psychiatric follow-up. In G. Horobin (ed.), Experience with Abortion. Cambridge: Cambridge University Press, pp. 245-300.
Meyerowitz, S., Satloff, A. and Romano, J. (1971). Induced abortion for psychiatric indication. American Journal of Psychiatry, 127, 1153-1160.
Mintz, J. (1983). Integrating research evidence: A commentary on meta-analysis. Journal of Consulting and Clinical Psychology, 51, 71-75.
Moseley, D. T., Follingstad, D. R., Harley, H. and Heckel, R. V. (1981). Psychological factors that predict reaction to abortion. Journal of Clinical Psychology, 37, 276-279.
Niswander, K. and Patterson, R. (1967). Psychological reaction to therapeutic abortion: I. Subjective patient response. Obstetrics and Gynecology, 29, 702-706.
Osofsky, D. and Osofsky, J. (1972). The psychological reaction of patients to legalized abortion. American Journal of Orthopsychiatry, 42, 48-60.
Pare, C. M. and Raven, H. (1970). Follow-up of patients referred for termination of pregnancy. Lancet, 1, 653-W.
Patt, S. L., Rappaport, R. G. and Barglow, P. (1969). Follow-up of therapeutic abortion. Archives of General Psychiatry, 20, 408-414.
Payne, E. C., Kravitz, A. R., Notman, M. T. and Anderson, J. V. (1976). Outcome following therapeutic abortion. Archives of General Psychiatry, 33, 725-733.
Peck, A. and Marcus, H. (1966). Psychiatric sequelae of the therapeutic interruption of pregnancy. Journal of Nervous and Mental Disease, 143, 417-425.
Radloff, L. (1977). The CES-D scale: A self-report depression scale for research in the general population. Applied Psychological Measurement, 1, 385-401.
Sclare, A. B. and Geraghty, B. P. (1971). Therapeutic abortion: A follow-up study. Scottish Medical Journal, 16, 438-442.
Sim, M. and Neisser, R. (1979). Post-abortive psychoses: A report from two centers. In D. Mall & W. F. Watts (eds.), The Psychological Aspects of Abortion. Washington, D.C.: University Publications of America, Inc., pp. 1-13.
Simon, N. M., Rothman, D., Goff, J. T. and Senturia, A. G. (1969). Psychological factors related to spontaneous and therapeutic abortion. American Journal of Obstetrics and Gynecology, 104, 799-808.
Smith, M., Glass, G. and Miller, T. (1980). The Benefits of Psychotherapy. Baltimore, Md.: Johns Hopkins Press.
Spitzer, R. L., Endicott, J. and Robins, E. (1978). Research diagnostic criteria. Archives of General Psychiatry, 35, 837-844.
Todd, N. A. (1971). Psychiatric experience of the abortion act (1967). British Journal of Psychiatry, 119, 489-495.
Wallerstein, S., Kurtz, P. and Bar-Din, M. (1972). Psychological sequelae of therapeutic abortion in young unmarried women. Archives of General Psychiatry, 27, 828-832.