Science in Christian Perspective
Literary Statistics and Pauline Authorship I. Historical Background
MAYNARD C. NIEBOER
1143 Eastwood Drive, Mt. Pleasant, Michigan 48858
From: JASA 23 (September 1971): 96-99.
Literary statistics has risen to some prominence in biblical studies in recent years. This two-part series is an attempt to survey and evaluate the general approach of literary statistics, especially as it applies to Pauline authorship. Part I introduces the biblical question in terms of Pauline authorship of the Pastoral Epistles. Then a survey is made of the rise of literary statistics, with emphasis upon its application to biblical studies. In order to present some of the ideas inherent in literary statistics, a study is made of the classic work by P. N. Harrison. Harrison acknowledges that every so-called Pauline letter has certain characteristic expressions and lacks others. Yet for the most part, the letters form a more or less clearly defined series within certain limits. However, in terms of comparative word usage, unique words, and certain grammatical features, Harrison concludes that the Pastorals form an exception to the Pauline series and must have been written by "a Paulist" at some later date. In order to complete this historical survey, various critiques of Harrison's study are reviewed.
Introduction
Objections to Pauline authorship of the Pastoral Epistles (I and II Timothy, Titus) can be summarized under four main areas. Prof. Donald Guthrie has given a good review of these areas and the advocates of each.1 My approach here will be merely to sketch these objections as a prelude to specific consideration of one area in this paper. The four areas are as follows:
1) Historical problem: some scholars feel that it is impossible to fit the historical data of Paul's life as given in the Pastorals within the framework of history given in the book of Acts.
2) Ecclesiastical problem: basically, this objection states that the church organization described is too advanced for Paul's time, and that the heresy reflected comes from a much later time (presumably gnostic, from the 2nd century).
3) Doctrinal problem: the objection here is that characteristic Pauline teachings are missing, such as the Fatherhood of God, mystic union with Christ and the work of the Holy Spirit. Finally, it is felt that the view of faith is stereotyped and fixed, and doesn't fit the creative mind of Paul.
4) Linguistic problem: this objection involves the style of writing and word usage. There are a large number of words in the Pastorals which are unique in the New Testament, and a large number of words which occur elsewhere in the New Testament but not in the other, undisputed Pauline letters. Many words also show marked kinship with the Apostolic Fathers and late Apologists. It appears that this objection carries the most weight for those opting for non-Pauline authorship.
Thus, for many critics, the cumulative effect of these considerations rules out any possibility of Pauline authorship. Many feel that all the objections are overcome by the explanation (theory) that a later "Paulist" in the early second century produced these Epistles to meet the needs of his own time.2 On the other hand, it should be pointed out that there are some scholars who maintain Pauline authorship and seek to explain the above objections in terms of the use of an amanuensis, the occasional nature of the letters, etc.3
I have already indicated that the linguistic argument (along with the doctrinal problem) seems to be the most telling. The most influential work concerning the linguistic problem was published in 1921 by P. N. Harrison, who marshalled considerable stylistic and word-usage evidence against Pauline authorship in terms of tables, word counts and numerical data. His approach has come to be called the statistical method. The statistical method in general, and Harrison's work in particular, are widely cited and influential in contemporary works. Therefore it is necessary to evaluate this method critically and to consider it in any discussion of Pauline authorship of the Pastorals. Furthermore, the science of statistics has greatly spread and developed, and thus much more sophisticated and complicated approaches are now being applied. This makes it extremely difficult for a nonstatistician to evaluate objections to Pauline authorship in terms of statistical evidence. For these reasons, this paper will focus on linguistic objections to Pauline authorship, specifically as they are formulated in terms of statistical analysis. The approach of my paper will be to survey statistical critiques of Pauline authorship, using P. N. Harrison as representative of the relatively basic statistical approach, and A. Q. Morton as representative of the more sophisticated, "statistics proper" approach. However, the major purpose of this paper is a basic evaluation and critique of the statistical approach, rather than a detailed critique of any one specific approach. In order to give this a proper perspective, we turn first to a brief sketch of the development of literary statistics (i.e., the application of statistical analysis to literary criticism).
Rise of Literary Statistics
Modern literary statistics dates from the 1930's, when two men wrote articles on sentence length distribution.4 In 1944, Yule published what has become a classic work: The Statistical Study of Literary Vocabulary.5 In 1956, the first textbook for literary statistics was published by G. Herdan: Language as Choice and Chance.6 Herdan refers to a list of about 150 publications dealing with the subject of literary statistics up to that time.
More recently, a Ph.D. dissertation by Robert S. Wachal was published at the University of Wisconsin (1966).7 It contains a summary of earlier studies which is comprehensive in scope, and it breaks down the survey into the six categories of parameters which have been developed. Wachal has over 20 pages of "selected" bibliography and presents a synthesized approach to defining a test for authorship. He has also indicated that the most sophisticated techniques of statistics have been used in literary analysis (including analysis of variance and regression analysis). A computer program was used to analyze the Federalist Papers as an evaluation of his approach to authorship. This work would be a necessary starting point in any detailed study of literary statistics at the present time, both from the standpoint of literature references and choice of parameters to describe style or characterize an author.8
In Biblical studies, the classic work concerning a statistical approach to defining authenticity is P. N. Harrison's book.9 Harrison based his case against genuineness upon language and style. His approach was called statistical because he made his analysis in terms of word counts and percentages. He counted the total vocabulary in the three books, making a distinction as to how many words did not appear elsewhere in the Pauline writings or in the New Testament. He further made a comparison of words and phrases characteristic of Paul, and compared the Hapaxes with the second century Fathers. Since Harrison's work, several critiques of it have been published, but no new work appeared for some time. These critiques will be considered later in the paper.
In 1948, W. C. Wake10 applied the sentence length work of Williams and Yule to Pauline studies. Wake applied the measurement of sentence length distribution to several classic Greek authors. He showed that for writers of continuous prose, all the works of one author form a statistically homogeneous distribution of sentence lengths. In this work, the statistical approach to prose analysis reached a more sophisticated level in terms of the science of statistics. Applied to the Pauline writings, Wake's test indicated that Romans, I and II Corinthians and Galatians were indistinguishable, with an average sentence length of just over eleven words, while the remainder had a much longer average sentence length. A. Q. Morton has leaned heavily upon this work of Wake.
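To make the idea of a sentence length distribution concrete, the following is a minimal Python sketch. It is not Wake's procedure: the sentence-splitting rule, the five-word length classes, the function names and the English sample text are all assumptions chosen for illustration.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def sentence_lengths(text):
    """Return the word count of each sentence in `text`."""
    # Assumption: a sentence ends at '.', ';', '?' or '!'.
    # Editions of the Greek text punctuate differently, and the choice matters.
    sentences = [s for s in re.split(r"[.;?!]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def length_distribution(lengths, bin_width=5):
    """Count sentences per length class: 1-5 words, 6-10 words, and so on."""
    counts = Counter((n - 1) // bin_width for n in lengths)
    return {
        f"{b * bin_width + 1}-{(b + 1) * bin_width}": counts[b]
        for b in sorted(counts)
    }


# Toy English sample, purely illustrative (not data from Wake or from Paul).
sample = (
    "Grace to you and peace. I thank my God always for you; "
    "because of the grace given you. Who shall separate us?"
)
lengths = sentence_lengths(sample)
print(lengths)                       # [5, 7, 6, 4]
print(mean(lengths))                 # 5.5
print(round(pstdev(lengths), 3))     # 1.118
print(length_distribution(lengths))  # {'1-5': 2, '6-10': 2}
```

A study of this kind would compare such distributions between works, not merely their averages, which is why the sketch returns the grouped counts as well as the mean.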
In 1958, Robert Morgenthaler11 published, in tabular form, a statistical analysis of all New Testament words, i.e., their frequency of occurrence. He presented a breakdown of the relative frequency of inflected forms, use of prepositions, participles, nouns, etc. Morgenthaler also discussed the question of Pauline authorship of the Pastoral Epistles.
In 1959, Msgr. de Solages12 published an exhaustive study of the synoptic gospel problem in terms of a comparison of words in common in different combinations (permutations) of the three gospels. Whereas earlier statistical approaches concentrated on single words or groups of words as the sampling basis, de Solages uses pericopes as the sampling base. He finds evidence for a "Q" tradition, as well as evidence for the Matthew-Luke use of Mark as a primary source.
In 1960, B. Van Elderen13 completed his doctoral dissertation, The Pauline Use of the Participle. Van Elderen classified and analyzed the 1206 participles occurring in the Pauline letters. By means of various numerical statistical tables, the participles were classified according to frequency of occurrence and syntax. The frequency was expressed as a percentage of the total words and as a percentage of the total number of participles, and the uses were classified according to the type of participle, its gender and position in the sentence.
In 1964, A. Q. Morton and James McLeman published a book dealing with statistical analysis of the Pauline writings. In this book, the authors reach the conclusion that only four Epistles of the traditional thirteen could have been written by Paul (Romans, I and II Corinthians, Galatians). However, although a great deal of background was presented along with the conclusions, very few data were presented and almost no description of procedures was given.14 Thus, in response to a great deal of criticism, Morton and McLeman published a much more complete treatment of their statistical approach to the Pauline writings in 1966.15
Exposition and Analysis of Harrison's Approach
We begin this section with a consideration of P. N. Harrison's work in 1921. His linguistic argument may be summarized as follows. The language of the Pastorals shows obvious peculiarities as compared with the other ten letters. Harrison concedes that every Pauline letter has certain characteristic expressions and lacks others. Yet, taken as a whole, the letters form a clearly defined series, with the variations among them falling within certain limits. Harrison feels that the Pastorals cannot be brought into this series because of their greater linguistic differences. Therefore, he suggests that the Pastorals were not written by Paul, but by a "Paulist" with the other ten Pauline letters before him, sometime between A.D. 94 and 150. There are, however, authentic Pauline fragments contained within the matrix of the present Pastorals as we know them. Harrison's (statistical) data are as follows: 1) 36% of the words occurring in the Pastorals (total vocabulary 848) do not occur in the other ten Pauline letters; 2) there are 175 hapax legomena; 3) 131 words occur in the Pastorals and other New Testament books but not in any other Pauline writing; 4) a large number of words that Paul uses in his other letters are absent from the Pastorals (582 words peculiar to Paul and 1053 that also occur in other New Testament books); 5) particles, prepositions and other minor parts of speech which are clearly Pauline are for the most part lacking in the Pastorals; 6) the language of the Pastorals is said to show a clear relationship with the language of the Apostolic Fathers and the Apologists in the second century.
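The kind of tabulation behind the first items in this list can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `vocabulary_profile` is hypothetical, the tokens are toy English words standing in for Greek dictionary forms, and the numbers are not Harrison's.

```python
from collections import Counter


def vocabulary_profile(target_tokens, comparison_tokens):
    """Word-count summary of a target text against a comparison corpus."""
    target_counts = Counter(target_tokens)
    target_vocab = set(target_counts)        # distinct words in the target
    comparison_vocab = set(comparison_tokens)
    absent = target_vocab - comparison_vocab  # words not found in the comparison
    once_only = {w for w, c in target_counts.items() if c == 1}
    return {
        "vocabulary": len(target_vocab),
        "absent_from_comparison": len(absent),
        "percent_absent": 100 * len(absent) / len(target_vocab),
        "occurring_once": len(once_only),
    }


# Toy data (assumption): stand-ins for the Pastorals and the other letters.
target = "faith sound doctrine sound godliness faith elder".split()
comparison = "faith grace law spirit doctrine".split()

profile = vocabulary_profile(target, comparison)
print(profile)
# {'vocabulary': 5, 'absent_from_comparison': 3,
#  'percent_absent': 60.0, 'occurring_once': 3}
```

The sketch reports words absent from the comparison corpus separately from words occurring only once in the target text, since the two counts need not coincide.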
For many, Harrison's work closed the question concerning Pauline authorship of the Pastoral Epistles. However, from time to time Harrison's work has been criticized on different grounds; see, for instance, W. Michaelis,16,17 F. R. M. Hitchcock,18,19 D. Guthrie,20 B. M. Metzger,21 and K. Grayston and G. Herdan.22 Grayston and Herdan give the best summary of the objections that have been raised to Harrison's method. In this paper, I want to review only the objections of Guthrie, Metzger, and Grayston and Herdan. Guthrie basically objects to the application of mathematics to literature or language: "Literary art cannot be reduced to a mathematical equation . . . and mathematical equations can never prove linguistic affinity."23 Guthrie denies that frequency relationships such as those used by Harrison can be used to characterize the style of an author. Grayston and Herdan challenge Guthrie as follows.
One of the principal results of structural linguistics, as we know it today, is that a language is characterized by phonemics (smallest distinctive feature), its vocabulary and grammar, but also by the frequency of use attached to particular linguistic forms through their continued use by members of the speech community. It has come to light that there is a far-reaching similarity between members of the speaking community, not only in the phonemic system, vocabulary and grammar, but also in the frequency of use of particular linguistic forms such as lexicon items, grammatical forms and structures as well as phonemes; in other words, a similarity not only in what is used but also how often it is used . . . . The importance of the frequency distribution of language as a linguistic factor has given rise to the construction of what may be called statistical dictionaries. We are fortunate to have a complete work of this type for the NT Greek in Morgenthaler's Statistik des neutestamentlichen Wortschatzes.24
Metzger's criticism is of a different nature from Guthrie's. He objects to Harrison's failure to consider the work of Yule,25 which concerned itself with the legitimacy and limitations of using word counts to establish authorship. Yule posits that adequate statistical analysis of prose would require a piece at least ten thousand words long. Metzger goes on to point out that the Pastorals are far from that long. However, Metzger seems to have forgotten that Yule's work was with the text of the Imitation of Christ, not biblical or even Greek literature. It is very problematical whether conditions for one test can be generalized to another language and genre of literature.
Grayston and Herdan offer a penetrating critique of Harrison's work which comes to grips with the crucial problem, namely, what parameters to use to express style, and how specifically to analyze these statistically. Their critique can be summarized as follows:
1. Harrison failed to distinguish between a word which occurs only once in a text (Hapax) and a word which is peculiar to the text in question (one-sample word). Hapaxes give frequency within the sample text, and one-sample words give the vocabulary connection between samples.
2. He was not aware of the finding (reported in 1943 for some literature) that the ratio of a specified portion of a prose text's vocabulary (such as what he called Hapaxes) changes with text length. The authors insist it is more in keeping with our knowledge of the relation between vocabulary and frequency of occurrence to relate the number of words peculiar to a text to the total vocabulary, instead of working with the vocabulary occurrence per page as Harrison did. In other words, he didn't take both vocabulary occurrence and text length into account.
3. His work was incomplete in the sense that it considered only the words peculiar to each Pauline letter and not words common to two, three, etc. letters. These additional words will have some bearing upon vocabulary connectivity. Grayston and Herdan suggest the use of the sum of words peculiar to a given letter and words common to all letters, relative to the total vocabulary of the letter concerned.
4. Harrison's method lacks a standard of comparison. He recognized this, so he compared the word class frequency in any one letter with the corresponding one in another letter. However, Grayston and Herdan point to the need to know what to expect in the way of vocabulary connectivity on pure chance. They go on to suggest a way of constructing such a standard in terms of "random partitioning of vocabulary."
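The quantity described in point 3 can be written out as a short Python sketch. This follows only the verbal summary given above, not Grayston and Herdan's own notation or exact formula; the name `connectivity_index` and the toy letters are assumptions for illustration.

```python
def connectivity_index(letters):
    """For each letter: (words peculiar to it + words common to all letters)
    divided by the letter's total vocabulary."""
    vocabs = {name: set(tokens) for name, tokens in letters.items()}
    common_to_all = set.intersection(*vocabs.values())
    index = {}
    for name, vocab in vocabs.items():
        # Vocabulary of every other letter taken together.
        others = set().union(*(v for n, v in vocabs.items() if n != name))
        peculiar = vocab - others
        index[name] = (len(peculiar) + len(common_to_all)) / len(vocab)
    return index


# Toy letters (assumption): English stand-ins, not real vocabulary lists.
letters = {
    "A": "grace faith law works spirit".split(),
    "B": "grace faith spirit love".split(),
    "C": "grace faith elder sound love doctrine".split(),
}
index = connectivity_index(letters)
print(index["A"], index["B"], round(index["C"], 3))  # 0.8 0.5 0.833
```

Because the divisor is each letter's own vocabulary, the index relates the counts to vocabulary size instead of to pages of text, which is the adjustment point 2 calls for. Words shared by only some of the letters (here "spirit" and "love") lower the index.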
In looking at Morton's work, we see a "change" in approach. But perhaps it would be better to describe this change as an extension or a sophistication, for it is a quantitative change rather than a qualitative change compared with Harrison and other works like his (such as Van Elderen and Morgenthaler). Harrison works with counts and averages, tabulation of occurrence in terms of average frequency, etc. However, he doesn't apply statistical procedures which are part of what we call the "science of statistics". The work by Morton involves the use of concepts such as frequency distribution, standard deviation of the distribution, comparison of different distributions for homogeneity, etc. It is necessary to consider Morton's work because it represents a new and influential approach to authenticity work, and it also contains some potentially useful tools for study within an evangelical framework.
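As a rough illustration of what "comparison of different distributions for homogeneity" can mean in practice, the Python sketch below computes a chi-square statistic for a table of counts. The choice of the chi-square test is an assumption made here for illustration; this part of the paper does not state which test Morton used, and the counts are invented.

```python
def chi_square_homogeneity(table):
    """Chi-square statistic and degrees of freedom for a table of counts.

    Each row is one text; each column is one class (for example, a
    sentence-length class). Assumes every row and column total is nonzero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if all texts shared one common distribution.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Invented counts (assumption): two texts, sentences in two length classes.
same = chi_square_homogeneity([[30, 50, 20], [30, 50, 20]])
different = chi_square_homogeneity([[10, 20], [20, 10]])
print(same)                                  # (0.0, 2)
print(round(different[0], 3), different[1])  # 6.667 1
```

A statistic near zero means the texts are distributed alike across the classes; a large value, judged against a chi-square table for the given degrees of freedom, suggests they are not homogeneous.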
Part Two of this paper gives an exposition of A. Q. Morton's statistical approach to Pauline authorship, and then presents a critique of the statistical approach in general.
REFERENCES
1 Guthrie, Donald, The Pauline Epistles (New Testament Introduction). London: Intervarsity Press, 1961.
2 Ibid., p. 209.
3 Cf. Guthrie for a good discussion of rebuttals against each objection, pp. 210ff.
4 G. Udny Yule, "Biometrika" 30, 363 (1939); C. B. Williams, "Biometrika" 31, 356 (1940).
5 Cambridge University Press, 1944.
6 Groningen (Holland), 1956.
7 Robert S. Wachal, Linguistic Evidence, Statistical Inference, and Disputed Authorship.
8 Department of Classics, Dartmouth College, Hanover, N.H. For several years now, the Department of Classics of Dartmouth College has been publishing a newsletter called CALCULI. It contains bibliographies, project summaries and computer programs available in literary statistics all over the world. Extensive work is now being carried on in Latin and Greek Classics, medieval literature and modern works. Work is also in progress to get an extensive library of literature texts on computer tape.
9 Harrison, P. N., The Problem of the Pastoral Epistles, London: Oxford, 1921.
10 Wake, W. C., Hibbert Journal 47, 50-55 (1948). Cf. also Journal of the Royal Statistical Society, Series A, Part 3, 1957, Vol. 120, pp. 331-346.
11 Morgenthaler, Robert, Statistik des neutestamentlichen Wortschatzes, Zurich: Gotthelf-Verlag, 1958.
12 de Solages, B. (Msgr.), A Greek Synopsis of the Gospels, Leyden: Brill, 1959 (English translation from French by J. Raissus).
13 Van Elderen, Bastiaan, The Pauline Use of the Participle, Ph.D. dissertation, Pacific School of Religion, San Francisco, 1960.
14 Morton, A. Q., and McLeman, James, Christianity in the Computer Age (New York: Harper and Row, 1964).
15 Morton, A. Q., and McLeman, James, Paul, the Man and the Myth (New York: Harper and Row, 1966).
In 1961, A. Q. Morton published the first of several co-authored books, this one with Prof. G. H. C. Macgregor, The Structure of the Fourth Gospel. This book contained a brief review of the field of literary statistics, very little about statistics per se, data tables of sentence and paragraph divisions, and a hypothesis which asserted that the gospel of John is a composite of at least two sources. In 1964, two books were published simultaneously: The Structure of Luke and Acts by Morton and Macgregor, and Christianity in the Computer Age by Morton and McLeman.
Macgregor, G. H. C. and Morton, A. Q., The Structure of the Fourth Gospel, London: Oliver and Boyd, 1961.
The Structure of Luke and Acts, New York: Harper and Row, 1964.
Work has also begun in England to apply Morton's approach to the Old Testament. At present, the Hebrew text is being transcribed on computer tape. So far, only brief progress reports have been issued.
Morris, P. M. K. and Edward Jones, "Computers and the Old Testament", The Expository Times LXXXIX, No. 7, pp. 211-214 (April, 1968).
16 W. Michaelis, Die Pastoralbriefe und Gefangenschaftsbriefe (Gottingen, 1930). Summarized by Grayston and Herdan.
17 Michaelis, "Pastoralbriefe und Wortstatistik", Z.N.T.W. XXVIII (1929) pp. 69-76. Summarized by Grayston and Herdan.
18 F. R. M. Hitchcock, "Tests for the Pastorals", J.T.S. XXX (1928-29) p. 279.
19 F. R. M. Hitchcock, "Philo and the Pastorals", Hermathena LVI (1940) pp. 113-35 (quoted by Grayston and Herdan).
20 Prof. Donald Guthrie, The Pastoral Epistles, Grand Rapids: Eerdmans, 1957, pp. 214ff.
21 B. M. Metzger, "A Reconsideration of Certain Arguments against the Pauline Authorship of the Pastoral Epistles", Expository Times 69-70, pp. 91-94 (Dec. 1958).
22 K. Grayston and G. Herdan, "The Authorship of the Pastorals in the Light of Statistical Linguistics", N.T.S. 5-6, pp. 1-15 (Oct. 1959); cf. also G. Herdan, Language as Choice and Chance, Groningen (Holland), 1959; Type-Token Mathematics, A Textbook of Mathematical Linguistics, The Hague: Mouton & Co., 1959.
23 The Pastoral Epistles, 1957, pp. 214ff.
24 Grayston and Herdan, op. cit.
25 The Statistical Study of Literary Vocabulary (see Reference 5).