RE: Statistics - their value (from a statistician)

Sweitzer, Dennis (SWEITD01@imsint.com)
Wed, 04 Sep 96 08:17:00 EST

Julie K. wrote>>>
...the older I get the less impressed I am with statistics as a form of
validating one's opinions/thoughts/beliefs.

Well, there are good statistics and bad statistics. As a professional
statistician, I'd say the problem is usually in the usage and assumptions,
not the math. Validating one's opinions/... is a usage--namely, finding
out how well you fit in with the herd.

>>But at the same time I've seen stats misused so many, many times in some
critically important - even essential - issues that I tend to relegate them
to the category of merely "interesting" or "curious".

I hope you're not meaning to bash statistics in general. There is indeed
some real garbage circulating--generally produced by the statistically
naive--and the worst bashing of statistics also comes from the statistically
naive.

>Here's a thot or two for my position these days: you shared the
percentages of those who voted for one position v. those who voted for
another as evidence of the probable position of "the church". 'scuse
me, but I think the vote merely shows the opinions of those who took the
time to vote, not those who belong to the Body itself. And most certainly
not the opinions of the Body as a whole (the church universal).

>... It's something like Ann Landers saying there was an overwhelming
response (to a bizarre situation) and implying that the attitude of the
American public has changed. ....

Statisticians call this "self-selection bias". An extreme example is the
letters received by a congressman on a particular issue. Since these are
frequently orchestrated by interest groups, they are useless for gauging how
much of the constituency feels which way. BUT, letters can illustrate the
range of opinion, which is qualitative information. How many constituents
feel each way is quantitative information, and is best determined by
taking a random sample and (nicely) hounding them until they tell you their
opinion.
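As a sketch of how badly self-selection distorts a count, here's a toy
simulation (all the numbers--60% support, the 5x write-in rate--are invented
for illustration, not from any real poll):

```python
import random

random.seed(0)
# Hypothetical population: 60% favor a policy, 40% oppose it.
population = [1] * 6000 + [0] * 4000

# Random sample: everyone is equally likely to be asked (and to answer).
random_sample = random.sample(population, 500)

# Self-selected "sample": opponents are 5x as likely to write a letter.
def writes_in(opinion):
    return random.random() < (0.05 if opinion == 1 else 0.25)

letters = [op for op in population if writes_in(op)]

print("true support:     60%")
print(f"random sample:    {100 * sum(random_sample) / len(random_sample):.0f}%")
print(f"letters received: {100 * sum(letters) / len(letters):.0f}%")
```

The random sample lands near the true 60%, while the mailbag makes the
majority position look like a small minority--same population, different
selection mechanism.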

>>One more example: a recent AP story claimed "the majority of parents in NJ
are in favor of sex education in public schools" citing a survey of 700
people. ..To state that the "majority of parents" believe a certain way
when there are more than 7,890,000 residents in NJ (as of 1995) was a gross
and serious distortion & bad reporting. ... we can also assume a significant
portion do NOT think sex education to kindergartners is a good idea, nor is
a demonstration of oral application of a purple condom appropriate for
elementary students... actual cases, not hyperbole.)

The problem here is not numerical (i.e., 700 out of 7.9 million people),
because if you carefully select a random sample of 700, you can be 95% sure
of being within 4% of the actual portion of the population. Non-statisticians
call this figure the margin of error. Statisticians officially call it the
sampling error of the poll, or--in private derision--the intended error.
Non-sampling error covers all the other factors by which you can be 95% sure
of NOT being within 4% of the actual portion of the population for the actual
question you are trying to get at.
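That 4% figure for 700 respondents can be checked directly. A quick sketch
using the standard worst-case formula (p = 0.5, z = 1.96 for 95% confidence;
not tied to the actual NJ poll's design):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Worst-case 95% margin of error for a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"{margin_of_error(700):.3f}")  # about 0.037, i.e. within 4%
```

Note that the population size (7.9 million) barely enters the calculation:
to halve the margin of error you must quadruple the sample, no matter how
big the state is.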

Surveying 700 out of 7.9 million is, statistically, not a problem. (A real
statistical problem is how many people they had to contact before getting
700 people to answer the questions. After all, since people who don't
answer polls already differ from people who do in their response to
pollsters, it stands to reason that they differ in other ways as well, some
of which may be of interest to the pollsters.)

I'd say the primary problem here is that if you (randomly) ask people in
your state whether they favor sex education in public schools, a majority
of them will favor it. If, on the other hand, you ask them whether they
favor the condom-application demonstration you cite, the vast majority will
oppose it.

So there's the rub: some folks will interpret these poll results (favoring
sex education) as supporting aggressive sex education (which, as it
eventually gets implemented through a long chain of hierarchy, and by
diverse & strange people, leads to examples like the ones you cite). The
people polled (mostly adults like ourselves) had in mind something milder.

The other rub is that the people you & I mingle with will have different
opinions than the state of New Jersey as a whole. So our daily sample of
opinion will differ from an opinion sample deliberately designed to be
representative (statewide)--hence, the problem may be with our own samples.

Numbers can be misconstrued in interpretation. A good example is the fact
that a majority of Americans oppose abortion, and a majority support choice.
I don't remember the numbers, but it boils down to the fact that a large
number of people in the middle feel abortion is wrong, but do not want the
government making the moral decision. Consequently, these folks in the
middle are claimed by both the pro-life and pro-choice sides.
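A toy breakdown makes the arithmetic concrete (the numbers are invented,
just to show how both majorities can be real at once):

```python
# Hypothetical split of 100 respondents:
pro_life_only   = 30  # abortion wrong, want it banned
pro_choice_only = 30  # abortion acceptable, keep it legal
middle          = 40  # abortion wrong, but government should stay out

oppose_abortion = pro_life_only + middle    # -> "majority oppose abortion"
support_choice  = pro_choice_only + middle  # -> "majority support choice"
print(oppose_abortion, support_choice)      # 70 70: both are majorities
```

The two "majorities" overlap in the same 40 people in the middle, so both
headlines are simultaneously true of one population.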

Getting away from questions of interpretation:

Another kind of non-sampling error comes from misleading or subtly biased
questions. It is easy to write simple questions which lead to complex
answers. It is quite an art form to ask the right questions, which can lead
to (possibly) useful solutions.

Even the order in which questions are presented can lead to vastly different
results. For instance, in the '70s one might have asked: (1) Should defense
spending increase, decrease, or stay the same? and (2) Should the US pursue
hard-line policies against the USSR? In the reverse order, question 2 would
have introduced (or reminded the pollee of) a threat to world peace,
altering the answers to question 1 by a significant amount.

On the other hand, if one follows good polling practices (i.e.,
representative random sampling, specific non-leading questions, etc.,
etc.), even a biased sponsor can get unbiased results. A curious example
is that of the magazine "Free Inquiry". They seem to embody the "secular
humanism" bogeyman that so many Christians love to hate. Recently, I read
that they, being highly skeptical of the Gallup polls which reported that
Americans were quite religious, commissioned their own poll. The result was
that >90% of Americans did believe in God (the newspaper article did not
report on how Free Inquiry magazine felt about the results).

Even in this example, there were interpretative issues. For instance: how
many believe that the Bible is the actual word of God, vs. the inspired
word of God? Of course, they found that the answer varied greatly among
Christians according to education level. I'd guess that many a literalist
would opt for the "inspired" answer. (Then again, if they did indeed define
the terms in the poll, I might have to retract this paragraph.)

>So is there a "silent majority"? I am convinced there is. Statistics don't
seem to acknowledge the very great numbers of people not contacted...

That acknowledgement is usually covered in STAT 101.

> There's a huge difference between a survey and a census.

There sure is. Often a survey is more accurate.

Part of the advantage of a survey is that you can devote intense effort to
sampling the sections of the population that are hard to measure, in order
to get a valid measure of those sections. Resources for censuses are usually
spread so thin that the quality of the data is lousy. Furthermore,
"censuses" tend to miss large sectors, yet are perceived as being more
accurate. (I heard one person lament that his office was hot on a "census"
of 10% of the market, while downplaying a survey based on 60%!!)
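That lament can be illustrated with a toy simulation (all numbers are
invented): a huge non-random "census" that entirely misses a hard-to-reach
segment, versus a small random survey of the whole market.

```python
import random

random.seed(1)
# Hypothetical market of 100,000 customers; 30% belong to a hard-to-reach
# segment whose purchase rate (0.2) differs from everyone else's (0.6).
population = [0.6] * 70000 + [0.2] * 30000
outcomes = [1 if random.random() < p else 0 for p in population]
true_rate = sum(outcomes) / len(outcomes)  # near 0.48 by construction

# "Census" of 10%: huge, but it covers only easy-to-reach customers.
census = outcomes[:10000]

# Survey of 1,000: small, but drawn at random from the whole market.
survey = random.sample(outcomes, 1000)

print(f"true rate: {true_rate:.2f}")
print(f"'census':  {sum(census) / len(census):.2f}")  # biased high
print(f"survey:    {sum(survey) / len(survey):.2f}")  # close to truth
```

Ten times the data, but the wrong ten times: the coverage bias in the
"census" never averages out, while the small random sample's error shrinks
predictably with its size.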

Census or survey, you need real statisticians to design, implement, and
interpret both.

Unfortunately, for most people a statistician is someone who cranks
numbers, and a skilled statistician is someone who can skew the numbers in a
pre-determined way.

The concept of a professional statistician, with a code of ethics and a
commitment to revealing truth, is by-and-large unknown in popular--and
sometimes scientific--circles.

>And that's why the "scientific method" of repeated experiments carefully
controlled has value. Statistics of those who vote are not of much value to
me anymore.

I'm not sure what you're getting at here. But I'll charge ahead anyway
;-)

Carefully controlled experiments are only one part of scientific research.
Most scientific research doesn't have that luxury (it's either too expensive
or physically impossible). Statistics is as much a part of sound science
(including repeated experiments) as mathematics is. Furthermore, carefully
controlled experiments generally rely heavily on statistics (otherwise, many
would be too expensive or physically impossible).

Ah, but there is one statistic of those who vote that WILL affect you
greatly. Namely, the portions of voters who vote for Bush, Clinton, Perot,
etc. And all concerned parties (political & media) would like the best
possible estimate of THAT statistic before election day.

>There's also a risk of drawing conclusions from data based on one's
interpretation (frame of reference, filter, bias, presuppositions, whatever
you want to call it, as derived from training - other people's ideas we
accept as our own - and/or experience). I see more and more
misinterpretation of data, some of it based on statistics. For example,
the media is upset these days because the public "isn't interested" in
political conventions anymore (evidenced by low ratings) when the real
problem may very well be that viewers are discouraged (some might be
disgusted) with the terrible slant (bias) of the media. The data
(low ratings) have been interpreted incorrectly, something quite common in
our world, I guess. We can probably all cite additional examples from our
chosen fields of study. Just my perspective on the significance of using
statistics to give credibility to our arguments.

You illustrate very well the problem of misinterpretation.

The statistic: Low ratings of the conventions.

Someone's interpretation (allegedly the media): the public is no longer
interested in politics. This could be a strawman interpretation attributed
to the media, or a 15 second summary by an actual media person.

One interpretation: voters are discouraged with the terrible slant of the
media.

Another interpretation: the viewers are discouraged by the terrible slant
of the conventions. In the good old days, there'd be debates, arguments,
votes, re-votes, maybe a fist fight or two, or even a riot. [Now, THAT'S
entertainment ;-) ] The last conventions (both parties) were little more
than carefully orchestrated infomercials, presenting little else than what
the organizers intended. Even the protestors were carefully corralled miles
away from the convention sites.

It's my experience that people aren't stupid. (Even stupid people can be
surprisingly smart.) Many have an instinctive distrust of well-packaged
information (like commercials), because they have been conned into buying
lousy products with great advertising enough times to know better.

Which leads to other forms of non-sampling error.

People can work against the pollster. Apparently, Chicago has a reputation
for misleading polls (at least in exit polling for national elections):
people exiting the polls tell the pollster the opposite of what they
actually did.

People can work "for" the pollster. Often people instinctively and
unconsciously try to "please" the pollster by trying to figure out the
"right" answers--leading to wrong answers, of course.

People can realize that a poll is a waste of their own time, and get it
over with as quickly as possible. The result is an opinion--the product of
maybe 2 seconds of careful deliberation. This is smart--since the poll uses
up their time, and there is no reward for careful deliberation. When people
vote in November, almost all will have deliberated at least 30 to 300 times
as long, and will use different thought processes to reach a decision.

And I could (and did) go on & on about good vs. bad vs. popular statistics.

There are a number of good books concerning interpretation of statistics for
laypeople.

Some titles that come to mind are:

200% of Nothing

Innumeracy

How to Lie with Statistics (a classic warning on how to recognize it,
not how to practice it!)

Perhaps one of the best for you is:

News & Numbers: A Guide to Reporting Statistical Claims and
Controversies in Health and Other Fields, by Victor Cohn.

Victor Cohn is an award-winning science journalist for the Washington Post,
and has been working with the American Statistical Association (another
ASA--not to be confused with the American Sunbathing Association).

Grace & peace,

Dennis Sweitzer