RE: Design detection and minimum description length

From: Glenn Morton (glenn.morton@btinternet.com)
Date: Sat Dec 14 2002 - 15:25:22 EST

Next message: Glenn Morton: "RE: Noah not in the Black Sea"

Previous message: Alexanian, Moorad: "RE: Did Jesus know the genetic code?"
Maybe in reply to: Iain Strachan: "RE: Design detection and minimum description length"
Next in thread: Iain Strachan: "RE: Design detection and minimum description length"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

Iain wrote:
>-----Original Message-----
>From: Iain Strachan [mailto:iain.strachan@eudoramail.com]
>Sent: Monday, December 09, 2002 10:00 PM

>(1) The effectiveness of the Fourier transform analysis arose from
>your observation that you had a sequence with periodicity, and
>your personal knowledge that a periodic sequence can be analysed
>as a Fourier series. Personally I don't see the difference
>conceptually between spotting a Fourier series (i.e. a periodic
>function) and spotting a sequence derived from primes. Both are
>bits of maths that you had to know in order to make the deduction.

Agreed, but when it comes to DNA, exactly WHAT is the math that Dembski has
to know in order to conclude that DNA is designed? There is none. The only
math he cites is that of a low probability, but any sequence of similar
length has precisely the same low probability of occurrence. Thus, what
Dembski does with DNA is to say he knows it is designed. There is no math
to indicate design, only math to indicate low probability, and that isn't
design at all.
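
A minimal sketch of the equal-probability point, assuming (purely for illustration) that each base is drawn independently and uniformly from a, c, g, t. The function name log10_prob_uniform and the two example sequences are hypothetical; under this model the probability of any specific sequence depends only on its length.

```python
import math

def log10_prob_uniform(seq, alphabet="acgt"):
    """log10 probability of one specific sequence, ASSUMING independent,
    uniformly distributed bases (an illustrative model, not a claim about
    real DNA)."""
    if any(base not in alphabet for base in seq):
        raise ValueError("sequence contains a symbol outside the alphabet")
    # Each position contributes a factor of 1/len(alphabet).
    return -len(seq) * math.log10(len(alphabet))

observed = "acctttgcaaca"    # 12 bases, arbitrary-looking
repetitive = "aaaaaaaaaaaa"  # 12 bases, obviously patterned

# Both calls return the same value: the model cannot tell them apart.
print(log10_prob_uniform(observed))
print(log10_prob_uniform(repetitive))
```

Under this assumption a low probability alone does not distinguish one sequence from another of the same length.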

I would like to note that before Fourier came up with that method of
detecting periodicities, he knew of no math which would perform that trick.
Where is the Dembski transform? Where is the design transform? It doesn't
exist. There isn't one because, with DNA, Dembski claims design based only
upon personal bias, not knowledge of how or when the Creator designed the
DNA. There is no math for the positive side of his claim, i.e. a coefficient
of design.
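
For contrast, here is a rough sketch of what the Fourier approach does provide for periodicity: a mechanical procedure that returns a number. It uses a plain discrete Fourier transform written out by hand; the function names and the test signal are made up for illustration and are not taken from the earlier analysis in this thread.

```python
import cmath
import math

def dft_magnitudes(values):
    """Magnitudes of the discrete Fourier transform for k = 0 .. n//2,
    computed directly from the definition after removing the mean."""
    n = len(values)
    mean = sum(values) / n
    centered = [v - mean for v in values]
    mags = []
    for k in range(n // 2 + 1):
        total = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        mags.append(abs(total))
    return mags

def dominant_period(values):
    """Period (in samples) of the strongest non-zero frequency component."""
    mags = dft_magnitudes(values)
    k = max(range(1, len(mags)), key=lambda i: mags[i])
    return len(values) / k

# Hypothetical test signal: 32 samples that repeat every 4 samples.
signal = [0, 1, 0, -1] * 8
print(dominant_period(signal))
```

The point of the sketch is only that periodicity has a defined statistic one can compute; nothing comparable is on offer for "design".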

>
>(2) The point about the correlation coefficient is more
>interesting, because it precisely illustrates the point that you
>do need inside knowledge, and that you can't rely on some
>"objective math" formula that allows you to crank the handle and
>churn out meaningful results. First, let me quote from a standard
>text-book on numerical analysis (Numerical Recipes in C), talking
>about the correlation coefficient:
>
>"When a correlation is _known to be significant_ [emphasis mine],
>R is one conventional way of summarizing its strength. In fact,
>the value of R can be translated into a statement about what
>residuals (root mean square deviations) are to be expected if the
>data are fitted to a straight line by the least-squares method
>[ref to equations skipped] ... Unfortunately, R is a rather poor
>statistic for deciding _whether_ [emphasis in the original] an
>observed correlation is significant, and/or whether one observed
>correlation is significantly stronger than another. The reason is
>that R is ignorant of the individual distributions of x and y, so
>there is no universal way to compute its distribution in the case
>of a null hypothesis" [Press, Teukolsky, Vetterling & Flannery:
>"Numerical Recipes in C", Second Edition, Cambridge University
>Press, 1992, p636].
>
>So, what are Press et al saying here? That the correlation
>coefficient R is pretty meaningless unless you know that the data
>are correlated already. This is precisely your objection to
>Dembski (he can't detect design unless you tell him it's
>designed, by the "side information"). Does this mean that the
>Correlation coefficient is a totally useless statistic? Not at
>all. They go on to discuss the general shape of the distributions
>of x and y (concerning the fall-off rate of the tails of the
>distributions), that allow one to derive meaningful results and a
>distribution for R. What it comes down to is that if your data
>when plotted on an X/Y scatter plot looks a bit like a long thin
>ellipsoid, then you've good reason to suspect that they are
>correlated, and from that, you can get meaningful results by
>comparing values of R. So, you have to use your intelligence and
>prior knowledge of what correlated variables look like, in order
>to use the correlation coefficient.
>
>To see just how meaningless the results get if you just put the
>numbers into the formula and crank out the result, consider the
>following experiment that you can easily perform in Microsoft Excel.
>
>Generate 100 pairs of (x,y) points from random numbers in the
>range 0-1 (this can be done with the Excel RAND() function). Add a
>101st (x,y) pair and make it equal to (100,100). Now compute the
>correlation coefficient between the two sequences (using the Excel
>CORREL() function). You will get an answer for R that is close to
>0.999. So your "objective math" is telling you that the sequences
>are highly correlated.

So???? They are highly correlated. To shorten your example, the correlation
between:
1,4,3,5,9

and
1,4,3,5,9,2

is very high and R shows it. I see nothing here to support your contention
that R is not a good measure of correlation at all.

>
>But something tells me that these sequences are not highly
>correlated. What do you think it is? It's my inside knowledge of
>what correlated data ought to look like. That tells me that the
>(100,100) point is a massive outlier, and should be discarded.
>(When R drops to around 0.01).
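
The spreadsheet experiment quoted above can be sketched outside Excel as follows. This is a minimal version using only the Python standard library; the seed, the variable names and the pearson_r helper are assumptions for illustration, and the exact value of R without the outlier will vary from sample to sample.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed (arbitrary) so the run is repeatable
xs = [rng.random() for _ in range(100)]  # 100 uniform values in [0, 1)
ys = [rng.random() for _ in range(100)]  # generated independently of xs

r_without = pearson_r(xs, ys)                   # the 100 random pairs only
r_with = pearson_r(xs + [100.0], ys + [100.0])  # plus the (100, 100) point

print("R without the outlier:", r_without)
print("R with the outlier:   ", r_with)
```

With the extra point included, R comes out very close to 1; without it, R is small in magnitude, although how close to zero it lands depends on the random draw.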

The main issue here is whether or not design can be detected in biological
systems. Your example above won't occur in biological systems. You are
treating the biological sequence as if it were an experimental measurement.
It is a series of relationally fixed objects. One can't simply throw out 'the
outlier' because there aren't any. All values are either a, c, t, or g. Put
into math terms, all values are integers (0-3) or (1-4). You don't have 100's
in DNA sequences! You have gone off into an area which is irrelevant to the
question of design in the biological polymers.

ID proponents claim to be able to determine that the information in DNA is
designed because the exact order of these sequences is required for
protein function. One doesn't have 'massive outliers' unless one finds the
letter X in the sequence.

>
>Is this a silly example that wouldn't occur in real life? I've
>seen a lot worse than that.

Not in a sequence of a,c,t, and g's you haven't. Tell me what the outlier
is there? Pray tell?

>In the first Neural Nets application
>I worked on (that ended up as a successfully deployed analysis
>tool), I was using a neural net to predict plasma electron density
>profiles inside a fusion experiment (the JET vacuum vessel). The
>electron densities were of the order of 10^20 per cubic metre.
>However, the data file I received had a few electron densities
>that were of the order of 10^76 per cubic metre. My background
>knowledge of Physics told me that you just don't get electron
>densities of 10^76 per cubic metre in a vacuum vessel (or anywhere
>else for that matter ;-). I therefore concluded that these would
>be down to a processing error in the computer program that gave me
>the file of data, and discarded the offending items. If I'd
>naively shoved it all in to the neural net, it would have ended up
>predicting everything in the region of 10^75 - 10^76, and the
>results would have been completely useless.

This doesn't seem to make a point here.

>
>The moral of the story is that you can't make any statistical
>inference (whether it's correlation, pattern detection, or
>"design") just by blindly plugging your data into some formula,
>and relying on the maths to tell you the answer. You have to use
>your background knowledge if it's not to be "Lies, damned lies and
>statistics".

Biological systems don't have outliers in the sense you are using that term.
You are equivocating between an experimental measurement and a series. That
is getting you into trouble. The exact DNA sequence in many of these protein
genes is a sequence which has been verified by multiple workers. It can no
longer be treated as an outlier to be tossed in the trash. The question is,
given the sequence

a,c,c,t,t,t,g,c,a,a,c,a... is it designed?

One doesn't go through the DNA and say, 'Gee, that 2nd t doesn't belong
because I don't like it,' and thus, by getting rid of it, turn an obviously
undesigned sequence into one which is designed. DNA is much more like a time
series than an experimental measurement.

>
>That is why I don't believe your objection to Dembski's use of
>"side information" is a valid one. There may be other reasons for
>criticizing Dembski, but this isn't one of them.

We disagree. Dembski has no coefficient of design. How do we tell that God
would design DNA in the fashion we see? Maybe God works in other ways?
Maybe he doesn't design things like we do. Dembski is anthropomorphising
God, making God behave like a human.
>
>Apologies for the long delay in responding to this. Other things
>intervened.

No problem, I was in London this week giving an invited paper on some
geophysical work we did in the North Sea last year and chairing a session of
the conference. I too had other things to do.

glenn

see http://www.glenn.morton.btinternet.co.uk/dmd.htm
for lots of creation/evolution information
anthropology/geology/paleontology/theology
personal stories of struggle


This archive was generated by hypermail 2.1.4 : Sun Dec 15 2002 - 23:26:29 EST