Re: Probability (Was Re: Ken Ham (help))

Brian D. Harper (bharper@magnus.acs.ohio-state.edu)
Thu, 22 Feb 1996 23:26:23 -0500

Eddie Olmstead wrote:

>Abstract: My comments on Glenn's defense of his probability post.
>

Eddie's post really confused me at first since I hadn't seen the
post by Glenn to which he was responding. I finally figured out
that this is a thread on the ASA reflector (which I don't subscribe
to). Glenn's post seems pretty much the same as what he presented
here some time ago. In view of this and also the fact that I don't
want to try to juggle two reflectors at once :), I'm going to respond
only on the evolution reflector. If anyone else did not see Glenn's
post, you can find it in the ASA archive which is accessible from
the evolution archive. I've also sent a copy of this to Eddie by
e-mail in case he is no longer reading the evolution reflector.

First some general comments. In addition to problems Eddie has
already mentioned [chief among them that the prebiotic soup is
a myth ;-)], one also has the following fundamental problem with
a proteins-first scenario: even if a functional protein forms
purely by chance in the soup, what's it going to do? If it can't
replicate, that's the end of the story. In order to get some type
of metabolism going one would need several different proteins all
forming by chance and then all getting together somehow in a protected
environment (Fox's "protocell" for example). There aren't
many adherents of the proteins-first scenario these days.

A much better scenario is RNA first. Here one might consider the
chance formation of a relatively short self-replicating RNA molecule.
Once this forms, natural selection can do its magic ;-).

There is a subtle point here that not many seem to be aware of though.
Just getting a self-replicating RNA is not enough. It has to
replicate itself accurately but not too accurately. If the replication
is perfect, it can't evolve. If it is too "sloppy" it will suffer
the so-called "error catastrophe" and self-destruct. I forget the
details now; the main point, though, is that there is a very fine
"accuracy" balance that has to be struck. Also, there is some
minimum length required to avoid the error catastrophe. Again, I
can't remember the details. Anyway, my point here is that getting
a self-replicator is not the end of the story. It has to have
certain finely tuned properties and also be of a certain length.
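For what it's worth, I believe the result I'm half-remembering is
Eigen's error threshold. A rough sketch of the relation (the exact
form here is my assumption from memory, so take the numbers as
illustrative only):

```python
import math

def max_length(q, s):
    """Rough error-threshold bound from quasispecies theory (assumed
    form): a replicator with per-symbol copying fidelity q and
    selective advantage s over its mutants can maintain at most about
    ln(s) / (1 - q) symbols before the error catastrophe sets in."""
    return math.log(s) / (1.0 - q)

# With 99% per-symbol fidelity and a twofold selective advantage,
# the sustainable length is only on the order of 70 symbols.
print(max_length(0.99, 2.0))
```

So "accurate but not too accurate" is not hand-waving: copying
fidelity sets a hard ceiling on how long a self-replicator can be.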

Now let's go to the probability calculation.

First let me say that I agree with Glenn in spirit. The probability
argument against abiogenesis, as usually presented, is bad bad bad :),
and the sooner it disappears the better. The reason it's bad is not
that a functional protein has a reasonable chance of forming in the
hypothetical soup, but rather that the "chance scenario" for the
origin of life met its demise some thirty-odd years ago. To talk about the improbability
of life forming purely by chance is to completely ignore all the
modern scenarios for the origin of life. Nobody working in the field
today believes in the "chance scenario", so what's the point in
refuting it?

Anyway, in previous posts I have presented results from algorithmic
information theory that provide a very precise way of working out
probabilities for the "chance scenario". In particular, one is able
to avoid the error of dealing only with a specific result. I would
like to deal directly with the protein case but I haven't the
background to even attempt it, so instead I'll go with the language
analogy.

Since I've presented the details previously, I'm going to skip
them here. One point needs to be made though. Previously I dealt
only with a binary (0,1) alphabet. It turns out that all those
results can be generalized to an alphabet containing an arbitrary
number of "letters".

The key point has to do with compressibility. How many of the
total number of sequences of length n are compressible by a
certain amount?

Shannon has shown that there are purely statistical features of
the English language that allow it to be compressed typically
by about 50% or more. Note that these are purely statistical
features and do not reflect meaning (functionality) at all.

So, what is the probability that one would select by chance
a sequence of length n that is 50% compressible? Again I'll
note that this is very generous in that it considers all
sequences that are compressible by 50% or more. This would
include all sequences satisfying the statistical requirements
of English with no consideration given to meaning, i.e. the
calculation includes many nonsense sequences. There is also
another level of generosity here. The calculation considers
not just sequences having the nose-picking functionality, but
any functionality. For the protein analogy this would be like
computing the probability for *any* functional protein. Of course,
one can't stretch the analogy this far unless one knows for certain
that all functional proteins are at least X compressible. Anyway,
here are the results, minus the math details ;-)

sequence length     probability
---------------     -----------
      20            3 x 10^-16
      30            2 x 10^-23
      40            2 x 10^-30
      50            2 x 10^-37
      60            1 x 10^-44
      70            1 x 10^-51
      80            1 x 10^-58
      90            8 x 10^-66
     100            7 x 10^-73

Roughly, each increase in length of 10 characters decreases the
probability by about 7 orders of magnitude.
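The counting behind numbers like these is a standard argument from
algorithmic information theory: over an A-letter alphabet there are at
most A^k descriptions of length k, so only a vanishing fraction of the
A^n sequences of length n can be compressed to half length. A sketch
of the bound (a simple upper bound with a 26-letter alphabet; the
exact constants in my table depend on encoding details I'm skipping):

```python
def frac_compressible(n, alphabet=26, ratio=0.5):
    """Upper bound on the fraction of length-n sequences over an
    `alphabet`-letter alphabet that can be described by at most
    ratio*n letters: there are at most sum_{k <= ratio*n} alphabet^k
    short descriptions, versus alphabet^n sequences in all."""
    short_descriptions = sum(alphabet**k for k in range(int(ratio * n) + 1))
    return short_descriptions / alphabet**n

# Each extra 10 characters of length shrinks the bound by about
# 26^-5, i.e. roughly 7 orders of magnitude.
print(frac_compressible(20))
print(frac_compressible(30))
```

Note that the bound counts every sequence with the statistical
regularities of English, meaningful or not, which is exactly the
generosity I mentioned above.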

Again, let me emphasize that these are the probabilities for
generating *any* sequence satisfying only the *statistical*
requirements of English without regard to meaning. The
probability of generating a meaningful sequence will be much
smaller.

Glenn also gives a clever scenario for increasing the probability
of getting some functional sequence of length n by embedding it
in a longer sequence of length m > n. I remember thinking about
this for some time after Glenn first raised this possibility here.
I think that Glenn has made a counting mistake somewhere; however,
I'm not going to try to pursue that, since I've thought of what
I consider a good intuitive argument against this approach.

Let's suppose our "functional" sequence is:

(a) methinksitislikeaweasel

in honor of Richard Dawkins :-). This is probably too short to
survive the error catastrophe, but this is just for purposes
of illustration. Now, this is embedded in:

(b) xxxxxxxxxxmethinksitislikeaweaselyyyyyyyyyyyyyyy

the x's and y's can be anything, of course. The idea now is to
generate this longer sequence purely by chance and then chop
out methinksitislikeaweasel purely by chance. Oh, I just thought
of another problem. One is also going to have to generate lots
of "choppers" purely by chance. One will then have to deal with
the probability of a chance encounter between a "chopper" and
something like (b). It seems to me that this would require not
just a soup but a pretty concentrated soup with lots of "choppers".

Anyway, these sequences are generated one character at a time. Before
getting to (b) one must first have either

(c) xxxxxxxxxx or yyyyyyyyyyyyyyy

This is no great problem, of course, since each of the x's and
each of the y's can be anything. Now, after reaching (c) one
has to add (a) to one end or the other. At this point the sequence
order does matter, and the probability that (a) gets added to a given
end is the same as the probability that (a) forms by itself, without
a (c) attached to it.

Thus, the probability of interest is that of forming the shortest
sequence that performs the desired function.
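The same conclusion drops out of a simple union-bound count (a sketch,
assuming uniformly random, independent characters): embedding the
target in a longer sequence of length m multiplies the probability
only by the number of possible start positions, m - n + 1, a linear
factor that is negligible next to the 26^-n cost of spelling the
target itself.

```python
def p_exact(n, alphabet=26):
    """Probability that n uniformly random characters spell one
    specific target sequence of length n."""
    return alphabet ** -n

def p_embedded(m, n, alphabet=26):
    """Union (upper) bound on the probability that a specific
    length-n target appears somewhere inside a random sequence of
    length m: there are m - n + 1 possible start positions."""
    return (m - n + 1) * p_exact(n, alphabet)

# "methinksitislikeaweasel" has n = 23 characters; embedded in the
# 48-character sequence (b), the bound gains only a factor of 26.
print(p_exact(23))
print(p_embedded(48, 23))
```

For (b) above, m = 48 and n = 23, so the embedding buys a factor of
26 against a target probability of about 3 x 10^-33.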

Now, after all this, let me say again that I agree with Glenn in
spirit on this business. The reason I get worked up over this is
that I don't want to see abiogenesis research return to the dark
ages of chance. My personal feeling is that self-organization
combined with deep sea hydrothermal vents will be the key to
unlocking the secrets of the origin of life, if indeed this mystery
has a solution accessible to science. I would like to see more
work in this area. One of the keys to this is spreading the news
that the soup is dead :-).

Actually, I suppose that I should have just skipped all the
above and used the words of Leo Buss to answer Glenn:

"Because there's no primordial soup; we all know that, right?"

-- Leo Buss, in the discussion following his paper
(with Walter Fontana), "What Would be Conserved
if "The Tape Were Played Twice?" in _Complexity:
Metaphors, Models, and Reality_, G. Cowan, _et al_ eds.,
Santa Fe Inst. Proc. volume XIX, 1994, p. 237.

;-)

========================
Brian Harper |
Associate Professor | "It is not certain that all is uncertain,
Applied Mechanics | to the glory of skepticism" -- Pascal
Ohio State University |
========================