Where the information comes from.

Loren Haarsma (lhaarsma@ursa.calvin.edu)
Fri, 27 Aug 1999 15:00:55 -0400 (EDT)

Thanks to the people who replied to my earlier post. I sent that post
to several e-mail lists, and I received replies from several different
sources. Rather than compose a different reply for each e-mail list,
I'll be efficient by writing one general-purpose follow-up for all.

--

Cog the robot learns how to point at an object. Computer programs learn how to navigate mazes. In each case, their success requires that the control algorithms increase their stored information. Where does the information come from?

Here is my one-sentence, over-simplified answer: "The information comes from the environment."

Caveat #1: Cog and the maze-solving programs are very bad analogies to abiogenesis.
Caveat #2: "Information," in this case, requires both a long string of variables and specificity for performing a task.
Caveat #3: It is over-simplified to say that the information "comes from the environment."
Caveat #4: The information isn't created _ex_nihilo_.
Caveat #5: It's only "information" because we say it is.
Caveat #6: OK, you can get information transfer from the environment into an information-storing algorithm by using a simple, iterative, evolutionary process; however, that doesn't prove that all modern lifeforms evolved that way.
Caveat #7: Cog and the maze-solving programs were themselves intelligently designed and crafted.
Caveat #8: Cog and the maze-solving programs aren't doing anything novel. They're just doing the same old thing better and better.

I'll discuss each caveat in turn. I hope they don't bore you. Personally, I think that each caveat is interesting, opening up a fascinating area of discussion and maybe even scientific research.

In other words, this post is going to be long and somewhat technical. :-)

If the issues of self-organized complexity and biological information don't interest you, I doubt if this post will interest you either.

----------------------------

Caveat #1: Cog and the maze-solving programs are very bad analogies to abiogenesis.

True. I wasn't addressing how the first living cells came into existence. Rather, I was addressing these questions: Given an ecosystem filled with lifeforms with short genomes, can biological evolution bring about lifeforms with much longer genomes? If so, where does that "extra information" come from?

I'll return to abiogenesis at the end of this post. But for now, please forget about abiogenesis. For now, I'm only interested in biological evolution which increases lengths of EXISTING genomes.

-------------

Caveat #2: "Information," in this case, requires both a long string of variables AND specificity for performing a task.

Some people use the term "information" in such a way that a purely random string has the highest information content. This is a fine definition of information for SOME applications; for example, when you're using the term "information" to discuss how efficiently a symbol set can be encoded. But that's not my usage here.
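To make that first usage concrete, here is a minimal sketch of the coding-efficiency sense of "information": the empirical Shannon entropy per symbol of a string. The function name and the example strings are illustrative assumptions, not anything from the original posts. Under this measure a string that uses every symbol equally often scores highest, whether or not it does anything useful.

```python
import math
from collections import Counter

def entropy_bits_per_symbol(text):
    """Empirical Shannon entropy of the symbol frequencies in `text`."""
    counts = Counter(text)
    n = len(text)
    return -sum((k / n) * math.log2(k / n) for k in counts.values())

# A perfectly repetitive string carries no coding information per symbol;
# a string that uses four symbols equally often carries two bits per symbol.
repetitive = entropy_bits_per_symbol("AAAAAAAA")
balanced = entropy_bits_per_symbol("ACGT" * 4)
```

Note that this measure says nothing about whether a string is specified for a task, which is the property that matters in the rest of this post.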

By "information," I mean something very much like Dembski's twin criteria of low-probability and specification. In my examples, there were two ways in which "information" increased. (1) For the maze-solving programs, "increasing information" meant that the maze-solver's instruction string got longer and longer. The instruction string started out short and "specified" for solving the maze. As the instruction string got longer, it increasingly became "low probability" compared to random strings. (2) For Cog the robot, "increasing information" did not mean that Cog's variable set got longer. Cog has just as many variables in memory before as after learning the task. Rather, Cog's variable set got better and better specified to perform its task. (George Andrews phrased this concept a different way in his asa@calvin post: the variables which are successful for pointing are a subset of all possible random variables; the "solution space" for pointing is selected out of the large, unspecified initial variable space.)
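The second case (a fixed number of variables that become better specified) can be sketched in a few lines. This is NOT Cog's actual learning algorithm, which is not described in this post; it is a hypothetical two-joint arm whose two angles are nudged at random, with a nudge kept only when it reduces the pointing error. The arm lengths, the target position, the step size, and all function names are assumptions made for illustration.

```python
import math
import random

LINK_LENGTHS = (1.0, 1.0)   # hypothetical two-segment arm
TARGET = (1.0, 1.0)         # hypothetical object to point at

def fingertip(angles):
    """Forward kinematics: joint angles in radians -> fingertip (x, y)."""
    shoulder, elbow = angles
    x = LINK_LENGTHS[0] * math.cos(shoulder) + LINK_LENGTHS[1] * math.cos(shoulder + elbow)
    y = LINK_LENGTHS[0] * math.sin(shoulder) + LINK_LENGTHS[1] * math.sin(shoulder + elbow)
    return (x, y)

def pointing_error(angles):
    """Distance between the fingertip and the target (feedback from the environment)."""
    x, y = fingertip(angles)
    return math.hypot(x - TARGET[0], y - TARGET[1])

def learn_to_point(trials=5000, seed=1):
    """Random nudges to a fixed-length variable set; keep only improvements."""
    rng = random.Random(seed)
    angles = [0.0, 0.0]
    history = [pointing_error(angles)]
    for _ in range(trials):
        candidate = [a + rng.gauss(0.0, 0.1) for a in angles]
        error = pointing_error(candidate)
        if error < history[-1]:
            angles = candidate
            history.append(error)
    return angles, history

angles, history = learn_to_point()
```

The variable set has two entries before and after learning; only the recorded error shrinks, which is the sense in which the set becomes "better specified."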

So we can increase information by (1) increasing specificity or (2) increasing the size of a specified data set. In some cases, this distinction is not important. For example, I could have made the maze-solving program start out with a very long, randomized "control string." The program would simply ignore all but the beginning portion of its control string, and over-write the random part of the control string as it gets better and better at solving the maze. In this case, the control string would not get longer, just better and better specified.

So it's not quite correct to say that information increases every time a "control string" gets longer. Specificity must also increase.

Why am I belaboring this point? Because it is a point of confusion in debates over biological evolution. Some evolutionists say that mutations cause genomic information to increase. Other evolutionists say that natural selection causes genomic information to increase. How can both claims be true? Are these evolutionists just hopelessly confused? No, they're just looking at opposite sides of the same coin. Mutations change the values of the genome's "control string." Mutations can also increase the length of the control string. Natural selection is the feedback which increases specificity of the control string. So mutation without natural selection doesn't really increase biological information. Natural selection without novel mutations doesn't really increase biological information. But together, they are capable of doing the job, in ways analogous to the maze-finding program.
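The division of labor between mutation and selection can be sketched with a toy maze-solver. This is an illustrative reconstruction under stated assumptions, not the program discussed in the earlier post: the maze layout, the "UDLR" instruction alphabet, the rule that a move into a wall is ignored, and the use of distance-to-exit as feedback are all hypothetical choices.

```python
import random
from collections import deque

MAZE = [                      # hypothetical maze: S = start, E = exit, # = wall
    "#########",
    "#S..#...#",
    "#.#.#.#.#",
    "#.#...#E#",
    "#########",
]
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

def find(ch):
    """Locate the cell marked with character `ch`."""
    for r, row in enumerate(MAZE):
        c = row.find(ch)
        if c != -1:
            return (r, c)
    raise ValueError(ch)

def distances_to_exit():
    """Breadth-first search from the exit: steps from each open cell to E."""
    goal = find("E")
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in MOVES.values():
            nxt = (r + dr, c + dc)
            if MAZE[nxt[0]][nxt[1]] != "#" and nxt not in dist:
                dist[nxt] = dist[(r, c)] + 1
                queue.append(nxt)
    return dist

def run(control):
    """The fixed algorithm that reads the control string; wall bumps are ignored."""
    r, c = find("S")
    for ch in control:
        dr, dc = MOVES[ch]
        if MAZE[r + dr][c + dc] != "#":
            r, c = r + dr, c + dc
    return (r, c)

def mutate(control, rng):
    """Mutation: either lengthen the string by one symbol or overwrite one symbol."""
    symbols = list(control)
    if not symbols or rng.random() < 0.5:
        symbols.append(rng.choice("UDLR"))
    else:
        symbols[rng.randrange(len(symbols))] = rng.choice("UDLR")
    return "".join(symbols)

def evolve(seed=0, max_steps=100_000):
    """Selection: keep a mutant only if it ends closer to the exit."""
    rng = random.Random(seed)
    dist = distances_to_exit()
    control = ""
    score = dist[run(control)]
    for _ in range(max_steps):
        if score == 0:
            break
        candidate = mutate(control, rng)
        candidate_score = dist[run(candidate)]
        if candidate_score < score:
            control, score = candidate, candidate_score
    return control

solution = evolve()
```

Without the `if candidate_score < score` test, `mutate` alone only produces longer random strings; without `mutate`, the selection test has nothing new to choose from. In this particular maze there is a single route and selection is strict, so the only mutants that survive are appended symbols that step along that route; a more forgiving selection rule or a more open maze would let overwrites survive too.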

------------------

Caveat #3: It is over-simplified to say that the information "comes from the environment."

The control string of the maze-solving programs provides information about how to navigate a particular maze, but it is information ONLY within the context of an algorithm which reads and interprets the control string. In the same way, the variables in Cog's memory provide information about how to point at an object, but only within the context of Cog's construction (the length of its arms, the angle of its joints, the strength of its servo motors, etc). Cog's variables and the maze-finding control strings are "information about the environment" only within the context of an algorithm.

In the same way, nucleotide sequences inside a cell provide the cell with information about how to survive in its environment, but it is "survival information" only within the context of a living cell which "reads" that information and acts upon it.

(Richard Kouchoo made essentially this same point in his asa@calvin post.)

But that's still not the whole story.

There are many different control strings which can solve a particular maze. There are many different variable sets which can allow Cog to point. Out of all possible successful strings or variable sets, why does one PARTICULAR set get chosen? The answer is: historical contingency. In some sense, the maze-solvers' control string encodes not just information about its environment, but also information about its history. Cog's variables encode not just information about its environment, but also information about the historical pathway it took towards success.
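As a small illustration of "many different control strings," the sketch below enumerates every control string of a given length that gets a walker across a hypothetical open 3x3 room (no interior walls). The room, the "UDLR" alphabet, and the rule that bumping the outer wall wastes the move are assumptions for illustration only.

```python
from itertools import product

SIZE = 3                      # hypothetical open 3x3 room, no interior walls
START, EXIT = (0, 0), (2, 2)
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}

def run(control):
    """Follow the control string from START; a move off the grid is ignored."""
    r, c = START
    for ch in control:
        dr, dc = MOVES[ch]
        nr, nc = r + dr, c + dc
        if 0 <= nr < SIZE and 0 <= nc < SIZE:
            r, c = nr, nc
    return (r, c)

def successful_strings(length):
    """All control strings of the given length that end on the exit."""
    return ["".join(p) for p in product("UDLR", repeat=length) if run(p) == EXIT]

shortest_solutions = successful_strings(4)   # six equally successful strings
```

All six strings are equally "specified" for the task. Which one a stochastic search ends up holding depends on the order in which its mutations happened to arrive, and that is the historical part of the stored information.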

And of course, the same thing happens in biology. There are many different nucleotide sequences which could encode functionally identical proteins. Why does a species have just one (or a few) particular alleles for that protein? Historical contingency.
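For a rough sense of scale, the sketch below counts how many distinct coding sequences translate to exactly the same amino-acid sequence under the standard genetic code. The table gives the number of codons per amino acid (one-letter codes) in the standard code; the function name and the example peptides are illustrative assumptions. This counts only synonymous encodings of one identical protein, so it is a lower bound on the "functionally identical" alternatives mentioned above.

```python
from math import prod

# Number of codons for each amino acid in the standard genetic code.
CODONS_PER_AMINO_ACID = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}

def synonymous_sequence_count(protein):
    """How many nucleotide sequences encode this exact amino-acid sequence."""
    return prod(CODONS_PER_AMINO_ACID[aa] for aa in protein)

# Methionine and tryptophan have one codon each; leucine and serine have six each.
only_one_way = synonymous_sequence_count("MW")
many_ways = synonymous_sequence_count("LS")
```

The count multiplies with every residue, so for a protein of ordinary length the number of equivalent sequences is astronomically large, and the particular one a species carries reflects its history.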

So, an information-storing algorithm can increase its stored information when the algorithm repeatedly follows a list of simple instructions, the length of the instruction list can grow, the instruction list can be altered, and the "success" of an instruction list is determined by feedback from a complicated environment. The information comes from the environment. It is only "information" about the environment within the context of its algorithm. And if multiple kinds of "success" are possible, the instruction list will encode information not only about the environment, but also about the particular evolutionary pathway taken historically.

--------------------

Caveat #4: The information isn't created _ex_nihilo_.

The system which encodes information about the environment must expend energy to gain the information. The system's information about the environment can grow, but its information cannot (in some sense) exceed the total information in the environment.

One respondent (The response was not posted to an open email list, so I won't use his name here) called this process "information transfer" in order to distinguish it from "information creation." I agree. Information is transferred from the environment.

Another such respondent argued thus: "The information content of the final functional sequence could be no greater than the information content of the routines that did the selecting." I think this statement is correct or incorrect depending upon what one means by "the routines that did the selecting." I think this claim is INCORRECT if one means the control algorithm separated from the environment. I think this claim is CORRECT if one includes the environment as part of "the routines that did the selecting." The maze-solving algorithm, minus its control string and minus its environment (an actual maze), could be encoded quite efficiently (in much less than 10 kilobytes, I'm sure). On the other hand, the instruction string can become arbitrarily long for a complex maze. However, the environment (the maze itself) will contain more information than the instruction string.

Almost always. I can imagine a situation in which the instruction string contains more information than the information required to construct the maze, because the instruction string will contain "historical information" in addition to "environmental information." However, pursuing this will open a can of worms about what we mean by the term "information," and it's not relevant to biological evolution, so let's not get into it. As far as biology is concerned, the environment contains a whole lot more information than a genome can encode.

---------------

Caveat #5: It's only "information" because we say it is.

True, sort of.

Consider the maze-solving program. From the standpoint of computer binary code, there's nothing special about the bits which specify the "control string" compared to the bits which specify the rest of the code. From the standpoint of computer hardware, it's all just "LO" or "HI" TTL voltages in transistors.

In the same way, there's nothing special about the electrons and protons in DNA compared to all the other electrons and protons in the rest of the cell. Transcribing DNA is just a chemical reaction. It's all just chemical reactions.

And yet. And yet from our perspective, those particular chemical compounds (or those particular logic states) perform a function which we perceive as stereotyped, elegant, and critical for function. We find it extremely useful to think about DNA (or control strings) as "information." So we develop an academic discipline called "information theory." We apply information theory to biology because we find it helps us build conceptual models and make predictions. We find that it helps us build useful analogies between biological evolution and computer simulations -- useful so long as we remember that they *are* analogies.

Isn't it cool? So it's no disrespect to say, "It's only information because we say it is." Instead, it reveals conceptual connections between widely separated fields of study (biology and computer science). What an amazing creation we live in, where such conceptual connections are possible!

------------------

Caveat #6: OK, you can get information transfer from the environment into an information-storing algorithm by using a simple, iterative, evolutionary process; however, that doesn't prove that all modern lifeforms evolved that way.

True. Of course.

I'm not saying that these examples prove biological evolution. Instead, I am addressing a particular sweeping claim which some people have made in the past few years. I am addressing the sweeping claim that natural laws or regularities simply cannot do the job of increasing biological information. I think that claim is incorrect -- incorrect as a sweeping claim. Now, it may be true in PARTICULAR cases. It may be true that evolution cannot account for certain particular biological complexities. It may be true that evolution cannot account for MANY particular biological complexities. We need more scientific data to decide that issue. And we'll eventually get it. As we sequence the genomes of more species, as we learn more about cellular physiology, and as we build better models of population genetics, we'll slowly find the answers to those questions about particular complexities. But in the meantime, we should discard those sweeping, general claims that biological evolution cannot IN PRINCIPLE produce large increases in genomic information. In principle, it can. In practice -- when we look at particular examples of biological complexity -- we shall see as more data comes in.

Before moving on to the next caveat, it's probably worth tossing out two biological examples where this evolutionary "increasing information" probably happened. [1] In previous postings, I have discussed how "homomeric" ion channels (which are an assembly of four or five identical copies of the same protein, made from the same gene) could evolve into "heteromeric" ion channels (which are an assembly of four or five different proteins made from different genes) via the mechanisms of gene duplication and differentiation. [2] Terry Gray has discussed how hemoglobin could have evolved from globin (http://asa.calvin.edu/evolution/irred_compl.html) in a similar way. Since these examples have been discussed elsewhere, I won't spend more time on them here. Again, these are only particular examples. I'm not claiming that they prove that all modern lifeforms evolved that way.

Another private respondent claimed, "Because [biological] mutation rates have to be low, the time required for transmission of sufficient information [from the environment] to cause evolution is longer that what is available."

Maybe. Maybe not. This is a hand-wavy argument. Hand-wavy arguments are useful in science. They are suggestive, but they are not convincing. To be convinced one way or the other, I'll need to see some solid numbers and some quantitative models. Those numbers and those models will need to be scrutinized by peer review. Until then, such broad evolutionary claims are neither proven nor disproved. I have no objection if you say, "In my opinion, biological change happened too fast for evolution." I have no objection if you say, "In my opinion as a scientist with years of expertise studying the subject, biological change happened too fast for evolution." But if you say, "Biological change definitely DID happen too fast for evolution," I will object. I will say, "Show me the numbers, AND show me the peer review to back them up, please." Incidentally, I will make the very same objection to someone who claims, "All biological change definitely happened by evolution."

--------------

Caveat #7: Cog and the maze-solving programs were themselves intelligently designed and crafted.

Five of my six respondents made this point. And of course I agree.

(/BEGIN RANT) Actually, these respondents did not say that Cog and the computer programs were "intelligently designed and crafted." They merely said that Cog and the computer programs were "intelligently designed." But I'm sure they meant to say that Cog and the programs were both "intelligently designed AND CRAFTED."

It is important to distinguish amongst all designed objects, the subset of objects which are both designed and crafted, and the subset of objects which were designed for self-assembly. I believe that the Intelligent Design movement is doing itself a profound disservice by repeatedly choosing NOT to make these distinctions. By choosing to imply that "intelligent design" MUST include the concept of "crafting," they are implying that so many gloriously interesting parts of creation (atoms, galaxies, stars, planets with oceans and atmospheres and simple organic molecules) are NOT, in fact, designed, since God chose to produce those things via self-assembly. Surely the ID movement doesn't want to imply that atoms, galaxies, stars, planets, and organic molecules are not designed! But that is exactly what they are doing by their rhetorical choices. Well, I'm not going to help them shoot themselves in the foot (or maybe in the head) that way. When I mean designed, I'll say designed; when I mean designed-and-crafted, I'll say designed-and-crafted. (/END RANT)

So Cog and the computer programs are designed and crafted. So what? Well, although they didn't say so explicitly, I'm guessing that at least four of those five respondents intended to imply something further. I think they intended to imply something like this: "You can only get information transfer from the environment into an information-storing algorithm IF the device which runs the algorithm was itself designed and crafted."

Now that's an interesting claim, worthy of examination. Maybe it's true. Maybe it isn't. Consider the counter-claim: "Under certain conditions, it is possible for a device to self-assemble which will, once assembled, run an information-storing algorithm which can take and store information about the environment." This claim, too, is worthy of examination.

And that brings us to the final two caveats:

--------------

Caveat #8: Cog and the maze-solving programs aren't doing anything novel. They're just doing the same old thing better and better.

Caveat #9 = Caveat #1: Cog and the maze-solving programs are very bad analogies to abiogenesis.

To my surprise, caveat #8 wasn't made by any of my respondents. But it's worth looking at anyway. Caveat #1 was stated especially well by Charles Carrigan on the science-and-Christianity list.

Both of those caveats are addressed by the following statement: There are multiple strategies for self-organizing complexity and increasing information.

That's such a neat concept, that I hope you won't mind if I repeat it. There are multiple strategies for self-organizing complexity and increasing information. A strategy which works in one situation (e.g. increasing genomic lengths) might not work in another situation (e.g. abiogenesis), and vice-versa.

First let's consider the question of evolving novelty (caveat #8). Cog the robot and the maze-finding program illustrate one strategy for increasing information without novelty: an information-storing algorithm in which the instruction set can be altered, the instruction set can grow in length, and "success" is determined by feedback from the environment. That strategy is enough to get information transfer from the environment, but it's not enough for evolution of novelty. In order to get evolution of novelty, you need two additional elements. (1) Each "instruction" must be able to serve multiple functions, depending upon context. (2) "Success" must be measured as a complex function of multiple variables.

With those two additional elements, it is possible, under certain circumstances, to get evolution of novelty. (There may be other strategies to evolve novelty, too; but this is one strategy which we know works sometimes.) The artificial life "Tierra" program gives one example of this strategy. (http://www.hip.atr.co.jp/~ray/pubs/tierra/tierrahtml.html) In biology, there's some pretty good indication that the Krebs cycle first appeared this way. (http://asa.calvin.edu/archive/evolution/199610/0029.html)

Now for caveat #1. So far, I've talked about strategies for self-organized complexity where you're already starting with something that's pretty complex. But what if you start out with nothing but simple things? What about abiogenesis?

It turns out that there are strategies for self-organized complexity which start out with nothing but simple objects. One such strategy is the Fine-Tuning strategy. Simple objects interact with each other in stochastic processes, and the rules which govern their interaction are finely tuned to encourage the eventual development of complex (even irreducibly complex) objects. Some time ago I posted a computer-simulation example of this strategy (http://www.calvin.edu/archive/asa/199804/0285.html). Other examples of this strategy are all around us --- atoms, galaxies, stars, and planets. Scientists have good evidence that this fine-tuning strategy goes far enough to result in simple organic molecules on the early, pre-biotic earth -- molecules both synthesized on earth and delivered via comets.

How far can this fine-tuning strategy go? Can it go all the way to living cells with DNA? I think it would be premature to say "no," and extremely premature to say "yes." It could very well be that this strategy only gets so far, and other strategies are needed to go the rest of the way up to cells with DNA. One strategy that might hold promise is this: If you start with independent agents which can cooperate and specialize, and this cooperating and specialization leads to increased success, then complexity forms spontaneously. (One recent publication on this strategy is in Physical Review Letters v.82 n.25 p.5144, Barbara Drossel, "Simple Model for the Formation of a Complex Organism." Also, Glenn Morton has pointed out in previous postings that the development of a complex inter-dependent economic system provides a nice analogy to this strategy.) Would this strategy work in chemical evolution or protocells? If so, how? I'm keeping my eyes and ears open for developments in those fields. Maybe I'm being overly optimistic. We'll see. I'm at the limit of what I'm willing to speculate here, except to say the following.

Personally, I suspect that God did use self-organizing complexity (rather than miraculous fiat) to make the first living cells. I figure that if God went to all the trouble to create self-organizing strategies to take creation all the way from fundamental particles up to organic molecules on a hospitable prebiotic planet, and if God went to all the trouble to build self-organizing strategies into the mechanisms of biological evolution of living cells, then it's reasonable (though by no means certain) to guess that God created self-organizing strategies of abiogenesis to get from one to the other.

======

Loren Haarsma