Re: biology has belatedly realised that it is, itself, an information technology

Stephen Jones (sejones@ibm.net)
Tue, 29 Jun 1999 05:56:47 +0800

Reflectorites

Here is an article from The Economist at:

http://www.economist.com/editorial/freeforall/current/index_st2340.html

which points out that "biology has belatedly realised that it is, itself, an
information technology."

I like this excerpt:

"The main reason for this shotgun marriage with information technology is
that biology has belatedly realised that it is, itself, an information
technology-even though the technologist is natural selection rather than
Bill Gates. An organism's physiology and behaviour are dictated largely by
its genes. And those genes are merely repositories of information written in
a surprisingly similar manner to the one that computer scientists have
devised for the storage and transmission of other information-that is,
digitally."

According to my son Brad, who is in the final year of his degree in
Information Technology Engineering (which, as might be expected, includes a
lot of Information Theory), it is axiomatic in Information Theory
that information has an intelligent source. Biology might be unwittingly
letting a Trojan Horse in the door!

Maybe Dobzhansky's famous statement: "Nothing in biology makes sense
except in the light of evolution" will have to be revised to: "Nothing in
biology makes sense except in the light of information"? And since
information always has an intelligent source, it will eventually need to
be revised to: "Nothing in biology makes sense except in the light
of Intelligent Design"!

Steve

------------------------------------------------------------------
The Economist, June 26-July 2, 1999

Drowning in data

Like so many others, biologists are confronted by a tidal wave of
information. Unfortunately, few of them know how to swim

[...]

ONCE upon a time, biology was simple. Its practitioners cultivated things
in Petri dishes and flowerpots, or studied them through fieldglasses. They
might count them, measure their lengths, or even weigh them. But the
numbers-and the crunching needed to interpret those numbers-rarely taxed
their mathematical skills beyond a level that they would have learned at
school. That is, however, changing fast. Biological data are flooding in at
an unprecedented rate. The amount of information stored, for example, in
the international repository of genetic sequences known as GenBank is
doubling every 14 months. As a result, many of the challenges in biology,
from gene analysis to drug discovery, have actually become challenges in
computing. Indeed, the process of change is so rapid that some of the
subject's potentates are afraid that progress may grind to a halt unless a
huge injection of numeracy takes place pretty soon.
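
[Illustration, not part of the article.] A minimal sketch of what a
14-month doubling time implies, assuming smooth exponential growth (the
article gives only the doubling figure; the function names are my own):

```python
import math

DOUBLING_MONTHS = 14  # GenBank doubling time quoted in the article


def growth_factor(months, doubling_months=DOUBLING_MONTHS):
    """Factor by which the data volume grows over `months` months."""
    return 2 ** (months / doubling_months)


def months_to_grow(factor, doubling_months=DOUBLING_MONTHS):
    """Months needed for the data volume to grow by `factor`."""
    return doubling_months * math.log2(factor)


if __name__ == "__main__":
    print(f"growth over one year:  x{growth_factor(12):.2f}")
    print(f"growth over ten years: x{growth_factor(120):.0f}")
    print(f"months to grow 1000-fold: {months_to_grow(1000):.0f}")
```

On these assumptions the archive grows by a little under a factor of two
each year, which compounds to several hundred-fold over a decade.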

The mightiest of those potentates inhabit America's National Institutes of
Health (NIH)-the body responsible for disbursing the lion's share of federal
money available for biomedical research. And earlier this month the NIH
issued a report that talked of "the alarming gap between the need for
computation in biology and the skills and resources available to meet that
need" and recommended spending up to $160m on rectifying matters
through a network of biocomputing centres across the country.

An embarrassment of riches

The main reason for this shotgun marriage with information technology is
that biology has belatedly realised that it is, itself, an information
technology-even though the technologist is natural selection rather than
Bill Gates. An organism's physiology and behaviour are dictated largely by
its genes. And those genes are merely repositories of information written in
a surprisingly similar manner to the one that computer scientists have
devised for the storage and transmission of other information-that is,
digitally. There are superficial differences, of course. The genetic code has
four elements (the four so-called bases, sometimes referred to as its
letters), rather than the two of a binary coding system. And the bases are
grouped together in threes, known as codons, rather than in the eight-bit
bytes of computing. But the similarities are more striking, so the subject is
suddenly lending itself to a serious amount of computerisation.
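
[Illustration, not part of the article.] The comparison with binary can
be made concrete: four bases need two bits each, so a three-base codon
fits in six bits and there are 4**3 = 64 possible codons. A minimal
sketch, in which the bit assignment (A=00, C=01, G=10, T=11) is an
arbitrary choice of mine and not any standard:

```python
from itertools import product

BASES = "ACGT"
# Arbitrary 2-bit code for each base: A=00, C=01, G=10, T=11.
BASE_TO_BITS = {base: index for index, base in enumerate(BASES)}


def pack(seq):
    """Pack a DNA string into an integer, two bits per base."""
    value = 0
    for base in seq:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack(value, length):
    """Inverse of pack(); length is needed because leading A's pack as zeros."""
    bases = []
    for _ in range(length):
        bases.append(BASES[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


# Every possible three-base codon: 4 * 4 * 4 = 64 of them.
ALL_CODONS = ["".join(c) for c in product(BASES, repeat=3)]

if __name__ == "__main__":
    print(len(ALL_CODONS), "codons")
    print("ATG packs to", format(pack("ATG"), "06b"))
```

Packed this way, four bases fit in one eight-bit byte, which is one
reason sequence data lends itself so readily to digital storage.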

At the same time, there has been rapid progress in the machines that supply
the raw material-the sequences of genetic letters and codons in
chromosomes. A single high-throughput gene-sequencing machine can now
read hundreds of thousands of bases per day; and newer technologies, such
as "gene chips", should make the analysis even faster. That will produce
even more data that have to be stored and annotated for subsequent study.
And even for those who do not work directly on the genes themselves,
similar technological changes are appearing. Robotic screening machines,
for example, in which hundreds of compounds in tiny wells are tested to
see if they react with a particular biological target, can analyse thousands of
compounds in a day.

The result is a mind-boggling amount of information. According to
Anthony Kerlavage of Celera, a company formed last year with the
intention of sequencing the entire human genome using private money (and
beating government-financed projects in the process), a genetics laboratory
can easily produce 100 gigabytes of data a day-that is about 20,000 times
the volume of data in the complete works of Shakespeare or J. S. Bach.
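
[Illustration, not part of the article.] The comparison can be checked
with simple arithmetic. A sketch assuming decimal gigabytes (the article
does not say which unit it means):

```python
GIGABYTE = 10**9  # decimal gigabyte; an assumption, the article does not say


def implied_reference_size(daily_gigabytes=100, ratio=20_000):
    """Bytes in the reference corpus implied by the article's comparison."""
    return daily_gigabytes * GIGABYTE / ratio


if __name__ == "__main__":
    print(implied_reference_size() / 10**6, "MB")
```

The implied figure is about 5 megabytes, which is at least the right
order of magnitude for a plain-text edition of Shakespeare.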

The analysis of such data poses problems beyond mere volume control.
Having sequenced a particular piece of DNA, for example, it is useful to
compare it with a central database (such as GenBank) of existing sequences
to see what it resembles. But this requires more than just a straightforward
database search. The program involved must know what constitutes a
biologically meaningful resemblance, and it must also be able to deal with
the errors that inevitably creep into the sequencing process. As a result,
devising new search algorithms requires extensive knowledge of computing
theory, together with a keen biological intuition.
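
[Illustration, not part of the article.] To show why a plain database
lookup is not enough, here is a minimal sketch of error-tolerant
matching: it finds the stretch of a longer sequence that is closest to a
query, counting substitutions, insertions and deletions. This is a
textbook dynamic-programming edit distance, not the algorithm any real
GenBank search tool uses, and it knows nothing about biological
meaning; the function name is my own.

```python
def best_approximate_match(query, target):
    """Return (edits, end) for the substring of target closest to query.

    edits is the minimum number of substitutions, insertions and
    deletions; end is the index in target just past the first position
    where a best match ends.
    """
    # prev[j]: cheapest way to align the query prefix seen so far with
    # some substring of target ending at position j. The empty prefix
    # matches anywhere at no cost, hence the row of zeros.
    prev = [0] * (len(target) + 1)
    for i, q in enumerate(query, start=1):
        curr = [i]  # query prefix against nothing: i deletions
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j - 1] + (q != t),  # match or substitution
                prev[j] + 1,             # query base missing from target
                curr[j - 1] + 1,         # extra base in target
            ))
        prev = curr
    edits = min(prev)
    return edits, prev.index(edits)


if __name__ == "__main__":
    print(best_approximate_match("GATTACA", "CCCGATTACACCC"))  # exact hit
    print(best_approximate_match("GATTACA", "CCCGATAACACCC"))  # one misread base
```

An exact-match lookup would miss the second case entirely, whereas this
reports it as a hit with one edit. Real tools add scoring schemes that
reflect which differences matter biologically, which is where the
"keen biological intuition" comes in.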

And there's the rub. The real problem about the growing quantification of
biology is not the change in the subject but the lack of change in its
practitioners. For a sudden inpouring of data is not unique to biology.
Astronomers, who once squinted over photographic plates, now deal with
squillions of bits of data from automatic sky surveys. Meteorologists no
longer use seaweed; instead, they prefer supercomputers. Particle physicists
would not have the first idea of what was going on in their machines if the
results of their experiments were not processed automatically. Yet none of
those fields seems to be suffering unduly from information overload
because the physical sciences are founded on number-crunching.
Astronomers, for example, have been using rooms full of computers ever
since the days when the word "computer" referred to a skilled
mathematician. And some of the first electronic computers were devised
specifically for use by physicists working on the development of atomic
weapons.

Many biologists, however, avoided the fields of astronomy, meteorology or
particle physics precisely because they have, in the delicately chosen words
of Sylvia Spengler of the Centre for Bioinformatics and Computational
Genomics at the Lawrence Berkeley National Laboratory in California,
"some problem with mathematics".

The result, according to Larry Hunter, president of the International
Society for Computational Biology, is that there is a desperate shortage of
specialists capable of developing the computational tools that biologists
need. What is required, he says, is "a genuinely new kind of scientist" who
is trained in both computer science and biology. Worryingly, however, the
demand for computational biologists is such that the very academics
needed to teach interdisciplinary courses that might plug the gap are going
into industry, where their skills are more highly remunerated.

Some physical scientists used to accuse innumerate biologists of "physics
envy". Partly, the accusation was that they secretly envied a numerical
rigour to which they could not possibly aspire. Partly, it was that physicists
got all the money. Now, however, it is the biologists' budgets that are
growing. But there is a price. As biology becomes numerically rigorous, its
practitioners have no choice but to do the same.

[...]
------------------------------------------------------------------

--------------------------------------------------------------------
"Biology is the study of complicated things that give the appearance of
having been designed for a purpose." (Dawkins R., "The Blind
Watchmaker," [1986], Penguin: London, 1991, reprint, p1)
--------------------------------------------------------------------