| Author |
Message |
Doug Wedel Guest
|
Posted: Mon Jul 14, 2008 6:05 am Post subject: Question about the Shannon "entropy" of genomes |
|
|
Using Claude Shannon's formulas for measuring the redundancy of symbol
tokens in message strings, and given a large enough text to work with, it
is possible to identify the language of a text simply from the statistical
analysis of token use alone, since all languages have unique "signatures" of
redundancy in symbol token use. It strikes me as possible that different
organisms (or species or genuses) may also have characteristic redundancy
levels in their genome, and I was wondering if anyone knows of statistical
studies of this kind. |
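For anyone who wants to try the idea on sequence data, here is a minimal sketch of the measurement described above, applied to DNA instead of text. It assumes a four-letter alphabet (A, C, G, T) and treats overlapping k-mers as the "tokens"; the function names `kmer_entropy` and `redundancy` are made up for illustration and do not come from any particular library.

```python
import math
from collections import Counter

def kmer_entropy(seq, k=1):
    """Shannon entropy in bits per k-mer, estimated from overlapping k-mer frequencies."""
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    total = len(kmers)
    # H = -sum(p * log2(p)) over the observed k-mers
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def redundancy(seq, k=1, alphabet_size=4):
    """Redundancy = 1 - H / H_max, with H_max = k * log2(alphabet_size).

    Assumption: a 4-letter DNA alphabet with no ambiguity codes.
    """
    h_max = k * math.log2(alphabet_size)
    return 1.0 - kmer_entropy(seq, k) / h_max

# Toy sequence, not real genomic data
toy = "ATGGCGATATATATGCGCGC"
print(kmer_entropy(toy, k=1), redundancy(toy, k=1))
print(kmer_entropy(toy, k=2), redundancy(toy, k=2))
```

On short sequences the estimate for larger k is unreliable because most k-mers are never observed, so comparisons between organisms would need long sequences and the same k.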
|
| |
|
Steven Sullivan Guest
|
Posted: Tue Jul 15, 2008 5:16 am Post subject: Re: Question about the Shannon "entropy" of genomes |
|
|
Doug Wedel <dougwedel@earthlink.net> wrote:
[quote]Using Claude Shannon's formulas for measuring the redundancy of symbol
tokens in message strings, and given a large enough text to work with, it
is possible to identify the language of a text simply from the statistical
analysis of token use alone, since all languages have unique "signatures" of
redundancy in symbol token use. It strikes me as possible that different
organisms (or species or genuses) may also have characteristic redundancy
levels in their genome, and I was wondering if anyone knows of statistical
studies of this kind.
[/quote]
Look up 'codon bias' for one level of redundancy.
Also look up 'sequence logos' (primarily Tom Schneider's work), which have been used for years
to represent DNA/protein sequences in terms of Shannon entropy.
http://www-lmmb.ncifcrf.gov/~toms/
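As a rough illustration of how a sequence logo relates to Shannon entropy: the height of each column is commonly taken as the maximum possible entropy (2 bits for DNA) minus the observed entropy at that position. The sketch below assumes equal-length, already-aligned DNA sites and leaves out the small-sample correction used in the published method; `position_information` is a hypothetical helper name and the example sites are invented.

```python
import math
from collections import Counter

def position_information(aligned_sites, alphabet_size=4):
    """Information content in bits at each column of equal-length aligned sequences.

    Computed as log2(alphabet_size) - H(column). No small-sample
    correction is applied, so values are biased upward for few sites.
    """
    h_max = math.log2(alphabet_size)
    info = []
    for column in zip(*aligned_sites):
        counts = Counter(column)
        total = len(column)
        h = -sum((n / total) * math.log2(n / total) for n in counts.values())
        info.append(h_max - h)
    return info

# Invented example sites, for illustration only
sites = ["TATAAT", "TATGAT", "TACAAT", "TATATT"]
for pos, bits in enumerate(position_information(sites), start=1):
    print(pos, round(bits, 3))
```

Fully conserved columns score 2 bits and columns with all four bases equally frequent score 0.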
--
-S
A wise man, therefore, proportions his belief to the evidence. -- David Hume, "On Miracles"
(1748) |
|
| |
|
Graham Jones Guest
|
Posted: Tue Jul 15, 2008 8:03 pm Post subject: Re: Question about the Shannon "entropy" of genomes |
|
|
"Doug Wedel" <dougwedel@earthlink.net> wrote in message
news:g5eqau$1oak$1@darwin.ediacara.org...
[quote]Using Claude Shannon's formulas for measuring the redundancy of symbol
tokens in message strings, and given a large enough text to work with, it
is possible to identify the language of a text simply from the statistical
analysis of token use alone, since all languages have unique "signatures" of
redundancy in symbol token use. It strikes me as possible that different
organisms (or species or genuses) may also have characteristic redundancy
levels in their genome, and I was wondering if anyone knows of statistical
studies of this kind.
[/quote]
Three search terms you may find useful:
Codon usage bias
GC-content
puffer-fish junk-dna
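The first two terms are easy to compute yourself. A minimal sketch, assuming plain A/C/G/T input and, for the codon counts, a coding sequence whose reading frame starts at the first base (the function names are illustrative only):

```python
from collections import Counter

def gc_content(seq):
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def codon_counts(coding_seq):
    """Count in-frame codons, assuming the frame starts at index 0.

    A trailing partial codon (1 or 2 leftover bases) is ignored.
    """
    seq = coding_seq.upper()
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

# Toy coding sequence, not taken from a real gene
toy = "ATGGCTGCTGCCGCATAA"
print(gc_content(toy))
print(codon_counts(toy).most_common())
```

Codon usage bias shows up when synonymous codons (here GCT, GCC and GCA all encode alanine) appear at unequal frequencies across many genes.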
Graham |
|
| |
|
|