| View previous topic :: View next topic |
| Author |
Message |
BHARATH Guest
|
Posted: Tue Sep 09, 2003 2:57 pm Post subject: WHAT ARE THE VARIOUS METHODS FOR IMPLEMENTING BWT? |
|
|
HELLO... IAM BHARATH DOING A PROJECT ON BWT.I HAVE REFERRED BWT
APPROACHES BY MARK NELSON,CAMPOS,TSAI HSING AND ITOH TANAKA.AMONG
THOSE APPROACHES WHICH ONE IS THE BEST FOR IMPLEMENTING BWT,IS THEIR
ANY OTHER APPROACH WHICH IS BEST FOR IMPLEMENTING BWT?,IN TERMS OF
TIME AND SPACE COMPLEXITY. |
|
| |
|
Back to top |
Ken Willets Guest
|
Posted: Wed Sep 10, 2003 4:08 am Post subject: Re: WHAT ARE THE VARIOUS METHODS FOR IMPLEMENTING BWT? |
|
|
In article <57dda274.0309090157.7de4fbb8@posting.google.com>, "BHARATH"
<bharathkj@hotmail.com> wrote:
[quote]APPROACHES WHICH ONE IS THE BEST FOR IMPLEMENTING BWT,IS THEIR ANY OTHER
APPROACH WHICH IS BEST FOR IMPLEMENTING BWT?,IN TERMS OF TIME AND SPACE
COMPLEXITY.
[/quote]
Yes, the lowercase approach :-)
This is a worthwhile question, and I>ve been thinking about a project
to go through the papers and compile a table of results, with time
complexity, and actual space consumption, eg 4n, 5n, etc.
For the uninitiated, block sorting space is usually scored as:
pointer: 4 bytes
int: 4 bytes
char: 1 byte
Some of the algorithms I know are:
Method time O() space notes
suffix tree n 10-20n pointer-intensive
strcmp sort n^2 log n 5n quicksort of n pointers to n chars
itoh tanaka n^2 log n 5n worst case; depends on LCP
seward n^2 log n 5n worst case; depends on LCP
deep-shallow n^2 log n 5n ferragina + manzini, LCP-dependent
qsfusort n log n 8n best published n log n
LCP = longest common prefix when comparing two strings. Usually the
extra n in n^2 log n comes from character-by-character comparison of
strings, which can take up to n comparisons. |
|
| |
|
Back to top |
michael a maniscalco Guest
|
Posted: Wed Sep 10, 2003 4:26 pm Post subject: Re: WHAT ARE THE VARIOUS METHODS FOR IMPLEMENTING BWT? |
|
|
Hi Ken,
You forgot to list radix methods.
I>ve placed an example at
http://www.geocities.com/m99datacompression/RadixBWT.htm
Also 5n (plus stack)
You note qsfusort as best published n log n.
Do you know where one might find an implementation for comparisons?
-Michael A Maniscalco
"Ken Willets" <usenet@willets.org> wrote in message news:<D3t7b.769$Rg6.616@newssvr29.news.prodigy.com>...
[quote]In article <57dda274.0309090157.7de4fbb8@posting.google.com>, "BHARATH"
bharathkj@hotmail.com> wrote:
APPROACHES WHICH ONE IS THE BEST FOR IMPLEMENTING BWT,IS THEIR ANY OTHER
APPROACH WHICH IS BEST FOR IMPLEMENTING BWT?,IN TERMS OF TIME AND SPACE
COMPLEXITY.
Yes, the lowercase approach :-)
This is a worthwhile question, and I>ve been thinking about a project
to go through the papers and compile a table of results, with time
complexity, and actual space consumption, eg 4n, 5n, etc.
For the uninitiated, block sorting space is usually scored as:
pointer: 4 bytes
int: 4 bytes
char: 1 byte
Some of the algorithms I know are:
Method time O() space notes
suffix tree n 10-20n pointer-intensive
strcmp sort n^2 log n 5n quicksort of n pointers to n chars
itoh tanaka n^2 log n 5n worst case; depends on LCP
seward n^2 log n 5n worst case; depends on LCP
deep-shallow n^2 log n 5n ferragina + manzini, LCP-dependent
qsfusort n log n 8n best published n log n
LCP = longest common prefix when comparing two strings. Usually the
extra n in n^2 log n comes from character-by-character comparison of
strings, which can take up to n comparisons.[/quote] |
|
| |
|
Back to top |
KW Guest
|
Posted: Wed Sep 10, 2003 10:08 pm Post subject: Re: WHAT ARE THE VARIOUS METHODS FOR IMPLEMENTING BWT? |
|
|
In article <dfd7ecdb.0309100326.4731f2f7@posting.google.com>, "michael a
maniscalco" <michael@m99datacompression.com> wrote:
[quote]Hi Ken,
You forgot to list radix methods.
I>ve placed an example at
http://www.geocities.com/m99datacompression/RadixBWT.htm Also 5n (plus
stack)
[/quote]
Feel free to add entries; I was hoping simply to get a list started.
[quote]
You note qsfusort as best published n log n. Do you know where one might
find an implementation for comparisons?
[/quote]
Oops, that should qsufsort - try http://www.larsson.dogma.net/research.html
for source.
.. Also, my source for most of these was Ferragina and Manzini>s paper on
the deep-shallow algorithm:
http://citeseer.nj.nec.com/567924.html
There are other papers which have similar information. |
|
| |
|
Back to top |
michael a maniscalco Guest
|
Posted: Thu Sep 11, 2003 7:16 pm Post subject: Re: WHAT ARE THE VARIOUS METHODS FOR IMPLEMENTING BWT? |
|
|
Perhaps it would be a good idea to create a site dedicated to
comparing the time and space complexities of various BWT variants.
Note, not to focus on the compression results as they are often
improved but nothing more than filters, but rather to make available
the source code and perhaps a demo exe for whatever variants of the
calculation of the BWT string that might be out there.
I would be willing to create and maintain such a site if the people
who have written BWT engines wish to do so. It>s just an idea but
I would think that categorizing each by the primary algorithm, plus
source code, plus a robust set of test files and a list of
memory usage and time would be very helpful to people. It would be
a test of the calculation of the BWT string such that the resulting
string is the same size as the original and with the same count of
each symbol. That is, the 'pure' BWT transformed string.
Anyone have suggestions on the idea?
-Michael A Maniscalco
"KW" <usenet@willets.org> wrote in message news:<pUI7b.2948$7P5.2167@newssvr25.news.prodigy.com>...
[quote]In article <dfd7ecdb.0309100326.4731f2f7@posting.google.com>, "michael a
maniscalco" <michael@m99datacompression.com> wrote:
Hi Ken,
You forgot to list radix methods.
I>ve placed an example at
http://www.geocities.com/m99datacompression/RadixBWT.htm Also 5n (plus
stack)
Feel free to add entries; I was hoping simply to get a list started.
You note qsfusort as best published n log n. Do you know where one might
find an implementation for comparisons?
Oops, that should qsufsort - try http://www.larsson.dogma.net/research.html
for source.
. Also, my source for most of these was Ferragina and Manzini>s paper on
the deep-shallow algorithm:
http://citeseer.nj.nec.com/567924.html
There are other papers which have similar information.[/quote] |
|
| |
|
Back to top |
Ken Willets Guest
|
Posted: Fri Sep 12, 2003 1:48 am Post subject: Re: WHAT ARE THE VARIOUS METHODS FOR IMPLEMENTING BWT? |
|
|
In article <dfd7ecdb.0309110616.7f29e855@posting.google.com>, "michael a
maniscalco" <michael@m99datacompression.com> wrote:
[quote]Perhaps it would be a good idea to create a site dedicated to comparing
the time and space complexities of various BWT variants. Note, not to
focus on the compression results as they are often improved but nothing
more than filters, but rather to make available the source code and
perhaps a demo exe for whatever variants of the calculation of the BWT
string that might be out there.
Yes, there are a lot of variants out there. I just found a version that I[/quote]
thought I had invented, written up as a suffix array construction
algorithm, without mentioning block sorting.
There are a few differences I>ve found in BWT filters. One is the use of
a terminator ("$") character, and another is the lexical position of $
either before or after the rest of the alphabet. Another is the removal
of $ from the output, which throws everything past it off by 1. Any of
these can play havoc with debugging.
We might try dogma.net, or datacompression.info, as they have extensive
BWT information. I could put it on my site as well. |
|
| |
|
Back to top |
|