www.GetXFactor.com

Leading Technology, Science,
Agriculture News and information


Part of the Identityscape.com network...

getxfactor.com jmoodmusic.com smartbusinesschoices.com mintdepot.com lowfaresalways.com evangelicalview.com shoppingpodder.com soproudlywehail.com webnews.ws currenthumor.com

 

 

HOW TO SUFFIX ARRAY CONSTRUCTION IS USED IN BWT EFFICIENTLY?
   Science and Technology news... Forum Index -> Compression Forum  
View previous topic :: View next topic  
Author Message
BHARATH
Guest






PostPosted: Fri Sep 12, 2003 6:51 pm    Post subject: HOW TO SUFFIX ARRAY CONSTRUCTION IS USED IN BWT EFFICIENTLY? Reply with quote

HELLO ... I AM BHARATH DOING A PROJECT ON BWT ... I KNOW TO CONSTRUCT
SUFFIX ARRAY ... BUT I DON>T GET THE IDEA OF HOW SUFFIX ARRAY
CONSTRUCTION IS USED IN BWT... CAN ANYBODY SAY HOW IT IS DONE WITH
EXAMPLES... OR IS THEIR ANY SITE WHICH HAS A COMPLETE INFORMATION ON
HOW SUFFIX ARRAY IS CONSTRUCTED AND HOW IT IS USED IN BWT APPROACH.
BYE BHARATH
Back to top
Ken Willets
Guest






PostPosted: Sun Sep 14, 2003 12:13 am    Post subject: Re: HOW TO SUFFIX ARRAY CONSTRUCTION IS USED IN BWT EFFICIEN Reply with quote

In article <57dda274.0309120551.3fcd5f1f@posting.google.com>, "BHARATH"
<bharathkj@hotmail.com> wrote:

If you list out the block-sorted matrix M, eg

$abracadabra
a$abracadabr
abra$abracad
abracadabra$
acadabra$abr
adabra$abrac
bra$abracada
bracadabra$a
cadabra@abra
dabra$abraca
ra$abracadab
racadabra@ab
^ ^
F L

(Here, $ < a, and using $ ensures that M is sorted by suffixes only )

You can draw arrows from each row to its immediate suffix, eg "abracadbra$
--> "bracadabra$a", giving each row a link in the form of the row number
of its suffix. This creates an array Psi which is the permutation that
would result if M were shifted one column to the left and re-sorted.

Psi is typically used in reversing the BWT to get the original text (it>s
easily calculated from L). If you just store the first character of each
row i (column F, or just character counts), and Psi[i], you can find
successive characters of a row by reading the first character of i, going
to Psi[i] to find the second character, so on, iterating i = Psi[i] to
cycle through all n suffixes.

Psi can be converted to a suffix array by starting at the row
corresponding to the original string (the primary index), and iterating
the same way, replacing Psi[i] with 0,1,2... as you move through the text.

Psi has a useful property in that it is increasing on each range of rows
in M which have the same first character, that is if rows i and j start
with "a", say, and i < j, then Psi[i] < Psi[j]. The reason is simple: if
the row at i is lexicographically less than the one at j, and they have
the same first character, then string comparison depends on their 2nd or
further characters, meaning that their suffixes must be in the same order,
etc.

Psi is compressible because consecutive values often differ by one; this
distribution is used in Distance Coding and Inversion Frequencies BWT
compression methods.

It>s also fairly easy to build Psi character-by-character, creating a
block-sorted array in n log n time and very small space. I thought
I was the only one who had thought of this until I read this paper:

* T. W. Lam, K. Sadakane, W. K. Sung and S. M. Yiu: A Space and Time
Efficient Algorithm for Constructing Compressed Suffix Arrays, Proc. of
COCOON, LNCS 2387, pp. 401--410, 2002. ps.gz file (66875 bytes).

This is a good description of Psi and the insertion method.
Back to top
Display posts from previous:   
   Science and Technology news... Forum Index -> Compression Forum  
Page 1 of 1
All times are GMT

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum