www.GetXFactor.com

Leading Technology, Science,
Agriculture News and information


Part of the Identityscape.com network...

getxfactor.com jmoodmusic.com smartbusinesschoices.com mintdepot.com lowfaresalways.com evangelicalview.com shoppingpodder.com soproudlywehail.com webnews.ws currenthumor.com

 

 

Advice on selecting text compression library
   Science and Technology news... Forum Index -> Compression Forum  
View previous topic :: View next topic  
Author Message
Chris
Guest






PostPosted: Sat Jul 26, 2008 11:15 pm    Post subject: Advice on selecting text compression library Reply with quote

My app needs to compress and decompress large amounts of plain text very
quickly. The text is usually English, sometimes non-English in UTF8.

Can anyone point me to the best text compression library to use?

Here are the requirements:

1. Must be accessible from Java.

2. Text *decompression* speed is the most important thing to optimize
for. We>ll compress only once, but decompress many times. The amount of
compression we get is important, but we>re willing to trade size for speed.

3. The blocks of text to compress will range from 100 bytes to 100K,
with an average ~10K.

I>m currently using the java.util.zip library built in to Java, and it
works ok, but a profiler shows that a large percentage of processor time
is spent in decompression. Is there a library that can do significantly
better?
Back to top
Mark Nelson
Guest






PostPosted: Sun Jul 27, 2008 3:53 pm    Post subject: Re: Advice on selecting text compression library Reply with quote

On Jul 26, 1:15 pm, Chris <spam_me_...@goaway.com> wrote:
[quote]My app needs to compress and decompress large amounts of plain text very
quickly. The text is usually English, sometimes non-English in UTF8.
[...]
I>m currently using the java.util.zip library built in to Java, and it
works ok, but a profiler shows that a large percentage of processor time
is spent in decompression. Is there a library that can do significantly
better?
[/quote]
I suppose it>s always possible that your native methods are
inefficient on your platform, but in general java.util.zip should
perform very well in terms of speed. I think that the overall
acceptance of this format shows that it provides a balance between
speed and compression that is pretty comfortable for most people. Last
time I studied it the zip classes were implemented on top of zlib, so
they are not using Java but rather the target system>s C compiler.

Note that Zip is somewhat asymmetrical - the compressor has to perform
a search for matching strings at the current compression point,
whereas the decompressor simply has to follow an index back to the
matching string.

Likewise, the compressor has to build a Huffman table and encode using
it, the decompressor doesn>t have to go through the build process
(although I suspect this is not a big deal.)

Anyway, the point is that decompression should be fairly fast, and you
will be hard pressed to find something a lot faster. You can do some
parameter tuning in an attempt to speed things up, but I think that in
many cases it will affect the compressor more than the decompressor.

Markus Oberhumer makes the LZO library which is reputed to be very
fast, although people often have trouble with licensing. I don>t know
if it has Java bindings, you may have to do that yourself:

http://en.wikipedia.org/wiki/Lempel-Ziv-Oberhumer

If you look at some compression benchmarks you can look at the faster
compressors and see if any of them are available in lib format:

http://cs.fit.edu/~mmahoney/compression/text.html

It>s worth noting that the zip compressors show up as having very fast
decompression times in Matt>s benchmark, but LZO does manage to beat
them. (At some cost to compression ratios.)

|
| Mark Nelson - http://marknelson.us
|
Back to top
Display posts from previous:   
   Science and Technology news... Forum Index -> Compression Forum  
Page 1 of 1
All times are GMT

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum