| View previous topic :: View next topic |
| Author |
Message |
Venatyr Guest
|
Posted: Wed Aug 27, 2003 3:00 am Post subject: Trying to determine compression algorithm. Probably LZ varia |
|
|
Hello,
This is my first foray into compression and I was wondering if somebody could provide some help.
I have two files. One file is the original and the other is compressed with an unknown algorithm
(Probably an LZ variant).
The files are available to be downloaded from this forum thread:
http://talus.hopto.org/bored/index.php?act=ST&f=13&t=106
Thanks,
Venatyr |
|
| |
|
Back to top |
Stuart Caie Guest
|
Posted: Wed Aug 27, 2003 4:53 am Post subject: Re: Trying to determine compression algorithm. Probably LZ v |
|
|
Venatyr wrote:
[quote]Hello,
This is my first foray into compression and I was wondering if somebody could provide some help.
I have two files. One file is the original and the other is compressed with an unknown algorithm
(Probably an LZ variant).
The files are available to be downloaded from this forum thread:
http://talus.hopto.org/bored/index.php?act=ST&f=13&t=106
[/quote]
It>s LZSS.
It>s a stream of blocks, where each block starts with a control byte. This
byte tells us what the next 8 elements will be, through whether its bits are
set or not. The order the bits are tested are from LSB to MSB. If a bit is
set, the element is a match (2 bytes). If not, it>s a literal (1 byte).
The literal is just that, a single literal byte.
The match is encoded like so:
- first byte is the distance to go back in the stream (in bytes)
to be at the start of the match.
- second byte is the length of the match, minus 2,
shifted left by 4 bits.
Distance can be 255 at most. Length can be 17 at most.
Here>s a hand decoding of the start of your japanese compressed file:
control=00 (LLLLLLLL: all literals)
83 65 83 43 83 8B 83 74
control=00 (LLLLLLLL: all literals)
83 42 83 93 83 4F 8E 63
control=00 (LLLLLLLL: all literals)
82 E8 82 51 89 F1 0D 0A
control=0B (MMLMLLLL: two matches, literal, match, 4 literals)
18 F0: match distance=24, length=17
18 00: match distance=24, length=2
50
18 F0: match distance=24, length=17
4F 8F C1 96
control=02 (LMLLLLLL)
C5
14 00: match distance=20, length=2
8D 55 8C 82 28 41
control=02 (LMLLLLLL)
29
1D 10: match distance=29, length=3
51 81 5B 83 80 82
Let us know once you have a decoder that gets this far and post it. You may
find the matches come out wrong once you reach a certain point in the
stream. I don>t know where this is, because it depends on what size the
sliding window is. I also may be wrong about match offsets and lengths, as I
only bothered decoding as far as you see above.
Regards
Stuart |
|
| |
|
Back to top |
Venatyr Guest
|
Posted: Wed Aug 27, 2003 9:04 pm Post subject: Re: Trying to determine compression algorithm. Probably LZ v |
|
|
Stuart Caie <kyzer@4u.net> wrote in
news:3f4bf2f7$0$251$cc9e4d1f@news.dial.pipex.com:
[quote]Venatyr wrote:
Hello,
This is my first foray into compression and I was wondering if
somebody could provide some help.
I have two files. One file is the original and the other is
compressed with an unknown algorithm (Probably an LZ variant).
The files are available to be downloaded from this forum thread:
http://talus.hopto.org/bored/index.php?act=ST&f=13&t=106
It>s LZSS.
It>s a stream of blocks, where each block starts with a control byte.
This byte tells us what the next 8 elements will be, through whether
its bits are set or not. The order the bits are tested are from LSB to
MSB. If a bit is set, the element is a match (2 bytes). If not, it>s a
literal (1 byte).
The literal is just that, a single literal byte.
The match is encoded like so:
- first byte is the distance to go back in the stream (in bytes)
to be at the start of the match.
- second byte is the length of the match, minus 2,
shifted left by 4 bits.
Distance can be 255 at most. Length can be 17 at most.
Here>s a hand decoding of the start of your japanese compressed file:
control=00 (LLLLLLLL: all literals)
83 65 83 43 83 8B 83 74
control=00 (LLLLLLLL: all literals)
83 42 83 93 83 4F 8E 63
control=00 (LLLLLLLL: all literals)
82 E8 82 51 89 F1 0D 0A
control=0B (MMLMLLLL: two matches, literal, match, 4 literals)
18 F0: match distance=24, length=17
18 00: match distance=24, length=2
50
18 F0: match distance=24, length=17
4F 8F C1 96
control=02 (LMLLLLLL)
C5
14 00: match distance=20, length=2
8D 55 8C 82 28 41
control=02 (LMLLLLLL)
29
1D 10: match distance=29, length=3
51 81 5B 83 80 82
Let us know once you have a decoder that gets this far and post it.
You may find the matches come out wrong once you reach a certain point
in the stream. I don>t know where this is, because it depends on what
size the sliding window is. I also may be wrong about match offsets
and lengths, as I only bothered decoding as far as you see above.
Regards
Stuart
[/quote]
Great. Thanks for you help.
After researching different LZ variants, I was thinking to myself that it
was LZSS.
--
Venatyr |
|
| |
|
Back to top |
Venatyr Guest
|
Posted: Fri Aug 29, 2003 8:32 pm Post subject: Re: Trying to determine compression algorithm. Probably LZ v |
|
|
Venatyr <darkfire7@yourmojo.com> wrote in
news:Xns93E46683465A1darkfire77starlok77@24.71.223.159:
[quote]Stuart Caie <kyzer@4u.net> wrote in
news:3f4bf2f7$0$251$cc9e4d1f@news.dial.pipex.com:
Venatyr wrote:
Hello,
This is my first foray into compression and I was wondering if
somebody could provide some help.
I have two files. One file is the original and the other is
compressed with an unknown algorithm (Probably an LZ variant).
The files are available to be downloaded from this forum thread:
http://talus.hopto.org/bored/index.php?act=ST&f=13&t=106
It>s LZSS.
It>s a stream of blocks, where each block starts with a control byte.
This byte tells us what the next 8 elements will be, through whether
its bits are set or not. The order the bits are tested are from LSB
to MSB. If a bit is set, the element is a match (2 bytes). If not,
it>s a literal (1 byte).
The literal is just that, a single literal byte.
The match is encoded like so:
- first byte is the distance to go back in the stream (in bytes)
to be at the start of the match.
- second byte is the length of the match, minus 2,
shifted left by 4 bits.
Distance can be 255 at most. Length can be 17 at most.
Here>s a hand decoding of the start of your japanese compressed file:
control=00 (LLLLLLLL: all literals)
83 65 83 43 83 8B 83 74
control=00 (LLLLLLLL: all literals)
83 42 83 93 83 4F 8E 63
control=00 (LLLLLLLL: all literals)
82 E8 82 51 89 F1 0D 0A
control=0B (MMLMLLLL: two matches, literal, match, 4 literals)
18 F0: match distance=24, length=17
18 00: match distance=24, length=2
50
18 F0: match distance=24, length=17
4F 8F C1 96
control=02 (LMLLLLLL)
C5
14 00: match distance=20, length=2
8D 55 8C 82 28 41
control=02 (LMLLLLLL)
29
1D 10: match distance=29, length=3
51 81 5B 83 80 82
Let us know once you have a decoder that gets this far and post it.
You may find the matches come out wrong once you reach a certain
point in the stream. I don>t know where this is, because it depends
on what size the sliding window is. I also may be wrong about match
offsets and lengths, as I only bothered decoding as far as you see
above.
Regards
Stuart
Great. Thanks for you help.
After researching different LZ variants, I was thinking to myself that
it was LZSS.
--
Venatyr
[/quote]
I managed to decompress it properly.
It is using 12bits for distance and 4bits for length.
Thanks for your help Stuart.
--
Venatyr |
|
| |
|
Back to top |
|