mcjason Guest
Posted: Fri Jul 25, 2008 8:20 am Post subject: Re: compression type
On Jul 25, 4:10 am, Willem <wil...@stack.nl> wrote:
[quote]mcjason wrote:
) <a lot of nonsense, repeated over and over>
I notice you have chosen to completely ignore my posts about how
'compression' and 'redundancy reduction' are two phrases for the same
thing.
Furthermore, I notice that most of your replies are in the line of
'but my method is completely different' followed by the umpteenth
explanation of 'your method', without actually addressing the points
that are made. You sound like a broken record.
Are you trolling, or just plain stupid ? My vote is on trolling.
(The bit about 'random' kinda gives it away.)
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
[/quote]
It's a thinking point to see how I might be right about something.
but if I said the same thing over and over again it would be to say
once the way that might work, but it also might be to say it
backwards, forwards, from the middle out, and from the end to
beginning, over and over again a different way, that might make a
major point smaller to see.
I was being anywhere with saying that, but it organizes a few ways
together like it can be found better than saying part then the rest as
one part but many rests, to say part and rest many ways with other
parts where each part is only once the same, but never is part of
another part the in more than one part, because of how one part is
about being better on its own but with some of another part anyways.
So for any part to be seen once, is for it to not be part of a part
that's together as separate parts because they come with a bigger part
that is seen on its own. It takes so few parts to say a lot, because
they come together as with other parts to be many parts together like
many parts together is just better than some of a part in some of
another part, because it's a part together better than apart.
Willem Guest
Posted: Fri Jul 25, 2008 8:28 am Post subject: Re: compression type
mcjason wrote:
) It's a thinking point to see how I might be right about something.
You're working from the assumption that you're right,
and everybody else is wrong.
Yet, in your posts you demonstrate that you don't know anything
about existing compression techniques.
SaSes , Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
mcjason Guest
Posted: Fri Jul 25, 2008 8:28 am Post subject: Re: compression type
On Jul 23, 1:37 pm, Jim Leonard <MobyGa...@gmail.com> wrote:
[quote]On Jul 23, 12:14 am, mcjason <mcja...@gmail.com> wrote:
and i bet though that what's compressed is able to be compressed again
the same way.. i mean, there should be any reason why there isn't
patterns to find like this in how you say curved lines and sphere
areas...
We're done here. I think your next course of action is to stop
ranting and program an LZ77 compressor so that you gain actual
experience writing a compressor. Here's a few links to help you get
started:
http://datacompression.dogma.net/index.php?title=FAQ:Intro_to_Data_Co....
http://www.fadden.com/techmisc/hdc/index.htm
Read these completely, and if you don't understand the LZ77 portions,
find a different hobby.
[/quote]
I can put everything you said into a smaller program and make it run
to say the same if it were trying to be the simplest way to program a
rejection letter servant with no manners and takes only a keyword as a
hint, and it would also serve the purpose of answering any post that
tries to be better than there isn't to think about.
does that compress you? I found it saying a lot more than one thing.
mcjason Guest
Posted: Fri Jul 25, 2008 9:15 am Post subject: Re: compression method
On Jul 22, 7:30 am, Mark Nelson <snorkel...@gmail.com> wrote:
[quote]On Jul 21, 2:39 am, Thomas Richter <t...@math.tu-berlin.de> wrote:
mcja...@gmail.com schrieb:
for who says random can't be compressed...
Ok, here's an exercise for you, or rather a question:
Please give a definition of "random". (I'm asking for not more).
Then, once you have that definition, we can work from there backwards.
I think a good definition for the purpose of discussion here is
something like this:
"A random sequence is defined as any sequence which cannot be
generated with a program shorter than itself. "
The only catch to this definition is that it is with respect to the
machine on which the program is going to run. Other than that I think
it works very well.
It also stops any discussion of compressing random data dead in its
tracks by defining the problem away.
[/quote]
so should you, because it never gets far.
[quote]|
| Mark Nelson -http://marknelson.us
|
[/quote]
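A rough way to see what that definition means in practice, as a sketch and not a proof: the output size of a general-purpose compressor is an upper bound on how short a description of the data can get. The Python below assumes zlib as the compressor and a seeded generator for the noise; both are choices made for this illustration, not something from the thread. Strictly, seeded output is produced by a short program and so is not random in the quoted sense; zlib simply cannot see that.

```python
import random
import zlib

rng = random.Random(2008)  # fixed seed so the run is repeatable (assumption)
noise = bytes(rng.getrandbits(8) for _ in range(4096))
repeated = b"ab" * 2048    # same length as the noise, obvious structure


def packed_size(data: bytes) -> int:
    """Size in bytes of the zlib stream at the highest compression level."""
    return len(zlib.compress(data, 9))


for name, data in (("noise", noise), ("repeated", repeated)):
    print(name, len(data), "->", packed_size(data))
```

The repeated input shrinks to a small fraction of its length, while the noise does not shrink at all: the stream format adds a little overhead and there is nothing for the compressor to exploit.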
I see an opposite idea here...
so any random data no matter what, can be alternatively represented by
a program that runs to generate it.
but it's not to say random is compressible to the size of that problem
looked at another way?
I get it, I get why that might be well said, because there's nothing
to say about random to make it smaller.
But it's always true that there's a way to represent random data as
the program that runs to make it.
I know, try another entropy source that can't be made into a program
that runs the same way.
But my point is this....
it seems the only point there tries to be about how random can't be
compressed, is because of its trend of showing no repeat occurrences
as what would be called redundancy, in any way that's worthwhile.
that's to say that the only idea there can be of compression is
redundancy reduction, and even to use tokens.
ok.. to stoopify that idea, what if I did this...
say before lots of data I say with a math that expands many locations
and offsets for each token found afterwards, then I say the rest of
the data as just.
so...
dsfkhnsdkjf^TOKEN1sdfljsd^TOKEN2
neato...
how about
what if I could say a line that draws straight, but spikes at
different heights as it's drawn, in math?
what if the math to say that line is small enough to say a line like
that which is smaller than many tokens?
so what if I put that before data that wasn't compressed, and say the
spike means which offset and the distance between spikes is where
there's a token?
of course this could exist, but so it doesn't that I know of and that
matters little.
isn't like random tries to be an example of how compression was made
to not work, like, it says exactly what opposes the fashion of
compressing in a way that tries nothing smarter than exactly the only
way to do anything at all.
it's so well to think random is to not find recurring trend in a long
length of data, because redundancy reduction is exactly opposite the
way it's meant to work.
find in random data so long what exhausts the option of tokens to use,
say strings that recur with a token, but find a token to be unique in
identifying an explanation for something else.
explanation isn't even the right word, it's just dead dumb about
saying what goes here is something else, because here is special about
not being the same as anything else.
I would want explanation to be the word that meant what a token could
do.. it could explain what goes there instead of just pointing it out.
like, for block of data said with a token, the token can be what tries
to carry on ongoing changing trend with tokens before even to be
better.
so say in one place a token that says something is a way, then make
the next token what carries on how to be different than what the token
before says, and so on.
but make it what the token examines. ok fine.. that's probably done,
that's the only way to think of it next.
so a token can say for what was there before, what is there now as
what is different than last time... but it's how it's different as
data that hasn't even been tried but can work in idea.
see how that can get random? because i can say one place is "jbgka"
with a token, but i can next token, as the token itself, what says the
difference to make about "jbgka" to be say "89gffg"
ok who cares... it's so many ideas that are 'mathematically distant',
and it's so close to a right idea to examine random in its fancy way
of being as what is 'impossible to compress' because it carries no
recurring trend as the only thing trying to reduce a recurring trend
finds hard. like, it's to say exactly random is what redundancy
reduction tries not to speak for.
so let's see such a conclusion reached about how because random has no
recurring trend, and compression tries to find a recurring trend, that
random can't be compressed. but some wildcats out there have
recursion as the answer.
so it's like to say try every small idea of what has a mathematical
expanse to say bigger information for what math can be said smaller
than the information it is.
so it's to beat with a hammer a math formula that expands to the
information wanting to be said smaller. like there's just no other
math known to do this stunt, but can that ever work too... because i
know of such a thing as small math formulas that say more. it's like,
fractals and stuff, but then it's like knowing how to make any fractal
there can be, but then it's like saying data is a fractal to see it
another way, but then it's like having a math formula be what comes
out to be a fractal that is smaller than the fractal is itself at
explaining information, so it's like, hard to find a math formula for
this or something... so let's jump to a conclusion there and say random
isn't compressible, because it has no recurring trend, because it
doesn't find repeat occurrences of data often, because that's the
simplest idea of compression, because that's all there is.
I like my shape analogy, it serves well at being simple about proving
something. you can actually find it right if you want to, because
every way to say right and wrong is to find a formula that can work
some anyways and making a shape changed to another shape, and the
weakest way too. it's no recurring trend to not find that to work in
random, except how bad the math is anyways.
mcjason Guest
Posted: Fri Jul 25, 2008 9:20 am Post subject: Re: compression type
On Jul 20, 12:46 am, Jim Leonard <MobyGa...@gmail.com> wrote:
[quote]On Jul 19, 3:38 pm, mcja...@gmail.com wrote:
now many curves to make up sentences with a small token for each.
It doesn't matter how you're representing the relationship between the
words, it's all the same thing.
Typical LZ77 compression already does what you're describing. The
"curves" are a series of codes that describe where in the dictionary
the next "words" come from.
Your idea is not new, other than the fact that it would take up more
data than necessary to "point" to other words than existing methods.
[/quote]
if LZ77 had this idea it would be the same as what I'm talking
about.....
a token that gets bigger to say more... but say the token says to
start at a match, but then the rest of the token says something
like...
start 12 move up 1, move left 2, match, move up 3, move down right 1,
match
and that's to build the whole match
where a token that says to go another way matches some of the same but
different for the direction it takes.
that would be like the same idea.
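For comparison with the "token that says where to start and how far to go" idea, here is a minimal LZ77-style sketch in Python. It is a teaching toy under stated assumptions: a brute-force search, a 255-byte window, and tokens of the form (offset, length, next byte). The function names and the token layout are chosen for this illustration and are not taken from any particular implementation.

```python
def lz77_compress(data: bytes, window: int = 255, max_len: int = 255):
    """Turn data into (offset, length, next_byte) tokens by brute-force search."""
    tokens, i = [], 0
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one byte early so every token can carry a literal next byte.
            while (length < max_len and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decompress(tokens) -> bytes:
    """Replay the tokens: copy `length` bytes from `offset` back, then the literal."""
    out = bytearray()
    for offset, length, next_byte in tokens:
        for _ in range(length):
            out.append(out[-offset])  # byte-by-byte so overlapping copies work
        out.append(next_byte)
    return bytes(out)


text = b"abcabcabcabcx"
tokens = lz77_compress(text)
print(tokens)
print(lz77_decompress(tokens) == text)
```

Each token already says "go back this far, copy this much", which is the pointing role the curves are meant to play; a longer token that describes a path still has to be paid for in bits.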
mcjason Guest
Posted: Fri Jul 25, 2008 9:55 am Post subject: Re: compression type
On Jul 25, 5:05 am, Thomas Richter <t...@math.tu-berlin.de> wrote:
[quote]mcjason schrieb:
Second point above. Please state what "random" means. You haven't done
so yet. Please do your homework - it's really about helping you, not
about annoying you. Nobody can do that for you, you must learn it yourself.
data where the trend tends to be few repeat occurrences of a length of
data, where it's usually not a worthwhile tradeoff to say one
occurrence of what repeats, for there to be a token, for how tokens
have a limited way of being said for what else is said. Because in
random data the allocation space for a token is usually too exhausted
for there to be a worthwhile way of saying what a token is for what else
is said, for how a repeat occurrence of a length of data can be said
once with a token otherwise.
Not a very reasonable definition, but for the time being, let's take
this. According to this definition, the following string
1234567891012131415161718191202122232425262728293031323334353637383940...
is random, (nothing repeats, provably) though still a ten-year old can
see its construction algorithm.
Hint: You seem to believe that "random" is an attribute that you can
apply to a sequence you can point at. "Random" is the property of a
process, not of a specific string in particular. Depending on the
process, the string
1111111111111111111111111111111111111111111111111111111111111....
is as likely as the above.
I understand perfectly why this can be seen as a problem when it comes
to compressing with the technique of saying what repeats once with a
token for other occurrences.
I'm not saying this. *You* say this.
It's intuitive to think of this the way
the problem is well described. But I can't find anywhere the say so of
random being hard to compress isn't connected with the idea of only
working the way that repeat occurrences are made fewer, with tokens
taking a naming allocation.
It's very limited to think that's the only way to compress, I gave A
PERFECT analogy of how this is VERY WRONG.
*Sigh* You gave a non-working example. What makes you believe that I
think in "patterns"? I don't. My field is *image compression*, yet you
can compress them even though there are no patterns, and the algorithms
used there do not look for matched patterns. Hence, please do not try to
tell me what I do and do not know - I think it's the time for you to
deepen your research.
it's to say this proves how random is compressible, take it whatever
way you want I know it's right.
Using a definition of "random" that makes sense (your definition
doesn't, I wouldn't call either of the strings random), you cannot
compress random strings.
say for every length of data there can be a shape, a shape where it's
a shape different for every way the data is different.
given perfect math it would be a shape the same size as the data,
because of that making a different shape for every way data is
different.
That's a "data model"; the question is "is this data model reasonable
to compress data?" And the answer is: For every model one can construct
data that cannot be successfully modeled by it (IOW, cannot be
compressed, using an optimal entropy coding algorithm on the output of
the model). In your case, the model would be to draw shapes or curves or
spheres. As long as you don't give better arguments as why you believe
the model you have is good, and for which type of data it is good for,
this is a lost attempt.
What you don't seem to realize is that while it is fairly true that more
complex models can describe more complex data, these models *also*
require more modeling parameters you somehow have to encode as part of
the message. It is a trade-off between simplicity of the model against
the size of the model parameters. Choosing a simple pattern repetition
model (as in LZ77) leaves only few model parameters (length and offset),
but it is only sufficient to match patterns exactly (from the past) and
not to describe sequences with a more complicated construction algorithm
(as the one I gave above). You can surely introduce models that do that
better, but then you also need more parameters.
In the end, you'll never have an algorithm that "perfectly compresses
everything" because even though your model is then very complete, it is
so complicated that you need to transmit too much data just to describe
it. You *cannot* win this game, it's a logical constraint about maps
between finite sets, a very elementary one.
now say for two lengths of data, a shape for each.
now.. this might be a little harder to believe is right.
I'm not arguing at this level - you don't seem to understand.
given a shape, and another shape, there is math to say the shape but
made different, to the other shape, where the math to say one shape
different to the other shape is smaller than the other shape. So
instead of saying two shapes, say one shape and the math to make the
shape different as the other shape.
All very well, but you still need data to describe this "different", and
you'll soon find out (once you would dare to try to implement it) that
the overall byte budget required to describe this "different" is higher
than the byte budget you save by using this model, at least for *most* data.
If you don't believe this, I urge you to implement your idea in an
algorithm and observe this yourself. Depending on the data set, the most
successful models are simple.
given a perfect idea of how this would work, shouldn't it be that the
math has a 50% rightful claim of being smaller than the other shape,
and a 50% rightful claim of being bigger than the other shape?
Shouldn't it though just to think of the most ideal condition there
should be?
doesn't that make sense when there could be some math smaller to say
one shape made to be changed is another shape, smaller than the other
shape? and some math bigger than the other shape? shouldn't the idea
round off as a 50/50 of smaller and bigger than the other shape? to
say a shape changed is another shape.
It all makes sense to say so, but your algorithm also has to say so,
namely has to communicate this to the decoder. And *that* is where your
problem is.
Again, if you don't believe me, construct this algorithm and you'll see
yourself.
So long,
Thomas
[/quote]
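The "logical constraint about maps between finite sets" in the reply above can be checked by brute force for small sizes. The Python sketch below counts bit strings and then searches for a collision under a toy shrinking map; the helper names and the "drop the last bit" compressor are hypothetical stand-ins picked for this illustration, and any other map that always shortens its input would collide as well.

```python
from itertools import product


def all_bitstrings(n: int):
    """Every bit string of exactly n bits, as text such as '0101'."""
    return ["".join(bits) for bits in product("01", repeat=n)]


def count_shorter(n: int) -> int:
    """How many bit strings have length 0 .. n-1 (the possible shorter outputs)."""
    return sum(2 ** k for k in range(n))


def find_collision(compress, n: int):
    """Return two different n-bit inputs that map to the same output, if any."""
    seen = {}
    for s in all_bitstrings(n):
        out = compress(s)
        if out in seen:
            return seen[out], s, out
        seen[out] = s
    return None


def drop_last_bit(s: str) -> str:
    """Toy 'compressor' that always shortens its input by one bit."""
    return s[:-1]


n = 8
print(len(all_bitstrings(n)), "inputs,", count_shorter(n), "shorter outputs")
print(find_collision(drop_last_bit, n))
```

There are 2^n inputs of n bits but only 2^n - 1 strings that are shorter, so at least two inputs must share an output and the decoder cannot tell them apart. Moving the tokens "outside" the data does not change the count, because whatever is stored still has to be a string of bits.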
I have an easy time believing one thing....
say for all there is to compress... put it in a geometry area.
now say it's just that.
now the file is just that, and 1 token to say that's what expands, is
just the block there in the geometry area.
so nothing different about the size really.
now.. instead of one block, this instead...
find every instance of BBBB, and separate the block.
so in
"sdfjl44tn98324jbBBBB098wutjk0982kjaerjtjkbBBBBsejh2348095bb23ybyBBBB2hi2u553vb23bnjfngBBBB"
now say one BB block and the blocks before and after each BB
now the geometry area is with that
now one curved line as the token to draw that pattern.
so lost is every occurrence of BBBB except one, so 6 bytes lost.
gained is what it took to say more blocks, and a curved line that
might be slightly bigger but not much?
so the tradeoff of finding a data block of _ANY SIZE_ that has
occurrences of BB, like in any size this can happen once in a while.
no pigeonhole concept here because tokens aren't mixed with data, it's
the geometry area and curves outside it as all there is to expect.
to say separate blocks there might as well be the simplest way....
say one block after another, but make it so one block after another is
at a location starting different like it is to say a spiral starting
at the center, but one that a curve
can always find its way through easily maybe?
see how this proves random is compressible?
because in random data any size it's good to see BBBB once in a while,
but it's only a curve slightly more complicated and saying blocks like
before and after each BBBB... but for what there is to say about size
being bigger, it's to say a separate block and a curve slightly more
complicated for each time BBBB is found?
it's like.. easy to see maybe?
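One way to put numbers on the BBBB proposal is to write down everything a decoder would need and add it up. The Python sketch below uses a deliberately crude cost model that is an assumption of this illustration, not something specified in the post: one stored copy of the pattern, a count field, one fixed-width position per occurrence standing in for the "curve", and the data with the pattern cut out. The function names are hypothetical.

```python
import random


def cut_pattern(data: bytes, pattern: bytes):
    """Remove every non-overlapping occurrence; return (positions, remainder)."""
    positions, pieces, start = [], [], 0
    while (hit := data.find(pattern, start)) >= 0:
        positions.append(hit)
        pieces.append(data[start:hit])
        start = hit + len(pattern)
    pieces.append(data[start:])
    return positions, b"".join(pieces)


def restore(positions, remainder: bytes, pattern: bytes) -> bytes:
    """Put the pattern back at each recorded offset of the original data."""
    out = bytearray(remainder)
    for pos in positions:  # offsets are ascending and refer to the original
        out[pos:pos] = pattern
    return bytes(out)


def stored_size(data: bytes, pattern: bytes) -> int:
    """Bytes needed under the assumed model: pattern + count + positions + rest."""
    positions, remainder = cut_pattern(data, pattern)
    width = max(1, (len(data).bit_length() + 7) // 8)  # bytes per position
    return len(pattern) + width + width * len(positions) + len(remainder)


sample = (b"sdfjl44tn98324jbBBBB098wutjk0982kjaerjtjkbBBBB"
          b"sejh2348095bb23ybyBBBB2hi2u553vb23bnjfngBBBB")
rng = random.Random(1)  # seeded stand-in for random data (assumption)
noise = bytes(rng.getrandbits(8) for _ in range(65536))

for name, data in (("sample", sample), ("noise", noise)):
    print(name, len(data), "->", stored_size(data, b"BBBB"))
```

The hand-typed sample gets smaller because BBBB was put there four times on purpose. In uniformly random bytes a given 4-byte pattern turns up about once per 2^32 positions, so saying where it is costs about as much as the 4 bytes it saves, and the bookkeeping makes the total larger.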
mcjason Guest
Posted: Fri Jul 25, 2008 10:27 am Post subject: random compression proven?
On Jul 25, 5:55 am, mcjason <mcja...@gmail.com> wrote:
[quote]
<snip: full quote of the 9:55 am post, including the reply from Thomas Richter>
[/quote]
See how I can say this....
in data any length, no matter what....
store in a geometry area, but say no different than the data together
and one token.
so no bigger really....
now say this is what is being compressed...
... any length ... "abcdefghijklmnopqrstuvwxyz efcdab cderfab" ... any
length ... "erfab 123456789 da" ... any length ...
then it's to store...
BLOCK, "ab", "cd", "ef", "ghijklmnopqrstuvwxyz", BLOCK, "erf",
"123456789", "da", BLOCK
and one token...
a curved line... BLOCK - "ab" - "cd" - "ef" - "qghijklmnopqrstuvwxyz "
- "ef" - "cd" - "ab" - "cd" - "er" - BLOCK"f" - "ab" - BLOCK -
BLOCK"erf" - "ab" - "123456789 " - BLOCK"da" - BLOCK
so it has to say 14 blocks instead of 1, and a curved line that isn't
just saying at one place, but is saying through 14 blocks like how
they're situated.
now that's to lose 15 bytes, but gained is explaining 14 blocks
instead of one, and gained is a curve more complicated.
so that's about at odds with saying nothing better.
so what makes this better now?
isn't it to find that going on forever is to find better than what it
takes to explain a new block and how a curve becomes more complicated
for every
"ab", "cd", "ef", "qghijklmnopqrstuvwxyz ", "f", "erf", "123456789 ",
and "da" found, it's to say that size less but a block more and a
curve slightly more complicated?
mcjason Guest
Posted: Fri Jul 25, 2008 10:55 am Post subject: random compression proven
On Jul 25, 6:27 am, mcjason <mcja...@gmail.com> wrote:
[quote]On Jul 25, 5:55 am, mcjason <mcja...@gmail.com> wrote:
On Jul 25, 5:05 am, Thomas Richter <t...@math.tu-berlin.de> wrote:
mcjason schrieb:
Second point above. Please state what "random" means. You haven>t done
so yet. Please do your homework - it>s really about helping you, not
about annoying you. Nobody can do that for you, you must learn it yourself.
data where the trend tends to be few repeat occurances of a length of
data, where it>s usually not a worthwhile tradeoff to say one
occurance of what repeats, for there to be a token, for how tokens
have a limited way of being said for what else is said. Beause in
random data the allocation space for a token is usually too exhausted
for
there to be a worthwhile way of saying what a token is for what else
is said, for how a repeat occurance of a length of data can be said
once with a token otherwise.
Not a very reasonable definition, but for the time being, let>s take
this. According to this definition, the following string
1234567891012131415161718191202122232425262728293031323334353637383940...
is random, (nothing repeats, provably) though still a ten-year old can
see its construction algorithm.
Hint: You seem to believe that "random" is an attribute that you can a
apply to a sequence you can point at. "Random" is the property of a
process, not of a specific string in particular. Depending on the
process, the string
1111111111111111111111111111111111111111111111111111111111111....
is as likely as the above.
I understand perfectly why this can be seen as a problem when it comes
to compressing with the technique of saying what repeats once with a
token for other occurrences.
I'm not saying this. *You* say this.
It's intuitive to think of this the way
the problem is well described. But I can't find anywhere the say so of
random being hard to compress isn't connected with the idea of only
working the way that repeat occurrences are made fewer, with tokens
taking a naming allocation.
It's very limited to think that's the only way to compress, I gave A
PERFECT analogy of how this is VERY WRONG.
*Sigh* You gave a non-working example. What makes you believe that I
think in "patterns"? I don't. My field is *image compression*, yet you
can compress images even though there are no patterns, and the algorithms
used there do not look for matched patterns. Hence, please do not try to
tell me what I do and do not know - I think it's time for you to
deepen your research.
it's to say this proves how random is compressible, take it whatever
way you want I know it's right.
Using a definition of "random" that makes sense (your definition
doesn't, I wouldn't call either of the strings random), you cannot
compress random strings.
say for every length of data there can be a shape, a shape where it's
a shape different for every way the data is different.
given perfect math it would be a shape the same size as the data,
because of that making a different shape for every way data is
different.
That's a "data model"; the question is: is this data model reasonable
for compressing data? And the answer is: For every model one can construct
data that cannot be successfully modeled by it (IOW, cannot be
compressed, using an optimal entropy coding algorithm on the output of
the model). In your case, the model would be to draw shapes or curves or
spheres. As long as you don't give better arguments as to why you believe
the model you have is good, and for which type of data it is good,
this is a lost attempt.
What you don't seem to realize is that while it is fairly true that more
complex models can describe more complex data, these models *also*
require more modeling parameters you somehow have to encode as part of
the message. It is a trade-off between simplicity of the model and
the size of the model parameters. Choosing a simple pattern repetition
model (as in LZ77) leaves only a few model parameters (length and offset),
but it is only sufficient to match patterns exactly (from the past) and
not to describe sequences with a more complicated construction algorithm
(as the one I gave above). You can surely introduce models that do that
better, but then you also need more parameters.
In the end, you'll never have an algorithm that "perfectly compresses
everything" because even though your model is then very complete, it is
so complicated that you need to transmit too much data just to describe
it. You *cannot* win this game, it's a logical constraint about maps
between finite sets, a very elementary one.
now say for two lengths of data, a shape for each.
now.. this might be a little harder to believe is right.
I'm not arguing at this level - you don't seem to understand.
given a shape, and another shape, there is math to say the shape but
made different, to the other shape, where the math to say one shape
different to the other shape is smaller than the other shape. So
instead of saying two shapes, say one shape and the math to make the
shape different as the other shape.
All very well, but you still need data to describe this "different", and
you'll soon find out (once you dare to try to implement it) that
the overall byte budget required to describe this "different" is higher
than the byte budget you save by using this model, at least for *most* data.
If you don't believe this, I urge you to implement your idea in an
algorithm and observe this yourself. Depending on the data set, the most
successful models are simple.
given a perfect idea of how this would work, shouldn't it be that the
math has a 50% rightful claim of being smaller than the other shape,
and a 50% rightful claim of being bigger than the other shape?
Shouldn't it though just to think of the most ideal condition there
should be?
doesn't that make sense when there could be some math smaller to say
one shape made to be changed is another shape, smaller than the other
shape? and some math bigger than the other shape? shouldn't the idea
round off as a 50/50 of smaller and bigger than the other shape? to
say a shape changed is another shape.
It all makes sense to say so, but your algorithm also has to say so,
namely has to communicate this to the decoder. And *that* is where your
problem is.
Again, if you don't believe me, construct this algorithm and you'll see
for yourself.
So long,
Thomas
I have an easy time believing one thing....
say for all there is to compress... put it in a geometry area.
now say it's just that.
now the file is just that, and 1 token to say that's what expands, is
just the block there in the geometry area.
so nothing different about the size really.
now.. instead of one block, this instead...
find every instance of BBBB, and separate the block.
so in
"sdfjl44tn98324jbBBBB098wutjk0982kjaerjtjkbBBBBsejh2348095bb23ybyBBBB2hi2u553vb23bnjfngBBBB"
now say one BB block and the blocks before and after each BB
now the geometry area is with that
now one curved line as the token to draw that pattern.
so lost is every occurrence of BBBB except one, so 6 bytes lost.
gained is what it took to say more blocks, and a curved line that
might be slightly bigger but not much?
so the tradeoff of finding a data block of _ANY SIZE_ that has
occurrences of BB, like in any size this can happen once in a while.
no pigeonhole concept here because tokens aren't mixed with data, it's
the geometry area and curves outside it as all there is to expect.
to say separate blocks there might as well be the simplest way....
say one block after another, but make it so one block after another is
at a location starting different like it is to say a spiral starting
at the center, but one that a curve
can always find its way through easily maybe?
see how this proves random is compressible?
because in random data any size it's good to see BBBB once in a while,
but it's only a curve slightly more complicated and saying blocks like
before and after each BBBB... but for what there is to say about size
being bigger, it's to say a separate block and a curve slightly more
complicated for each time BBBB is found?
it's like.. easy to see maybe?
See how I can say this....
in data any length, no matter what....
store in a geometry area, but say no different than the data together
and one token.
so no bigger really....
now say this is what is being compressed...
... any length ... "abcdefghijklmnopqrstuvwxyz efcdab cderfab" ... any
length... "erfab 123456789 da" ... any length ...
then it's to store...
BLOCK, "ab", "cd", "ef", "ghijklmnopqrstuvwxyz", BLOCK, "erf",
"123456789", "da", BLOCK
and one token...
a curved line... BLOCK - "ab" - "cd" - "ef" - "qghijklmnopqrstuvwxyz "
- "ef" - "cd" - "ab" - "cd" - "er" - BLOCK"f" - "ab" - BLOCK -
BLOCK"erf" - "ab" - "123456789 " - BLOCK"da" - BLOCK
so it has to say 14 blocks instead of 1, and a curved line that isn't
just saying at one place, but is saying through 14 blocks like how
they're situated.
now that's to lose 15 bytes, but gained is explaining 14 blocks
instead of one, and gained is a curve more complicated.
so that's about at odds with saying nothing better.
so what makes this better now?
isn't it to find that going on forever is to find better than what it
takes to explain a new block and how a curve becomes more complicated
for every
"ab", "cd", "ef", ...
[/quote]
did i ever screw that up... hehe
... any amount ... "abcdefghijklmnop" ... "opmnklijghefcdab" ... any
amount
stored as....
BLOCK_BEFORE, "ab", "cd", "ef", "gh", "ij", "kl", "mn", "op",
BLOCK_AFTER
so then a curved line BLOCK_BEFORE - "ab" - "cd" - "ef" - "gh" - "ij"
- "kl" - "mn" - "op" - "op" - "mn" - "kl" - "ij" - "gh" - "ef" - "cd"
- "ab" - BLOCK_AFTER
so....
stored with block separation, to say one block after another makes a
spiral say for example but one a curve draws through well.
so...
total size now... each block, as separated, and a curved line.
16 bytes lost, 10 blocks separated instead of 1, and a curved line
more complex.
so it's to say that forever as the size of data, any 2 bytes as found
to be "ab", "cd", "ef", "gh", "ij", or "kl" is for one block
separation, and a curved line slightly more complicated.
that's about even right? unless it's slightly better right?
so now it's only to find in data of arbitrary length more, 3
characters found together more than once to be at even better odds.
mcjason Guest
|
Posted: Fri Jul 25, 2008 11:01 am Post subject: Re: random compression proven |
|
|
On Jul 25, 6:55 am, mcjason <mcja...@gmail.com> wrote:
[quote]
did i ever screw that up... hehe
... any amount ... "abcdefghijklmnop" ... "opmnklijghefcdab" ... any
amount
stored as....
BLOCK_BEFORE, "ab", "cd", "ef", "gh", "ij", "kl", "mn", "op",
BLOCK_AFTER
so then a curved line BLOCK_BEFORE - "ab" - "cd" - "ef" - "gh" - "ij"
- "kl" - "mn" - "op" - "op" - "mn" - "kl" - "ij" - "gh" - "ef" - "cd"
- "ab" - BLOCK_AFTER
so....
stored with block separation, to say one block after another makes a
spiral say for example but one a curve draws through well.
so...
total size now... each block, as separated, and a curved line.
16 bytes lost, 10 blocks separated instead of 1, and a curved line
more complex.
so it's to say that forever as the size of data, any 2 bytes as found
to be "ab", "cd", "ef", "gh", "ij", or "kl" is for one block
separation, and a curved line slightly more complicated.
that's about even right? unless it's slightly better right?
so now it's only to find in data of arbitrary length more, 3
characters found together more than once to be at even better odds.
[/quote]
it's to say that each block is a plot point in a 3d space, and the
curved line is a spiral say that touches each plot point for how the
pattern organizes.
Willem Guest
|
Posted: Fri Jul 25, 2008 11:10 am Post subject: Re: random compression proven |
|
|
mcjason wrote:
) it's to say that each block is a plot point in a 3d space, and the
) curved line is a spiral say that touches each plot point for how the
) pattern organizes.
How much space is used storing the shape (a spiral say) ?
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
mcjason Guest
|
Posted: Fri Jul 25, 2008 1:27 pm Post subject: proof random is compressable |
|
|
On Jul 20, 6:12 pm, mcja...@gmail.com wrote:
[quote]On Jul 20, 5:54 pm, mcja...@gmail.com wrote:
On Jul 20, 5:35 pm, mcja...@gmail.com wrote:
On Jul 20, 3:59 pm, Willem <wil...@stack.nl> wrote:
mcja...@gmail.com wrote:
) what would happen if this though...
If you ignore fundamental principles and simple arguments,
then you will either get laughed at or get ignored.
SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
What am I ignoring that's fundamental?
I'm taking the understanding into account that compression works with
the idea of reducing redundancy...
so far the only idea I think is repeat occurrences that can be said
once and explained more often right?
what a way to achieve the reduction of information... but I wouldn't
say the only way, it's just said so the way to be about it.
I was trying to not be far from an idea that says different, would
work in idea of thinking about it, and has something else to it when
it comes to what proportion can be achieved in how much information
can be reduced.
now think of a string of text... find the string of text said another
way as a shape somehow, where every word there can be would draw a
different shape. ok ?
now find another string of text, find the shape for it, now find math
that transforms one shape to another, find that in some cases the math
to transform the first shape into the second shape is smaller than the
second shape itself... so say this now, hold the first shape and the
math to transform the shape as the information...
so now in idea it's compression not working for the idea of repeat
occurrences, but for how a shape is math transform in size bigger or
smaller than another shape.
so not like there's any math for the idea, or how even any example
tries to fare, it's just the idea of how it's working to achieve
compression.
see how that's completely different than finding repeat occurrences of
even the same string?
see how it doesn't even depend on how many repeat occurrences can be
found?
so in simplicity of the same proportion I think this idea of
compression would work, I don't get stuck thinking of it anyway...
like... say for every time abc cba or bca is found as part of the file,
you say coordinate in area and a curve where you start at the first
letter and the curve follows through across each letter. so now only
the letters "abc" are in the geometry area, but a token that says the
letters rearranged any way.
so that achieves another way besides repeat occurrences of the same
string.
I think files say a lot better about rearranged patterns than repeat
occurrences.. and _no matter what_ it's doing exactly redundancy
reduction the same as repeat occurrences is too, it definitely says
that _at least_, but could only be better.
This is being different than redundant information if that only says
repeat occurrences, is it not ?
I would think of it working like....
say first of all none of the file for real mixed with tokens, but the
idea like this....
put all of the file in a geometric area, where parts are further or
closer apart.
keep putting it in the geometric area where like if "had been here" is
already in there, it might be broken apart as words or together maybe?
but now putting "here already" in is what, so put the word "already"
like near the word "here".
so now for "had been here" and "here already" you only keep "had been
here already", because the word "here" already found but not like a
repeat occurrence, but like a pattern to find another way.
so it should be like "gunsmith", "muts", "record", "buns", "thrill"
has it so there's maybe in geometric area
g uns mi th muts record b rill
and then for those words a token that has a plot coordinate map said
shorter, like just a curved line to connect the parts in an ordering.
see how this can achieve better? it has no limit the same as finding
repeat occurrences this way.
I think this idea could really go over well...
It doesn't seem to follow the same thinking as how random data is hard
to compress with how repeat occurrences won't frequent enough to call
it any benefit....
seems like random can mostly have small strings rearranged as common
enough... to call that stored information once though and a token each
time...
and i think even once compressed in what you say is a geometric area
and tokens, you can find that to even be patterns like you can say
again.
[/quote]
The proof that random data is compressible...
BLOCK_BEFORE ... "abcdefghijklmnop" ... "opmnklijghefcdab" ...
BLOCK_AFTER
stored as....
BLOCK_BEFORE, "ab", "cd", "ef", "gh", "ij", "kl", "mn", "op",
BLOCK_AFTER
so then a curved line connecting BLOCK_BEFORE - "ab" - "cd" - "ef" -
"gh" - "ij" - "kl" - "mn" - "op" - "op" - "mn" - "kl" - "ij" - "gh" -
"ef" - "cd" - "ab" - BLOCK_AFTER
so....
stored with separation of each block.
To say one block after another the way it is to start somewhere and
work around a centerpoint of plot points
to be how only block separation is to keep and for plot points to be
how a spiral or curve can always connect any points together.
so...
total size now... each block, as separated, and a curved line.
16 bytes reduced, 10 blocks separated instead of 1, and a curved line
more complex.
so it's to say for an arbitrary amount more data, any 2 bytes as found
to be "ab", "cd", "ef", "gh", "ij", or "kl" is for one block
separation for the block found before, and a curved line made more
complicated.
that should be about breaking even right?
it would be even better to find something like "abijkl" like parts
already for another pattern, because now that's 6 bytes reduced, 1
more block separation,
and a stretched curved line.
so find "uiopijqref" found after to be only "uiop", and "qr" blocks as
new, for example, and a stretched curved line.
I mean like...
in a 3d space plot points, and a spiral zig-zag curve line that
connects each plot in the order to arrange the pattern.
see how the idea of a pigeonhole problem isn't even there? because the
data blocks to organize together are kept separate from the token
area.
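One way to put numbers on the block-and-curve accounting above is to count bits. The Python sketch below is not the poster's method (no concrete encoding is given in the post); it assumes the "blocks" are distinct 2-byte chunks stored once, and that the "curved line" is simply the order in which those blocks are visited, costing at least ceil(log2(number of distinct blocks)) bits per visit. The function names are made up for illustration. It ignores BLOCK_BEFORE/BLOCK_AFTER and the cost of marking block boundaries, so it is generous to the scheme.

```python
import random


def split_blocks(data: bytes, size: int = 2) -> list:
    """Cut the data into consecutive chunks of `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def scheme_cost_bits(data: bytes, size: int = 2) -> int:
    """Bits for the distinct blocks plus bits for the visiting order.

    Assumption: the 'curve' is modelled as one fixed-width block index
    per visited block. Boundary/length bookkeeping is not counted.
    """
    blocks = split_blocks(data, size)
    distinct = list(dict.fromkeys(blocks))        # each block stored once
    dictionary_bits = sum(len(b) for b in distinct) * 8
    index_bits = (len(distinct) - 1).bit_length() # ceil(log2(#distinct))
    path_bits = len(blocks) * index_bits          # the "curved line"
    return dictionary_bits + path_bits


# The example from the post: a string followed by its pairwise reversal.
structured = b"abcdefghijklmnop" + b"opmnklijghefcdab"

# Pseudo-random bytes of the same length, standing in for "random" data.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(len(structured)))

for name, data in (("structured", structured), ("pseudo-random", noise)):
    print(name, "raw bits:", len(data) * 8,
          "scheme bits:", scheme_cost_bits(data))
```

Under these assumptions the hand-built example does shrink, because it was constructed to reuse its own blocks. Input without such reuse pays for every block in full and then pays again for the path, which is the cost Willem's question about storing the shape is pointing at.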
Thomas Richter Guest
|
Posted: Fri Jul 25, 2008 2:05 pm Post subject: Re: compression type |
|
|
mcjason schrieb:
[quote]
Second point above. Please state what "random" means. You haven't done
so yet. Please do your homework - it's really about helping you, not
about annoying you. Nobody can do that for you, you must learn it yourself.
data where the trend tends to be few repeat occurrences of a length of
data, where it's usually not a worthwhile tradeoff to say one
occurrence of what repeats, for there to be a token, for how tokens
have a limited way of being said for what else is said. Because in
random data the allocation space for a token is usually too exhausted
for there to be a worthwhile way of saying what a token is for what else
is said, for how a repeat occurrence of a length of data can be said
once with a token otherwise.
[/quote]
Not a very reasonable definition, but for the time being, let's take
this. According to this definition, the following string
1234567891012131415161718191202122232425262728293031323334353637383940...
is random (nothing repeats, provably), though even a ten-year-old can
see its construction algorithm.
Hint: You seem to believe that "random" is an attribute that you can
apply to a sequence you can point at. "Random" is the property of a
process, not of a specific string in particular. Depending on the
process, the string
1111111111111111111111111111111111111111111111111111111111111....
is as likely as the above.
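Both points can be sketched in a few lines of Python. This assumes the first string is meant to be the decimal numbers 1, 2, 3, ... written one after another (the digits quoted above deviate from that in places), and that "the process" is a source emitting each decimal digit independently with probability 1/10. The helper names are hypothetical.

```python
from fractions import Fraction


def counting_string(limit: int) -> str:
    """Concatenate the decimal numbers 1..limit: the whole 'algorithm'."""
    return "".join(str(k) for k in range(1, limit + 1))


def uniform_probability(digits: str) -> Fraction:
    """Probability that a uniform i.i.d. digit source emits exactly `digits`."""
    return Fraction(1, 10) ** len(digits)


a = counting_string(40)   # "1234567891011...3940"
b = "1" * len(a)          # all ones, same length

print(a)
print(b)
print(uniform_probability(a) == uniform_probability(b))
```

Under that source every specific string of a given length is equally likely, so neither string is "more random" than the other. What differs is how short a description each one has.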
[quote]I understand perfectly why this can be seen as a problem when it comes
to compressing with the technique of saying what repeats once with a
token for other occurrences.
[/quote]
I'm not saying this. *You* say this.
[quote]It's intuitive to think of this the way
the problem is well described. But I can't find anywhere the say so of
random being hard to compress isn't connected with the idea of only
working the way that repeat occurrences are made fewer, with tokens
taking a naming allocation.
It's very limited to think that's the only way to compress, I gave A
PERFECT analogy of how this is VERY WRONG.
[/quote]
*Sigh* You gave a non-working example. What makes you believe that I
think in "patterns"? I don't. My field is *image compression*, yet you
can compress images even though there are no patterns, and the algorithms
used there do not look for matched patterns. Hence, please do not try to
tell me what I do and do not know - I think it's time for you to
deepen your research.
[quote]it's to say this proves how random is compressible, take it whatever
way you want I know it's right.
[/quote]
Using a definition of "random" that makes sense (your definition
doesn't, I wouldn't call either of the strings random), you cannot
compress random strings.
[quote]say for every length of data there can be a shape, a shape where it's
a shape different for every way the data is different.
given perfect math it would be a shape the same size as the data,
because of that making a different shape for every way data is
different.
[/quote]
That's a "data model"; the question is: is this data model reasonable
for compressing data? And the answer is: For every model one can construct
data that cannot be successfully modeled by it (IOW, cannot be
compressed, using an optimal entropy coding algorithm on the output of
the model). In your case, the model would be to draw shapes or curves or
spheres. As long as you don't give better arguments as to why you believe
the model you have is good, and for which type of data it is good,
this is a lost attempt.
What you don't seem to realize is that while it is fairly true that more
complex models can describe more complex data, these models *also*
require more modeling parameters you somehow have to encode as part of
the message. It is a trade-off between simplicity of the model and
the size of the model parameters. Choosing a simple pattern repetition
model (as in LZ77) leaves only a few model parameters (length and offset),
but it is only sufficient to match patterns exactly (from the past) and
not to describe sequences with a more complicated construction algorithm
(as the one I gave above). You can surely introduce models that do that
better, but then you also need more parameters.
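To make the "length and offset" remark concrete, here is a deliberately naive LZ77-style coder in Python. It is only a sketch: the token format (offset, length, next character), the window size and the function names are assumptions chosen for illustration, not the format of any real compressor.

```python
def lz77_encode(data: str, window: int = 32) -> list:
    """Encode `data` as (offset, length, next_char) triples.

    offset/length point back into already-seen text; next_char is the
    literal that follows the copied run. These are the model parameters.
    """
    i, tokens = 0, []
    while i < len(data):
        best_offset, best_length = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Keep one character in reserve to serve as next_char.
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = i - j, length
        tokens.append((best_offset, best_length, data[i + best_length]))
        i += best_length + 1
    return tokens


def lz77_decode(tokens: list) -> str:
    """Rebuild the text by replaying each copy and appending the literal."""
    out = []
    for offset, length, next_char in tokens:
        for _ in range(length):
            out.append(out[-offset])   # also handles overlapping copies
        out.append(next_char)
    return "".join(out)


sample = "abcabcabcabcx"
tokens = lz77_encode(sample)
print(tokens)
print(lz77_decode(tokens) == sample)
```

Exact repeats collapse into a single (offset, length) pair, but a sequence built by a rule rather than by copying, such as consecutive numbers, offers this model little to copy. Every token also costs bits, which is the trade-off described above.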
In the end, you'll never have an algorithm that "perfectly compresses
everything" because even though your model is then very complete, it is
so complicated that you need to transmit too much data just to describe
it. You *cannot* win this game, it's a logical constraint about maps
between finite sets, a very elementary one.
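The constraint can be checked by plain counting. A minimal Python sketch, assuming inputs are bit strings of exactly n bits and "compressed" means any bit string strictly shorter than n bits (the function names are illustrative only):

```python
def strings_of_length(n_bits: int) -> int:
    """Number of distinct bit strings with exactly n_bits bits."""
    return 2 ** n_bits


def strings_shorter_than(n_bits: int) -> int:
    """Number of distinct bit strings with 0 .. n_bits-1 bits."""
    return sum(2 ** k for k in range(n_bits))


for n in range(1, 9):
    inputs = strings_of_length(n)
    outputs = strings_shorter_than(n)
    # There is always exactly one output too few, so any map from inputs
    # to shorter outputs must send two inputs to the same output.
    print(n, inputs, outputs, inputs - outputs)
```

Since there are always fewer shorter strings than inputs, no decoder can recover every input from a strictly shorter output, whatever the model looks like. Keeping tokens in a separate area does not change the count, because that area is still part of what has to be stored.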
[quote]now say for two lengths of data, a shape for each.
now.. this might be a little harder to believe is right.
[/quote]
I'm not arguing at this level - you don't seem to understand.
[quote]given a shape, and another shape, there is math to say the shape but
made different, to the other shape, where the math to say one shape
different to the other shape is smaller than the other shape. So
instead of saying two shapes, say one shape and the math to make the
shape different as the other shape.
[/quote]
All very well, but you still need data to describe this "different", and
you'll soon find out (once you dare to try to implement it) that
the overall byte budget required to describe this "different" is higher
than the byte budget you save by using this model, at least for *most* data.
If you don't believe this, I urge you to implement your idea in an
algorithm and observe this yourself. Depending on the data set, the most
successful models are simple.
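The byte-budget effect is easy to observe with an existing compressor. The Python sketch below uses the standard `zlib` module; the pseudo-random bytes from a seeded generator only stand in for random data (they do have a short description, namely the seed, but not one that zlib's model can find), and the sizes chosen are arbitrary.

```python
import random
import zlib

# 4096 pseudo-random bytes as a stand-in for data from a random source.
rng = random.Random(0)
noise = bytes(rng.randrange(256) for _ in range(4096))

# 4096 bytes of highly repetitive text for comparison.
text = b"abcdefghijklmnop" * 256

for name, data in (("pseudo-random", noise), ("repetitive", text)):
    packed = zlib.compress(data, 9)
    # Round-trip check: the compressor must reproduce the input exactly.
    assert zlib.decompress(packed) == data
    print(name, len(data), "->", len(packed))
```

For the repetitive input the output is far smaller. For the pseudo-random input the output comes out slightly larger than the input, because the bookkeeping the format needs is not paid back by any savings.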
[quote]given a perfect idea of how this would work, shouldn't it be that the
math has a 50% rightful claim of being smaller than the other shape,
and a 50% rightful claim of being bigger than the other shape?
Shouldn't it though just to think of the most ideal condition there
should be?
doesn't that make sense when there could be some math smaller to say
one shape made to be changed is another shape, smaller than the other
shape? and some math bigger than the other shape? shouldn't the idea
round off as a 50/50 of smaller and bigger than the other shape? to
say a shape changed is another shape.
[/quote]
It all makes sense to say so, but your algorithm also has to say so,
namely has to communicate this to the decoder. And *that* is where your
problem is.
Again, if you don't believe me, construct this algorithm and you'll see
for yourself.
So long,
Thomas
stan Guest
Posted: Sat Jul 26, 2008 7:01 am Post subject: Re: compression type
mcjason wrote:
[quote]On Jul 23, 1:37 pm, Jim Leonard <MobyGa...@gmail.com> wrote:
On Jul 23, 12:14 am, mcjason <mcja...@gmail.com> wrote:
and i bet though that what's compressed is able to be compressed again
the same way.. i mean, there should be any reason why there isn't
patterns to find like this in how you say curved lines and sphere
areas...
We're done here. I think your next course of action is to stop
ranting and program an LZ77 compressor so that you gain actual
experience writing a compressor. Here's a few links to help you get
started:
http://datacompression.dogma.net/index.php?title=FAQ:Intro_to_Data_Co...
http://www.fadden.com/techmisc/hdc/index.htm
Read these completely, and if you don't understand the LZ77 portions,
find a different hobby.
I can put everything you said into a smaller program and make it run
to say the same if it were trying to be the simplest way to program a
rejection letter servant with no manners and takes only a keyword as a
hint, and it would also serve the purpose of answering any post that
tries to be better than there isn't to think about.
does that compress you? I found it saying alot more than one thing.
[/quote]
You don't seem to believe conventional thinking is correct, and you seem
to think you have a totally new idea.
Can you actually code this up? Or maybe show some pseudocode
explanation? Are current computers capable of executing your idea?
Failing any of that, can you show a specific original file and a complete
compressed file? You have repeatedly mentioned uncompressed examples and
shown some reorganizations, but you haven't shown how to represent the
geometry that is needed to rebuild the original text. Your method of
compression can reproduce the original information, I hope.
How about showing:
1. An uncompressed string.
2. The complete results of applying your idea to 1.
The reorganized parts and the geometry required to uncompress
the reorganized parts. (The information needed to map the
reorganized parts back to the original data.)
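For comparison, this is roughly what such a demonstration looks like for a conventional compressor. It is only a sketch: zlib stands in for "your idea", and the function name `demonstrate` and the sample string are invented for the example.

```python
import zlib


def demonstrate(original):
    """Show an input, the complete compressed output, and the round trip."""
    compressed = zlib.compress(original, 9)
    restored = zlib.decompress(compressed)
    return {
        "original": original,
        "compressed": compressed,   # everything the decoder is given
        "restored": restored,
        "lossless": restored == original,
    }


result = demonstrate(b"abcabcabcabc" * 20)

if __name__ == "__main__":
    print("1. uncompressed:", len(result["original"]), "bytes")
    print("2. compressed:  ", len(result["compressed"]), "bytes")
    print("   as hex:      ", result["compressed"].hex())
    print("   round trip ok:", result["lossless"])
```

The point of step 2 is that the compressed bytes are the complete message: the decoder gets nothing else, and still has to give back step 1 exactly.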
Maybe I'm not clearly seeing your idea. You're basically saying that in
your head, even random data can be compressed. Can you make the idea
concrete?
See, in my head I can imagine that perpetual motion works, gas is cheap,
and teenagers aren't the least bit annoying. Then I wake up and, oh
well.
mcjason Guest
Posted: Mon Jul 28, 2008 9:45 am Post subject: Re: compression type