Are those actual compression results or just speculation?
Here's some speculation from me to you ->
If it were somehow true that you developed an environmental model off a huge amount of text information via copying it, then it actually hasn't generated anything new from it; by my theory, it is exactly what was put into it. (Just like the world is exactly what was put into us, if... we were just copying it.)
You have to read this paper ->
https://worldmodels.github.io/ (those are actually the graphics generated out of its system, running in JavaScript in the browser; it's the best thing on the internet for me.) This goes a step further than just copying, I think: it's actually converting the information into something else.
If you want to do that, you need another policy besides just "copy the data", which is a policy of "I must be 100% truthful to it." That's what this compression system is doing. Do you remember Marcus Hutter talking about Occam's razor? This is another policy, "I must be the MAXIMUM compression", and it's another attempt at forming the data into a procedural generation of the data, instead of a direct frame spit-out (which, logically, we think must be larger than the procedural generation).
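A toy sketch of that intuition (my own illustration, not from the paper or from Hutter): the same data can be stored directly, frame by frame, or regenerated procedurally from a much smaller "program". A maximum-compression policy pushes toward the procedural form.

```python
# Sketch: direct storage vs procedural generation of the same data.
# The procedural "program" here is just a seed and a length; the data
# itself is a deterministic pseudo-random byte stream, which a generic
# compressor like zlib cannot shrink much.
import random
import zlib

def procedural(seed: int, n: int) -> bytes:
    """Regenerate the full data from a tiny description (seed, n)."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(n))

data = procedural(42, 10_000)            # the "world" as raw frames

direct_size = len(zlib.compress(data))   # direct frame spit-out, compressed
procedural_size = len(b"42,10000")       # the procedural description

print(direct_size, procedural_size)      # procedural form is far smaller
```

The point is just that the procedural description can be orders of magnitude smaller than any direct encoding of the frames, which is the sense in which maximum compression forces the system to find the generator rather than copy the output.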
If you're doing the compression without that in mind, you're doing backpropagation instead of evolution search or gradient descent; it seems the same, but I don't think it's as creative a learning system!
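To make the contrast concrete, here is a minimal sketch (my own, hypothetical) of the two search styles on a trivial objective, f(x) = (x - 3)^2: gradient descent follows the local slope, while an evolutionary search just keeps random mutations that happen to score better.

```python
# Contrast: gradient descent vs evolution search on f(x) = (x - 3)^2.
import random

def f(x: float) -> float:
    return (x - 3.0) ** 2

# Gradient descent: step against the analytic gradient, 2 * (x - 3).
x_gd = 0.0
for _ in range(100):
    x_gd -= 0.1 * 2.0 * (x_gd - 3.0)

# Evolution search: propose random mutations, keep only improvements.
rng = random.Random(0)
x_ev = 0.0
for _ in range(1000):
    candidate = x_ev + rng.gauss(0.0, 0.5)
    if f(candidate) < f(x_ev):
        x_ev = candidate

print(round(x_gd, 3), round(x_ev, 3))  # both approach the optimum at 3.0
```

Both land near 3.0 here, but the evolutionary search never needs a gradient at all, which is part of why some people consider it the more open-ended, "creative" search.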
I'm not doing that anymore. I think you need another kind of policy, and I think it needs to be a goal-based system, where the robot wants to do something inside the world. The idea goes: the model doesn't just copy, it transmutes the data into a form which is more useful to the robot, AS it is COPYING, to stay 100% truthful to the laws and behaviour around it.
There's a big problem already stumping me, and it's the fact that this goal code has to BE hand-written, interpretable stuff, and it can only involve basic word- and symbol-type knowledge, because that's all we have detectors for! There's no such thing as a lie detector; there's no such thing as a "hurt someone's feelings" detector, without it being a quite shallow, bloated, hit-and-miss version of the real thing, and a real detector for it is impossible to write.
But think: suppose it grew some kind of facility around this primitive goal, which transmuted the data into instrumental goals and the variable and symbol conversions you kind of fantasize about, and it really worked like magic and could actually communicate and act, and you asked it to "please go cure cancer for us?" Then how are you supposed to drive the robot to do it, with only basic detectors? Even if the whole shebang worked, it would be hit and miss whether it even felt like doing it. IT'S ONLY IF THE ROBOT FELT LIKE IT!!! So big problems there.
As in, you used certain clever policies, but what the robot finishes up at, YOU'VE GOT NO IDEA ABOUT!!!