No, not the film lol. The page I linked.
#1 Ok, so the agents can move around and pick which objects they are LIKELY to go to.
#2 They start off moving randomly. True to my name and my baby experiment.
#3 Aha, they use Transformer ideas, yes yes!!! Now all they need is Transformer-based simulated motors.
#4 Objects in an agent's sight are paid attention to (see the attention sketch after this list).
#5 Self-play is key in larger, more complex 'environments'.
#6 Long-term reward evaluation (credit assignment) in RL is hard.
#7 Many different tasks can be learned with RL supervision, and more transferable representations are needed.
#8 Complex cooperation and competition emerge on their own from self-play in robotics/language RL (THE COMPETITIVE SELF-PLAY IS ITSELF A SELF-ATTENTION).
#9 As they say in the video, the multi-agent competition/cooperation survival game runs thousands of rounds of self-play, in parallel, against other agents and past selves, much like what occurred on Earth, only here it's Hide & Seek (uhm, ya, sounds friendlier put that way), and the champions (as shown at the end of their video) become the updates.
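Since #3 and #4 matter for everything below, here is a minimal sketch of what "GPT-2 on objects instead of words" could look like: self-attention over the entities an agent can currently see. The module name, sizes, and the pooling at the end are my own guesses, not OpenAI's actual architecture, which as I understand it uses masked residual self-attention over entity embeddings.

```python
import torch
import torch.nn as nn

class EntityAttention(nn.Module):
    """Sketch: self-attention over observed objects instead of words.
    Hypothetical layer names/sizes, not the real hide-and-seek policy."""

    def __init__(self, obj_dim=16, embed_dim=64, num_heads=4):
        super().__init__()
        self.embed = nn.Linear(obj_dim, embed_dim)  # per-object embedding
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, objects, out_of_sight):
        # objects: (batch, num_objects, obj_dim) - position, size, velocity, ...
        # out_of_sight: (batch, num_objects) bool - True where the object is NOT visible
        x = self.embed(objects)
        attended, _ = self.attn(x, x, x, key_padding_mask=out_of_sight)
        x = self.norm(x + attended)  # residual connection
        # pool over objects (invisible ones zeroed) into a fixed-size feature for the policy
        return x.masked_fill(out_of_sight.unsqueeze(-1), 0.0).mean(dim=1)

# toy usage: 2 agents (batch), 5 candidate objects, some out of sight
obs = torch.randn(2, 5, 16)
mask = torch.tensor([[False, False, False, True, True],
                     [False, True, False, False, True]])
summary = EntityAttention()(obs, mask)  # (2, 64) feature fed to the policy head
```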
Wait, this is about motors: they simply skipped the low-level stuff like hands and running (the "atoms") and just simulate/learn at a higher level (the "bubbles")! What happens is the winners duplicate faster and find "food" faster, the losers don't, and the winners face off against past selves, current opponents, and creatures from other domains. The self-play rounds let them learn higher-level features, and this works on its own.

But to get through even one step of this game theory, an agent must learn: they start out moving randomly, pay attention to objects and movements, and get reward for hiding/seeking. Long-term reward is still needed, and yes, they have a huge trial space but they still learn; I bet that at larger space complexity they will need language.

Lastly, the actual hints the agents use in this lower-dimensional space to figure out these behaviors: they start off moving randomly and pay attention to objects/motions, and instead of GPT-2 over words it's attention over objects/actions. However, they do look like they track straight to the right object and know where to bring it, as if they had tried millions of runs (which they actually did). It's as if they are reusing behaviors they learned already. Can someone explain their amazing behaviors? Why do they run to the correct object etc.? It's too perfect, as if they had run all possible outcomes, except that wasn't the point and probably not what happened.
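To make the "face off against past, current" selves loop concrete, here is a hedged sketch of the training structure as I picture it. The env.rollout and optimizer_step calls are placeholders I made up, not OpenAI's code; the reward comment follows the hide-and-seek setup where hiders score +1 per step when all of them are hidden and -1 otherwise, with seekers getting the opposite.

```python
import copy
import random

def self_play_training(env, policy, optimizer_step,
                       episodes=100_000, snapshot_every=1_000):
    """Sketch of self-play against past selves (assumed loop, not OpenAI's code).
    Hiders and seekers share a zero-sum reward: hiders +1 per step if all are
    hidden, -1 otherwise; seekers receive the negative of that."""
    past_selves = [copy.deepcopy(policy)]            # pool of frozen snapshots
    for ep in range(episodes):
        opponent = random.choice(past_selves)        # sometimes recent, sometimes an old self
        trajectory = env.rollout(policy, opponent)   # hypothetical environment API
        optimizer_step(policy, trajectory)           # e.g. a PPO update on the rollout
        if ep % snapshot_every == 0:
            past_selves.append(copy.deepcopy(policy))  # current winner joins the pool
```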
It seems like all they do is turn, move forward/backward, and grab objects. So all they must learn is roughly where to go and which object to choose when they see it... maybe that's too simple? That would explain such wonderful behavior, because these are such simple actions to learn (there are many correct ways to act).
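To spell out the action set I think I'm seeing, here it is as a toy enum; as far as I can tell the real environment uses continuous movement forces plus grab/lock actions, so treat this as a simplification of mine, not the actual interface.

```python
from enum import IntEnum

class Action(IntEnum):
    """Assumed simplified action set, based on what the agents appear to do
    in the video; not the environment's real (continuous) action space."""
    TURN_LEFT = 0
    TURN_RIGHT = 1
    MOVE_FORWARD = 2
    MOVE_BACKWARD = 3
    GRAB = 4      # hold/pull the object the agent is facing
    RELEASE = 5
```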
So the takeaway here (the part not already mentioned, I mean) is that looking far enough backward/forward in time is important; skip the lower layers and focus on high-level task learning; use a self-attentive, GAN-like R&D feedback loop of competitive self-play with many agents in parallel; and use Transformers FOR each step of THAT self-play.
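And for the "looking back/forth far enough" part, a tiny sketch of discounted returns shows why a discount close to 1 matters: a reward earned late in a long episode still credits actions taken much earlier, like building the fort before the seekers are released (the gamma value here is just illustrative).

```python
def discounted_returns(rewards, gamma=0.998):
    """Compute the discounted return at every step of an episode.
    With gamma near 1, late rewards still reach early actions."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))

# e.g. a 240-step episode where the hider only starts scoring after step 80
rewards = [0.0] * 80 + [1.0] * 160
print(discounted_returns(rewards)[0])  # early actions still receive substantial credit
```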