OpenAI hide n seek - my new reasonable understanding

  • 16 Replies
  • 515 Views

LOCKSUIT

  • Emerged from nothing
  • Trusty Member
  • Sentinel
  • 3548
  • First it wiggles, then it is rewarded.
OpenAI hide n seek - my new reasonable understanding
« on: November 18, 2019, 08:25:07 AM »
Ok, I think I got it finally!... Below is my last note, and below that is my final note from tonight. It clarifies that it was more random trying than actual intelligence, lol. It makes complete, total sense! Or, prove me wrong.


Prior understanding:
Who here can explain the following to a 5-year-old kid, or an old mom, in clear English?:
https://openai.com/blog/emergent-tool-use/
I.e., how well do you understand it? How many people can you recruit into AGI?
How well can you summarize it (in 100 words? 200 words? 500 words?) without any filler content and only main points?
Can you make it seem boring, like cake?
Who here can draw an intuitive visualization so mom can understand it?
&
Here's my go at it.
1) There is a simulated world with two teams.
2) Looking at their paper, each little man can move forward/backward, rotate, grab a nearby object, or lock an object in place.
3) They seem to start off randomly jiggling around, but they learn general tricks that work in diverse environments. For example, they seem to learn that holding an object is usually better than not, that having a teammate nearby makes a win more likely, or that taking non-sharp turns around walls lets them run faster. They decide when to use and combine these, based on the recognizable cues they see.
4) The red team learns to move toward the blue team members because one of its jiggling moves got a blue member in its line of sight. So the red team finds general tricks to locate the blue team and meets the game's win criteria. Until then there was only random jiggling, but then it won a game and learned what will work, and when, across diverse environments.
5) The red team cannot optimize further, but the blue team can. The blue team learns, through random jiggles, to use cubes to trap themselves in a room. This works in different environments, and the learning of this technique could have stemmed from the tendency to usually carry boxes around.
6) This competition continues until it can no longer continue: total/global optimization, like evolution.
7) At the last stage, when OpenAI moves a ramp to the back of the room, one blue hider learns to prepare his friend's box and then goes to get his own box when the other arrives with the ramp, because the time window before the red team enters the room is so short. Without a sentence/plan generator that reasons about its plans, this must be the result of a brute-force algorithm, as mentioned. They do use past tactics to learn new ones, which generalize to new environments, but this was clearly learned by randomly jiggling around (while performing learned tactics, so not completely random jiggling). By the way, it's really a small environment: not many objects, and not much time for long game plans. And multiple plans result in the same win/solution.
8) So it seems they learn by randomly trying actions. These actions then work in diverse scenes, when they see similar cues/conditions. This behavior, topped with some random behavior, can produce deeper learned behavior that wins deeper games.
9) They say they use algorithms GPT-2 uses. The agents look at objects instead of words, and decide what next action to write into the story/plan. If an agent sees a scene it has seen before, and knows what it did most frequently in previously won games, it will use that learned action.
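Points 3, 4, and 8 can be sketched as a tiny tabular agent (my toy sketch, not OpenAI's actual PPO-plus-self-play training; the cues and actions here are invented for illustration):

```python
import random

random.seed(0)  # reproducible demo

# Made-up action set loosely matching point 2
ACTIONS = ["forward", "backward", "rotate", "grab", "lock"]

class JiggleAgent:
    """Starts by random jiggling; repeats whichever action won most for a cue."""

    def __init__(self, explore=0.2):
        self.explore = explore        # chance of a random jiggle
        self.win_counts = {}          # (cue, action) -> times it preceded a win

    def act(self, cue):
        if random.random() < self.explore:
            return random.choice(ACTIONS)               # random jiggle
        scored = [(self.win_counts.get((cue, a), 0), a) for a in ACTIONS]
        return max(scored)[1]                           # best-known action for this cue

    def learn(self, episode, won):
        if won:                                         # only wins reinforce (points 4 & 8)
            for cue, action in episode:
                self.win_counts[(cue, action)] = self.win_counts.get((cue, action), 0) + 1

agent = JiggleAgent()
# Pretend "grab" wins whenever a box is visible:
for _ in range(200):
    cue = random.choice(["box_visible", "wall_ahead"])
    a = agent.act(cue)
    agent.learn([(cue, a)], won=(cue == "box_visible" and a == "grab"))
print(agent.act("box_visible"))  # usually "grab" after training
```

The key property matching the post: nothing tells the agent that grabbing is good; a random jiggle has to hit it first, and only then does the cue-conditioned habit form.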


The Update:
Ok, I got it: they run millions of rounds. At stage 1 they start random and must accidentally do the right action to find the hiders or run from the seekers. By stage 2/3 they use boxes by accident; they must, since there are no hints. The only thing they could have learned so far is that they won the last game when their friend was nearby and seen! Or to make sharp corner turns. The ramp use, unless they re-used the other agent's model, same thing: it was an accident! The only search-space hint was to be near their buddy... booo... Anyway, by the next stage the blues learned to take in their ramp. This was an accident too, although they supposedly had hints, I assume: running around with objects, staying near a buddy, the buddy doing the same behavior (actually he didn't, lol, he stayed at base), going back to base, and making sharp turns. By the last stage, the helping out of his friend in preparing his box: again, ACCIDENT! Why would he go up to it? Ok, to carry it, but it required the random behavior of doing so and then dropping it, and that one worked. He didn't know it would, dude, obviously! So it is a combo of raw simple RL, competition like GANs do, and some hints of what is what and what worked entailing that.
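The accident-driven arms race described above can be sketched as a loop (my toy numbers, not the paper's algorithm): each stage, the losing team keeps making random tweaks until one accidentally beats the frozen opponent, and only then keeps it.

```python
import random

random.seed(0)  # reproducible demo

def hiders_win(hider, seeker):
    """Abstract one round to a comparison of two made-up skill numbers."""
    return hider > seeker

hider = seeker = 0.0
history = []
for stage in range(6):
    if hiders_win(hider, seeker):
        while hiders_win(hider, seeker):      # seekers jiggle until one tweak works
            seeker += random.random() * 0.5
        history.append(("seekers adapt", round(seeker, 2)))
    else:
        while not hiders_win(hider, seeker):  # hiders jiggle until one tweak works
            hider += random.random() * 0.5
        history.append(("hiders adapt", round(hider, 2)))

for entry in history:
    print(entry)
```

The output alternates strictly: whoever is losing blunders around until an accident flips the result, which is exactly the stage-by-stage escalation (chase, boxes, ramps, ramp-locking) in miniature.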
Emergent

LOCKSUIT
Re: OpenAI hide n seek - my new reasonable understanding
« Reply #1 on: November 18, 2019, 08:47:39 AM »
Let's not forget they trained on millions of diverse scenes; the agents could have learned what didn't work and what did, and maybe diminished the 'error'. So instead of only learning what wins games by surviving the full time limit, they could learn from surviving it longer...
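The "clear it longer" idea is reward shaping. A minimal sketch of the contrast (my assumption about the shaping; if I recall the paper, its reward is already per-timestep, +1 while all hiders are hidden, which has this partial-credit property built in):

```python
# All-or-nothing: only a perfect game teaches anything.
def binary_reward(hidden_steps, episode_len):
    return 1.0 if hidden_steps == episode_len else 0.0

# Partial credit: hiding *longer* is already an improvement,
# so near-misses still produce a learning signal.
def shaped_reward(hidden_steps, episode_len):
    return hidden_steps / episode_len

print(binary_reward(170, 240), shaped_reward(170, 240))
```

With the binary signal, an agent that hides for 170 of 240 steps learns nothing; with the shaped one, it gets a gradient toward hiding longer.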


goaty

  • Trusty Member
  • Starship Trooper
  • 485
Re: OpenAI hide n seek - my new reasonable understanding
« Reply #2 on: November 18, 2019, 11:36:29 PM »
Yeah mate, you are right.
If you did one giant search it would solve the model, just like with an unbiased chess search... but it's not practical to do so, so there are little tricks and optimizations in there that let them achieve the intractable (and give jobs to programmers).
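A toy illustration of that point (my example, nothing from the paper): brute-force search solves a tiny take-1-2-or-3 stones game exactly, but the call count balloons with depth; memoizing positions is the kind of "little trick" that makes it cheap.

```python
from functools import lru_cache

calls = 0

def nim_win(stones):
    """Plain brute force: True if the player to move can force a win
    (taking the last stone wins)."""
    global calls
    calls += 1
    if stones == 0:
        return False                     # no stones left: the mover already lost
    return any(not nim_win(stones - t) for t in (1, 2, 3) if t <= stones)

nim_win(18)
brute_calls = calls

@lru_cache(maxsize=None)                 # the trick: evaluate each position once
def nim_win_fast(stones):
    if stones == 0:
        return False
    return any(not nim_win_fast(stones - t) for t in (1, 2, 3) if t <= stones)

nim_win_fast(18)
print("brute-force calls:", brute_calls,
      "vs cached positions:", nim_win_fast.cache_info().currsize)
```

Same exact answer both ways; the cached version touches at most 19 positions while the naive one re-derives them over and over.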


Hopefully Something

  • Trusty Member
  • Replicant
  • 701
  • no seriously where are these cookies
Re: OpenAI hide n seek - my new reasonable understanding
« Reply #3 on: November 19, 2019, 02:26:53 AM »
Seems like it's not yet close to a minimum-effort approach to finding workable actions/reactions. To make this approach better suited to the unsimulated world, they'd have to either increase the processing power (to increase the number of possible tries in a given time), or figure out a way to have the results themselves improve the efficiency of subsequent discovery-making, reducing the number of necessary tries through the application of previous data to new situations.

goaty
Re: OpenAI hide n seek - my new reasonable understanding
« Reply #4 on: November 19, 2019, 02:58:30 AM »
Agree with the building of some kind of structure to make it take less work each time!
That would cause the singularity, I think!


AndyGoode

  • Guest
Re: OpenAI hide n seek - my new reasonable understanding
« Reply #5 on: November 19, 2019, 04:15:46 AM »
Recently I've started to read up on GPT-2, which you highly tout. I was going to start a thread on assessing it with respect to AGI, but this thread might be a good launch into that.

It seems to me that you believe GPT-2 is AGI, but your interest in OpenAI hide n seek seems to contradict that. OpenAI hide n seek is a set of spatiotemporal problems, and I don't see how it could possibly be done with words, which are all that GPT-2 can use, if I understand it correctly. Another spatial problem that GPT-2 would not be able to solve is the famous monkey and banana problem (https://en.wikipedia.org/wiki/Monkey_and_banana_problem). For that matter, how could GPT-2 add two numbers, play a board game, solve a riddle, or perform any task that involves more than chains of words?
« Last Edit: November 19, 2019, 10:35:13 PM by AndyGoode »

LOCKSUIT
Re: OpenAI hide n seek - my new reasonable understanding
« Reply #6 on: November 19, 2019, 06:17:06 AM »
Yes! GPT-2 is AGI! Let's not hide it anymore; I'm coming out and saying it! :)

The hide n seek they made is just random trial & error plus competition recursion... they do seem to use hints in such a massive search space, though. Like GPT-2 with words, they pay attention to what objects/members they see. They then decide on the most probable good action: one they tried before that worked most often or got the most reward, or one related to such an action just prior or soon to follow. The attention is the same as GPT-2's! Same thing.

If we focus on text or vision (imagine a black slate in my crazy mind filled with my imaginations), this can do the motor control too. Words, or visions, can both say or show action! Move, run, etc. Cat, push cat. Suns are hot. Etc. Simulating is better than real-world motor testing: e.g., faster, safer, you can increase tool size. So all you need is simulating, visual simulating. A visual GPT-2. GPT-2 can add 2 numbers, but to do it perfectly like a calculator it must have favorite actions for 2+2. 4 is my favorite. Know why? :) To solve unseen math problems I actually use my hand or visualize the carry-over. I'll see (visually!) "ok bro, what's 65+92?" and focus heavily on the 65+92, then focus on the last number, then the other's last number, and then say verbally (no vision actually) '7' (sometimes I see it). And now in my scratch-board vision space I see 7 and 150 and add them... if it's 0, it is 157...
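The carry-over routine described above can be written out as code (my formalization of the post's 65+92 walkthrough, not anything GPT-2 actually does): add one digit column at a time, remember the carry, and combine.

```python
def add_by_digits(a, b):
    """Column-by-column addition with carry: 5+2=7, 6+9=15, so 150+7=157."""
    result, place, carry = 0, 1, 0
    while a or b or carry:
        d = a % 10 + b % 10 + carry    # add one digit column
        carry = d // 10                # remember the carry-over
        result += (d % 10) * place     # write the digit into its place
        a, b, place = a // 10, b // 10, place * 10
    return result

print(add_by_digits(65, 92))  # 157
```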

Look at that beautiful adaptation: it knows at 0:44 to put its hand through the hole, switch the rock to the other hand, and return the hand back out to throw it after feeling it.
https://www.youtube.com/watch?v=-KSryJXDpZo

Dude, DUDE, 1937: those are the most human-like monkeys I have ever seen, holy moly folks, watch it all!:
https://www.youtube.com/watch?v=PnnSjdpoBVw
« Last Edit: November 19, 2019, 06:42:51 AM by LOCKSUIT »

LOCKSUIT
Re: OpenAI hide n seek - my new reasonable understanding
« Reply #7 on: November 19, 2019, 06:41:38 AM »
The elephant also freeloaded, did you see that? Wow! Though do be skeptical about elephant videos, trust me...

AndyGoode
Re: OpenAI hide n seek - my new reasonable understanding
« Reply #8 on: November 19, 2019, 10:33:35 PM »
Quote: "A visual GPT-2."

Now you're talking. That would definitely be impressive. There are still some very important things it couldn't do, but at least such a system would attract a lot of attention. So, are you going to try to code such a system, or what?

LOCKSUIT
Re: OpenAI hide n seek - my new reasonable understanding
« Reply #9 on: November 19, 2019, 10:39:08 PM »
If someone fills me in on the tidbits of knowledge OpenAI has that I don't, I can start popping out my own visual GPT-2, yes. While I know things they don't know, they know things I don't, like how a GAN or CNN works and the exact way GPT-2 works. It's not the code I need; it's the visual... you see? I need to know how many layers, the randomly initialized net, what they do... it's not complicated, really. I've already understood most of the field's big terms in clear English. It's totally possible to explain it properly in one day with no filler content; yeah, one day it'd take. Good mentor.

Currently all I can do is preach my discoveries... that's almost as good. But I could put in more man-hours than any of you, I bet, if I knew some things...

AndyGoode
Re: OpenAI hide n seek - my new reasonable understanding
« Reply #10 on: November 20, 2019, 12:22:21 AM »
Here's a general chore that would be difficult for GPT-2 or any system: general induction (as opposed to mathematical induction). Induction means guessing a generalization from specific examples. For example, if the system saw the following pattern...

10110111011110111110111111...

...would it be able to describe the pattern? Would it be able to come up with a formula for the n-th digit?
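For what it's worth, here is one way a program could answer this, assuming the intended reading of the pattern is "k ones followed by a zero, for k = 1, 2, 3, ..." (my reading; the question deliberately leaves it open):

```python
def digit(n):
    """The n-th digit (1-indexed) of 1 0 11 0 111 0 1111 0 ..."""
    k, pos = 1, 1                  # block k starts at position pos
    while True:
        if n < pos + k:            # inside the run of k ones
            return 1
        if n == pos + k:           # the separating zero
            return 0
        pos += k + 1               # skip this block (k ones + one zero)
        k += 1

prefix = "".join(str(digit(n)) for n in range(1, 27))
print(prefix)  # 10110111011110111110111111
```

The closed-form description, then, is: digit n is 0 exactly when n = k(k+3)/2 for some positive integer k (the positions of the separating zeros), and 1 otherwise.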

LOCKSUIT
Re: OpenAI hide n seek - my new reasonable understanding
« Reply #11 on: November 20, 2019, 12:38:45 AM »
It can translate, visually or textually, what 1, 0, or 101, etc. are... it may know 8 bits are letters... To me, I look at that sequence of bits and I see the next word in the story has no space and is either 1 or 0, and no other, until the description time comes up... I may ignore it, but if asked to, I will commit to answering the question until satisfied (how many words to add onto the story? 1? 431?). So, I will add on 4 bits for you below. I could look at this as decoded text, but I won't; I'll look at it as bits, the weird idea of another topic we are proving GPT-2 can solve. Here I go:

10110111011110111110111111

101101110111101111101111110
1011011101111011111011111101
10110111011110111110111111011
101101110111101111101111110111

So it was easier than I thought. What I did was recognize a meaning, a sequence vector. It was visual. It was a linear increase. I don't have functions like a scientific calculator; I have only translation (recognition) and entailment.
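The "recognize the linear increase, then entail" step could be sketched like this (my own routine; the run-length hypothesis is hand-supplied rather than induced, so it makes the recognition explicit, and none of it is GPT-2 code):

```python
def continue_pattern(s, extra=4):
    """Extend a 0/1 string under the hypothesis that block k is
    k ones followed by a zero, for k = 1, 2, 3, ..."""
    gen, k = "", 1
    while len(gen) < len(s) + extra:
        gen += "1" * k + "0"       # regenerate the sequence from the hypothesis
        k += 1
    assert gen.startswith(s), "hypothesis does not fit the given prefix"
    return gen[len(s):len(s) + extra]

print(continue_pattern("10110111011110111110111111", 4))  # 0111
```

This reproduces the four digits the post appends one line at a time (0, 1, 1, 1).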

Let me make one since I wasn't satisfied with your question:

101110100011101001010011111

1011101000111010010100111110
10111010001110100101001111100
101110100011101001010011111001
1011101000111010010100111110010

Here I added a fill-in of the prediction.
Below I do a pattern simulation using dummy numbers:

101110100011101001010011111
113113311211125

113113311211125113

1011101000111010010100111110
10111010001110100101001111101
101110100011101001010011111010
1011101000111010010100111110100

goaty
Re: OpenAI hide n seek - my new reasonable understanding
« Reply #12 on: November 20, 2019, 06:19:00 AM »
Spatial-scene-geometry-type words are body language, or sign language.

LOCKSUIT
Re: OpenAI hide n seek - my new reasonable understanding
« Reply #13 on: November 20, 2019, 07:36:28 AM »
Maybe I'll get my understanding from videos and by hiring some freelancers on Upwork; a good video is below:

https://www.youtube.com/watch?v=PKN_Cc-GyCY

By the way, a tip: a visual GPT-2 would look like that video. Yep. We imagine videos in our brains: simulating, daydreaming, predicting.

LOCKSUIT
Re: OpenAI hide n seek - my new reasonable understanding
« Reply #14 on: November 21, 2019, 11:36:19 PM »
Ah, so the node activating for "cat" in the image activates if it sees, e.g., a cat face upside down or right side up? It lights up for different inputs... many images trained on... this is why it is a representation! It's not an image of a cat stored, just a node that is tweaked just right to light up for a few dozen inputs that all actually look quite different...

And then come real episodic memories: humans also store images of their pet cat if they've seen it enough. These are real episodic memories, not representations as much?

:)
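The "one node lights up for quite different inputs" idea, with made-up numbers (a toy, not from any trained net): a single unit's weight vector gives a high activation for several inputs that differ from each other, and a low one for an off-category input.

```python
def activation(weights, x):
    """A single unit: weighted sum of its inputs (no bias, no nonlinearity)."""
    return sum(w * xi for w, xi in zip(weights, x))

cat_node = [0.9, 0.1, 0.8, 0.2]         # weights "tweaked just right" by training

upright_cat = [1.0, 0.0, 1.0, 0.0]      # three inputs that look quite different...
upside_down = [0.8, 0.3, 0.9, 0.1]
dog         = [0.1, 1.0, 0.0, 0.9]

for name, x in [("upright cat", upright_cat),
                ("upside-down cat", upside_down),
                ("dog", dog)]:
    print(name, round(activation(cat_node, x), 2))
```

Both cat inputs activate the node strongly even though they are not the same vector; nothing resembling a stored cat image exists anywhere, only the weights.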

 

