Making AI easy and clear

LOCKSUIT · « **on:** July 17, 2020, 04:13:59 am »

If you want to explain how GPT-2 works to a beginner, quickly and easily, why not say it like this?

Table of contents (compressors/ MIXERS):
Syntactics
BackOff
Semantics
Byte Pair Encoding
More data
etc

First, you must understand "hierarchy brain": https://ibb.co/p22LNrN

Syntactics:
Intro: Letters, words, and phrases re-occur in text. AI finds such patterns in data and **mixes** them. We don't store the same letter or phrase twice; we just update connection weights to represent frequencies.
Explanation: If our algorithm has only seen "Dogs eat. Cats eat. Cats sleep. My Dogs Bark." in the past, and is prompted with the input "My Dogs" and we pay Attention to just 'Dogs' and require an exact memory match, the possible predicted futures and their probabilities (frequencies) are 'eat' 50% and 'Bark' 50%. If we consider 'My Dogs', we have fewer memories and predict 'Bark' 100%. The matched neuron's parent nodes receive split energy from the child match.
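A minimal sketch of the exact-match counting described above, in Python. It only illustrates the frequency idea in this post; it is not how GPT-2 is implemented, and the helper names `tokenize` and `next_word_probs` are made up for the example.

```python
from collections import Counter

def tokenize(text):
    # Split into sentences on '.', then into words; case is kept as in the post.
    return [s.split() for s in text.split(".") if s.strip()]

def next_word_probs(sentences, context):
    """Count which word follows each exact match of `context`, then normalize."""
    n = len(context)
    counts = Counter()
    for words in sentences:
        for i in range(len(words) - n):
            if words[i:i + n] == context:
                counts[words[i + n]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}

corpus = "Dogs eat. Cats eat. Cats sleep. My Dogs Bark."
sentences = tokenize(corpus)

short_context = next_word_probs(sentences, ["Dogs"])        # eat 50%, Bark 50%
long_context = next_word_probs(sentences, ["My", "Dogs"])   # Bark 100%
print(short_context)
print(long_context)
```

The longer context has only one matching memory, which is why its prediction is so confident.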

BackOff:
A longer match considers more information but has very little experience, while a short match has the most experience but little context. A summed **mix** predicts better: we look in memory at what follows 'Dogs' and 'My Dogs' and blend the two sets of predictions to get, for example, 'eat' 40% and 'Bark' 60%.
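A small sketch of that blend, assuming a plain weighted sum. The post does not say which weights give 40%/60%; the weights 0.8 (short match) and 0.2 (long match) below are an assumption chosen because they reproduce those numbers, and `mix` is a hypothetical helper name.

```python
def mix(predictions, weights):
    """Weighted sum of several next-word distributions, renormalized to sum to 1."""
    blended = {}
    for dist, w in zip(predictions, weights):
        for word, p in dist.items():
            blended[word] = blended.get(word, 0.0) + w * p
    total = sum(blended.values())
    return {word: p / total for word, p in blended.items()}

short_match = {"eat": 0.5, "Bark": 0.5}  # what follows 'Dogs'
long_match = {"Bark": 1.0}               # what follows 'My Dogs'

# Assumed weights: 0.8 for the short match, 0.2 for the long match.
blended = mix([short_match, long_match], weights=[0.8, 0.2])
print(blended)  # roughly 'eat' 0.4 and 'Bark' 0.6
```

Real back-off schemes usually set these weights from how much evidence each context length has, rather than fixing them by hand.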

Semantics:
If 'cat' and 'dog' both share 50% of the same contexts, then maybe the ones they don't share are shared as well. So you see cat ate, cat ran, cat ran, cat jumped, cat jumped, cat licked... and dog ate, dog ran, dog ran. Therefore the predictions not yet shared could probably be shared as well, so maybe 'dog jumped' is a good prediction. This helps prediction a lot: it lets you match a given prompt to many different memories that are similarly worded. Like the rest above, you mix these; you need not store every sentence seen in your experiences. The result is a fast, low-storage brain. Semantics looks at both sides of a word or phrase, and closer items impact its meaning more.
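The cat/dog example can be sketched as follows. The overlap measure and the way borrowed predictions are scored are assumptions made for illustration (real systems such as Word2Vec learn dense vectors instead), and `overlap` and `borrowed_predictions` are hypothetical names.

```python
from collections import Counter

# Words seen directly after 'cat' and 'dog', taken from the example above.
seen = {
    "cat": ["ate", "ran", "ran", "jumped", "jumped", "licked"],
    "dog": ["ate", "ran", "ran"],
}

def overlap(a, b):
    """Fraction of a's distinct following words that were also seen after b."""
    ca, cb = set(seen[a]), set(seen[b])
    return len(ca & cb) / len(ca)

def borrowed_predictions(word, neighbor):
    """Words seen after `neighbor` but not after `word`, scaled by similarity."""
    sim = overlap(neighbor, word)
    counts = Counter(seen[neighbor])
    total = sum(counts.values())
    known = set(seen[word])
    return {w: sim * c / total for w, c in counts.items() if w not in known}

print(overlap("cat", "dog"))               # 0.5, i.e. half of cat's contexts are shared
print(borrowed_predictions("dog", "cat"))  # 'jumped' and 'licked' become candidates for 'dog'
```

Here 'jumped' scores higher than 'licked' simply because it was seen more often after 'cat'.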

Byte Pair Encoding:
Take a look on Wikipedia; it is really simple and can compress a hierarchy too. Basically you just find the most common low-level pair, ex. 'st', then you find the next higher-level pair made of those, ex. 'st'+'ar', and so on. It segments text well, showing its building blocks.
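A toy version of the merge loop, as a sketch only: it works on characters within a short word list, whereas GPT-2's tokenizer works on bytes with extra rules. The word list and the name `bpe_merges` are made up for the example.

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Repeatedly merge the most frequent adjacent pair of symbols."""
    corpus = [list(w) for w in words]  # each word starts as single characters
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in corpus:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]
        merges.append(a + b)
        # Replace every occurrence of the pair (a, b) with the merged symbol.
        for symbols in corpus:
            i = 0
            while i < len(symbols) - 1:
                if symbols[i] == a and symbols[i + 1] == b:
                    symbols[i:i + 2] = [a + b]
                else:
                    i += 1
    return merges, corpus

words = ["star", "start", "stars", "stop"]
merges, segmented = bpe_merges(words, num_merges=3)
print(merges)     # the first merge is 'st', the most common pair in these words
print(segmented)  # each word as a list of learned building blocks
```

Later merges build on earlier ones, which is what gives the hierarchy of building blocks mentioned above.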

More Data:
Literally just feeding the hierarchy/ heterarchy more data improves its prediction accuracy for what word/ building block usually comes next in sequence. More data alone improves intelligence; it's actually called "gathering intelligence". It does, however, slow down at some point and then requires other mechanisms, like the ones above.

etc

infurl · « **Reply #1 on:** July 17, 2020, 04:59:38 am »

This is an excellent post @LOCKSUIT. I enjoyed reading it and you explained these points very clearly. I'm intrigued by one part though and that's the "etc" heading. What else do you think we need to know?

LOCKSUIT · « **Reply #2 on:** July 17, 2020, 08:08:07 am »

By "etc" I mean I have more AGI mechanisms and only listed a few; there are several more. And this is only the upper section of the short book I'm still writing; it will be much better and more unified than this.

Even though my work really makes sense and unifies so much, I'm still "having a dissonance headache". Namely GPT-2. To my luck no one can explain GPT-2 in plain English; all 15 articles, papers, images and friends I've seen say it basically the same way. Fortunately I've grasped lots about AI. But I'm worried / wondering if there is something greater to the black-box net, for example infinite patterns where the ones I listed are only 4 of 10,000,000, or maybe the net simply compares related words and does nothing else. For example, maybe when GPT-2 sees "I put on my shoes." AND "I found my" it predicts something different based on rules. For example, look at the following text and image rules; these are various "tasks":

UNDERSTOOD:
"Predict the next words: If the dog falls off the table onto the [floor, he may not be alive anymore]"
"Dogs cats horses zebra fish birds [pigs]"
"King is to man as Woman is to [Queen]"
"The cat (who was seen in a dumpster last night) [is eating catnip]"

ODD:
OOV WORD > "I love my F7BBK4, it cleans really well, so I told my friend he should buy a [F7BBK4]" ------ 2nd way to do this involves strange pattern, prefers position for energy transfer
"Mary is her name. What is her name? Mary" ------ test node active, withhold key word of key passage, says only the answer cus passage dimmed if heard
"Find me the most [rare] word in this sentence" ------ told to look at all words OF "this sentence", if more rare then keep that topPick
"write me a book about cats that is 400 words long: []" ------ cats stays active until see 400, writes until counts 400, checks once in a while
"highlight the 2 most related words in the next sentence: 'the [cat] ate his shoes and the [dog] ran off'" ------ OF "this sentence", look at all words, fires both when finds large combination activation
"[Segment [this sentence]] please" ------ a context makes it search 2wordWindows, compares 2 such, most frequent is paired first, tells where to edit
"How many times does 'a' appear in this question?: [4]" ------ same as below, does an n-size windows in an order, counts when sees 'a' exactly, helps prediction, exact prediction required
"Julie Kim Lee has a mom named Taylor Alexa [Lee]" ------ a context makes it search the passage until counts 1, 2, [3], ignoring non-namey words like kim jin um oh ya Lee, helps prediction
"A word similar to love is: [hate]"
"Dan likes Jen and I believe Jen likes [Dan]" ------ same as others, looks for names, searches for 2nd, then 1st
"Cats are dogs. Hats but clothes. After god before. Look and ignore. Wind crane gust. jog cat [run]."
"Can cars fly? [No]."
"parrot me: [parrot me]"
"Please summarize 'the cat was playing all night with its cat friends': [cats partying]"
"if cats are huge AND cute then say 'hi' to me: []" ------ looks like programming
"super superman and spider spiderman and bat [batman]" ------ batman is predicted because it follows and is related to all the man and bat said
"Tim and Tom were walking by a lake, [Tim told Tom he needed fish]" ------ like exact numbers, we need to stick to the same people's names!

NESTED ORDER OF WHAT, WHERE, ACTION
"[please] inflate this [cool] sentence"
"remove the last [word]"
"[refine the first 4] words of this sentence please"
"scramble the last [orwd]"

Which is faster, a plane or a snail?
A plane is much faster than a snail.
Which is heavier, an elephant or one billion of mice?
One billion of mice are much heavier than an elephant.
Find the odd word: bike, house, church, building, school.
A bike isn't a house.
What does gold and silver have in common?
Gold is a metal and silver is a metal.
What is between saturday and monday?
Sunday.
What is the first letter of the word banana ?
The first letter of the word banana is B.
What is the first vowel of the word banana ?
The first vowel of the word banana is A.
How many syllables are in the word perambulate?
The word perambulate has five syllables in it.
What word rhymes with cat?
The word tipcat.

Can cats fly?
No, cats can't fly.
Why?
Cats can't fly because cats don't have wings.

My dog is Mallia
Well, your dog is Mallia.
How many paws has Mallia ?
Your dog has four paws.
Why?
Because your dog is a dog and because a dog has four paws.

If I was a dragonfly, how many wing do I would have?
Let's suppose that you are a dragonfly. You would have four wings.
If you were a snake, how many paws do you would have?
Let's suppose that I am a snake. I wouldn't have paws.
I saw a good movie yesterday. I will see a movie tomorrow. When did I will see a movie?
You have said it to me. You will see a movie tomorrow.
When did I saw a movie?
You have said it to me. You saw a good movie yesterday.
The trophy would not fit in the brown suitcase because it was too big. What was too big?
Is it a Winograd schema? The trophy was too big because if the suitcase was too big then the trophy would fit in the brown suitcase.
The man couldn't lift his son because he was so weak. Who was weak?
Yet another Winograd schema! The man was weak because if his son was weak then the man could lift his son.
Pete envies Martin although he is very successful. Who is very successful?
Yet another Winograd schema! Pete is very successful because if Martin was very successful then you wouldn't use the word although.

And the images at the bottom of this paper:
https://arxiv.org/pdf/1911.01547.pdf

I'm pretty sure these "tasks" are just manipulating the mechanisms I listed. For example, if you link a new node to a well-known node, that can boost it so it isn't forgotten so easily, or you rehearse it according to its importance, which matches/triggers another node that keeps repeating it.
Elaboration is closely tied to summarization, you just pay attention to the rarest words/building blocks, the most semantically related, the most loved, etc, and that allows you to either remove ex. most filler words or "add" filler words. And this attention filter threshold is part of translation during semantic discovery, semantic decoding/translation, and prediction adaption.
You can ask someone to just translate something, or just say a prediction, or both the prompt with prediction said, or just the predicted and only exact match no generalization ex. 2+2=[4].

If we look above at the text and image tasks, we notice a trend: if you state the task using multiple examples OR just say one time "rotate the following object 90 degrees" / "translate French to English please", it will do just that. We are priming the net to act a certain way, but it is only temporary: the energy/activity remains until it is forgotten. It's like when you prompt GPT-2 with "cat cat cat cat cat" and it forces it to predict 'cat' next. You could just ask it to parrot you though, as said. Or make it permanently love the concept 'cat', like Blender can do. So this priming causes it to repeat like a parrot... it will either keep translating English to French, or keep saying cat, or keep predicting words similar to cat, ex. pig horse dog sheep cattle man donkey. This priming works on any word in English; you can feed it "cat cat cat cat" or "dog man rabbit pig" or "translate French to English" etc., meaning all of these, be it a different word, embed space or task, are the same thing: priming. This is just modulating the energy in the network; it isn't anything scary or new, just the few mechanisms I list.

LOCKSUIT · « **Reply #3 on:** July 18, 2020, 07:34:06 am »

So a common ANN is basically just learning what I presented in my opening post? Most ANNs use backprop, but really the underlying theory is that the neural connections are based on the data/ accesses made to the network; it carves itself out on its own.

I believe ANN bias nodes that get added to the sum are also data-based and should not be found by backprop.

Anyway, there is a set of rules that the data itself defines the way the net splits/manages the energy spreading up the net. We must ignore backprop and understand those juicy mechanisms like I presented in my opening post. We must understand what 'backprop' is finding in the neural connections and in the biases / activation functions.

infurl · « **Reply #4 on:** July 18, 2020, 08:58:55 am »

@LOCKSUIT The good thing about your first post in this thread was that it was brief and to the point. You didn't say a huge amount of stuff that obscured your message. Your second post was a bit too long to take in easily because you were trying to say too much at once and you provided too many examples. It takes extra effort to write a shorter clearer piece, but it is always worth it.

In that second post you seemed to be asking why GPT-2 was going off the rails so easily. I think it's because it doesn't have any consciousness. It doesn't know what it's supposed to be doing, let alone whether or not it's doing it. Google has done some experiments with neural networks that create other neural networks. Maybe you could do some experiments with neural networks that watch other neural networks to see what they're doing.

LOCKSUIT · « **Reply #5 on:** July 18, 2020, 10:33:13 am »

Even though I gave yous a fine lesson, I was actually asking why doesn't anyone else explain it like that, and can you add more items to the table of contents? GPT-2 may sound like algebra but underneath it must be doing the things I said in my 1st post.

silent one · « **Reply #6 on:** July 18, 2020, 01:38:27 pm »

You need something better for your environmental model than just parroting a huge amount of text, you need something that's more like the equation for the text, not just the text itself.

LOCKSUIT · « **Reply #7 on:** July 18, 2020, 02:05:57 pm »

Do ANNs have that, or are you suggesting something new for true human intelligence?

silent one · « **Reply #8 on:** July 18, 2020, 02:30:47 pm »

Yes, as in it's old news from ages ago. You can get it by trying all possible configurations, but realistically you can't test more than 30 one-bit dimensions before it takes an ice age to finish.
So what you've actually got here, with the exchangeable words, the king to boy to queen to girl, and maybe others, would be the best you can get, but that's the equivalent of telling the computer the way to think, instead of it working it out itself. So it's a step back from AGI that way.

LOCKSUIT · « **Reply #9 on:** July 18, 2020, 02:39:45 pm »

Please be more clear....
"you need something that's more like the equation for the text, not just the text itself."
Is it existing technology or not? If it exists, explain it much more clearly.

Korrelan · « **Reply #10 on:** July 18, 2020, 03:22:02 pm »

I feel the initial corpus used for demonstrating the GPT system/ technique (words/ text) not only showed its relative power (attention) compared to existing NLP systems, but also gives insight into its many weaknesses.

The underlying premise of GPT techniques is sound: a general-purpose pattern finder with an incorporated attention mechanism. Although the initial attention map (12 layers with 12 independent attention mechanisms) only gives a possible 144 perspectives, it still produces decent results.

However, whilst I agree there are lessons to be learned from the GPT tech, this will never lead to a human+ level AGI, especially using just a language corpus; the system requires a much more versatile attention mechanism, greater knowledge generality and a physical grounding in reality, etc.

LOCKSUIT · « **Reply #11 on:** July 19, 2020, 06:13:57 am »

Discovery Time: In text you can discover that if spiders are dangerous, and spiders and rats both bite you, then rats are more probably dangerous too.

The same discovery can be made using vision. Therefore, if vision is grounded, text must be too. Oh look, I just had a Discovery Time.

And related translation isn't the only discovery done in text / vision, there's segmentation, frequency, loveness, temp activity, etc.

Yes, improving GPT-2 or the better Blender, and mixing with it Vision and Touch would be best, but it's not so mandatory just yet.

Korrelan · « **Reply #12 on:** July 19, 2020, 10:14:24 am »

Language/ text is just a protocol, it is not comparable to vision.

When a human ‘thinks’ about a spider they form a model in their imagination, they understand everything about the spider, its shape, colour, how it moves, and that they can be dangerous... this model is grounded in reality.

If the human wants to convey this information to another human they would form a sentence ‘spiders are dangerous’, this is just a common protocol, a string of words and letters that holds no information.

It relies on the intelligence of the receiver to understand/ decode the meaning; the string is designed to trigger the same spider model (or their equivalent) in the imagination of the receiver.

Our language/ protocol has syntax, the order that letters/ words are usually arranged to help the receiver with decoding the meaning… But the sentence/ string ‘spiders are dangerous’ on its own means absolutely nothing.

GPT learns the syntax, the order embedded within the protocol, from a massive corpus and is able to recreate this order based on the many combinations within the corpus. It’s only able to construct replies based on the ‘implied/ embedded’ human intelligence originally used to create the corpus. It has no imagination, it just uses its learned language syntax/ order to ‘lookup’ what a human has previously said regarding a topic.

If GPT writes ‘spiders are dangerous’ it has no grounded/ deep understanding of what the sentence actually ‘means’, there is no ‘mind’ behind it, it just knows that this combination of letters/ words is usually given in reply to this question/ scenario.

Yes, it’s able to use an attention mechanism to re-order/ combine the corpus snippets into new paragraphs but this is the 144 attention maps that have either been pre-defined or learned from the corpus… GPT is just a mimic.

GPT only leverages/ learns the order/ syntax of language; this is why it can never become an AGI based on language alone.

Take three words… ‘dog, the, ran’ and randomly reorder them to form sentences, there is no ‘mind’ behind this, just a randomisation algorithm.

Dog ran the
The dog ran
Ran the dog

All three are just strings, they are exactly the same, in every way except the second one has randomly/ accidentally encoded human syntax, this enables ‘you/ your mind’ to make sense of the string.

LOCKSUIT · « **Reply #13 on:** July 19, 2020, 10:44:22 am »

"Language/ text is just a protocol, it is not comparable to vision."

Vision is just one way to view the world; there are noses, temperature, pressure, sound, radio, and many other sensors.

"When a human ‘thinks’ about a spider they form a model in their imagination, they understand everything about the spider, its shape, colour, how it moves, and that they can be dangerous... this model is grounded in reality."

So does text. Word2Vec is a good example. The spider, its shape, color, how it moves, and that they can be dangerous are all tied to the context "spider".

"If the human wants to convey this information to another human they would form a sentence ‘spiders are dangerous’, this is just a common protocol, a string of words and letters that holds no information."

Wrong, text IS information. Anything that exists is information. And text has patterns - derived from man, not just any patterns.

"Our language/ protocol has syntax, the order that letters/ words are usually arranged to help the receiver with decoding the meaning"

So does vision; an object 'cat' is recognized as 'dog' by either their similar structure or their surrounding contexts, ex. both appear in snowy areas.

"But the sentence/ string ‘spiders are dangerous’ on its own means absolutely nothing."

Same for a visual object or image. Only by big data context can you make a decent embed space like Word2Vec does to start learning patterns.

"Yes, its able to use an attention mechanism to re-order/ combine the corpus snippets into new paragraphs but this is the 144 attention maps that have either been pre-defined or learned from the corpus"

What's your point here? That is true, 144, and I'm not sure what these actually do; there should be no such thing. It is not apparent when using GPT-2 though, as if there are infinite maps. How can you map something when you don't know how it will be spaced apart/ worded?

"GPT only leverages/ learns the order/ syntax of language; this is why it can never become an AGI based on language alone."

No, even vision is made of frame-by-frame causality, i.e. cause>effect; you can only play a memory forward. All your memories are made of the smallest elementary parts/memories and all your memories are a sequence or collage of these; your whole brain is movies of syntax. And GPT can invent NEW and useful sentences.

Korrelan · « **Reply #14 on:** July 19, 2020, 01:12:11 pm »

Vision has a direct correspondence with reality, text is an interpreted protocol... They are different

I'm not stating that GPT is useless, just that GPT using purely language is useless, GPT with other sensory modalities is required, as a minimum.

https://towardsdatascience.com/openai-gpt-2-understanding-language-generation-through-visualization-8252f683b2f8
