Ai Dreams Forum

Artificial Intelligence => General AI Discussion => Topic started by: LOCKSUIT on May 31, 2020, 10:58:37 pm

Title: GPT-3
Post by: LOCKSUIT on May 31, 2020, 10:58:37 pm
https://arxiv.org/pdf/2005.14165.pdf

175 BILLION PARAMETERS

GPT-2 full model had only 1.5B
Title: Re: GPT-3
Post by: LOCKSUIT on June 01, 2020, 12:40:22 am
A "Burringo" is a car with very fast acceleration. An example of a sentence that uses the
word Burringo is:

In our garage we have a Burringo that my father drives to work every day.


So I tried GPT-2 for fun and got:


A "Burringo" is a car with very fast acceleration. An example of a sentence that uses the
word Burringo is:


"She drove her Burringo fast."

Nouns that include "burlingo" must be treated as adverbs, not as nouns.


3. In non-North American English, the word "burrito" is used for a combination of meat and cheese and is different from the singular form of the word "burrito".

"A big burrito is coming down the street,"
Title: Re: GPT-3
Post by: LOCKSUIT on June 01, 2020, 12:48:09 am
I did really liked this new one thingy they has showed ( ͡° ͜ʖ ͡°):

Poor English input: I have tried to hit ball with bat, but my swing is has miss.
Good English output: I tried to hit the ball with the bat, but my swing missed.
Title: Re: GPT-3
Post by: Korrelan on June 01, 2020, 06:27:51 pm
https://www.zdnet.com/google-amp/article/openais-gigantic-gpt-3-hints-at-the-limits-of-language-models-for-ai/?__twitter_impression=true

 :)
Title: Re: GPT-3
Post by: yotamarker on June 01, 2020, 07:05:53 pm
How do I use it with Java, though?
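I'm guessing plain HTTP is the way, since there's no official Java library; GPT-3 is served over OpenAI's invite-only HTTP API. A minimal sketch with Java 11's built-in HttpClient, assuming the beta endpoint, engine name, and JSON fields from OpenAI's docs (all unverified assumptions on my part):

Code
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class Gpt3Demo {
    public static void main(String[] args) throws Exception {
        // Beta access key from OpenAI, read from an environment variable.
        String apiKey = System.getenv("OPENAI_API_KEY");
        // Request body: a prompt to complete and a cap on completion length.
        String body = "{\"prompt\": \"In our garage we have a Burringo\", \"max_tokens\": 64}";
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://api.openai.com/v1/engines/davinci/completions"))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        // The JSON reply holds the completions in a "choices" array.
        System.out.println(response.body());
    }
}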
Title: Re: GPT-3
Post by: LOCKSUIT on June 01, 2020, 08:53:12 pm
Ye Ye,

Quote
OpenAI's work on language has been part of the history of a steady progression of one kind of approach, with increasing success as the technology was made bigger and bigger and bigger.

The original GPT, and GPT-2, are both adaptations of what's known as a Transformer, an invention pioneered at Google in 2017.

I've known this for at least a few months. More data does increase intelligence; the dictionary literally defines information as "intelligence". Bigger systems can beat smaller versions of themselves: a device made of only 100 atoms that is the highest technology possible (made by aliens 99,999 years from now) still can't do much (unless it grows, at least).

However, there are multiple ways to get "loads" of "free" information from the same-sized dataset. You need a better insight extractor / pattern finder to get more virtual data out of it; data isn't random 1s & 0s. Throwing more non-virtual dataset size at it will also inflate the virtual data: 10x more data may give you 100x free data inside, and with a better extractor, 10x the dataset may feel like 10,000,000x more data inside instead of 100x. You won't necessarily have to extract that much; certain information is simply that powerful. Predict what follows "cat cat cat cat _". You don't need a bigger dataset to find the information here. Attention is drawn to active nodes.
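A toy sketch of what I mean by a pattern finder that needs no extra data: the prediction for "cat cat cat cat _" falls out of the prompt itself once you detect the repeating unit (hypothetical code, obviously not how GPT does it):

Code
// Predict the next token by finding the shortest repeating unit in the
// sequence itself -- no dataset beyond the prompt is needed.
static String predictNext(String[] tokens) {
    for (int p = 1; p <= tokens.length; p++) {          // candidate period
        boolean periodic = true;
        for (int i = p; i < tokens.length; i++) {
            if (!tokens[i].equals(tokens[i - p])) { periodic = false; break; }
        }
        if (periodic) return tokens[tokens.length % p]; // continue the cycle
    }
    return null; // unreachable: p == tokens.length is always periodic
}

// predictNext(new String[]{"cat", "cat", "cat", "cat"}) returns "cat"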

Title: Re: GPT-3
Post by: ivan.moony on June 01, 2020, 09:10:20 pm
@L, did you know that the same program may be written in 20 lines and do the same thing as one written in 1000 lines? Also, the 20-line one may be many, many times faster than the 1000-line one. We say the 20-line program is an optimized version of the 1000-line one, and it does the same thing. Further, algorithms may be optimized for speed, for size, or for both.
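For instance, a toy sketch of the point (hypothetical code; both methods compute the same sum, one optimized for both speed and size):

Code
// The long way: an O(n) loop, the spirit of the 1000-line program.
static long sumSlow(long n) {
    long total = 0;
    for (long i = 1; i <= n; i++) total += i;
    return total;
}

// The optimized way: an O(1) closed formula, the spirit of the 20-line
// program. Same input, same output, many times faster.
static long sumFast(long n) {
    return n * (n + 1) / 2;
}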
Title: Re: GPT-3
Post by: LOCKSUIT on June 01, 2020, 09:19:57 pm
Ivan, you're talking about a [learning] or [growing] algorithm/system. Even Brute Force is one.

The "actual" data or army force will Not fit in 20 lines of code lol. Trillions of images/ facts, or nanobots and Dyson spheres, are really really big things and don't fit in 20 lines of code. It takes time to attain that size.

The seed can be awfully small, I guess. But growing still takes at least some time! My point was that when you have more data and a bigger army, you are much more capable.

While the most advanced DNA seed can be incredibly small and still grow into a nanolord in a day, a too-small seed will not be able to do much and will evolve much more slowly.

Only when the seed has grown will it show its prettiness :)

Edit: yes, I knew algorithms have a trade-off between time, memory, and complexity, e.g. fast but big memory. Smaller code or RAM again makes it slower to learn/grow.

Bigger cities get bigger faster. Big companies pool/suck in cash, etc., lol.
Title: Re: GPT-3
Post by: Korrelan on June 01, 2020, 11:00:21 pm
Quote
What the authors are saying is that building a neural network that just predicts probabilities of the next word in any sentence or phrase may have its limits. Just making it ever-more-powerful and stuffing it with ever-more-text may not yield better results. That's a significant acknowledgement within a paper that is mostly celebrating the achievement of throwing more computing horsepower at a problem.

 :)
Title: Re: GPT-3
Post by: LOCKSUIT on June 02, 2020, 01:45:29 am
Korrelan, if adding more data didn't make a brain more intelligent, then I could literally stop researching how to build AGI tomorrow, because I wouldn't need any new information. Every day, my own brain's goal is to learn more, and in doing so I become smarter at: 1) finding new data, 2) knowing how to look at that data, and 3) deciding what to learn; in doing so I come closer to stumbling on the paragraph that explains all the details of how the brain works. I update my research domains every day, specializing/exploiting where I will explore next.

The problem with GPT-2 is many things, and they all share the same trait. You can learn and learn all the data you want, updating your weights until you've seen "eat" follow "dogs" 88,100 times and "run" follow "dogs" 88,050 times, learning that dogs eat just a bit more than they run, but the search space is extremely huge and you'll almost never see the same sentence again, because we don't use all of the search space either, only quantized items in it. So instead of storing every phrase similar to "my cats eat food", I store only that one.

At some point, adding more dataset doesn't improve the model, because you can quickly learn all the low-level features like a, b, c, th, sh, .ed, ion; then, if you learn a larger feature, it is not shared as much and not as useful. Smaller features are just as powerful and are learnt more quickly; they're all you need. You don't need to store every possible 40-word phrase, only a model of physics! The issue with GPT-2/3 is that it doesn't know how to do that trick... fully. It does make a model, it does recognize "my cat ate" as "her dog ran" to some degree (maybe a bit less than it could, or too much), but it's not fully digging up the information hidden in the data, so it's stuck with the ever-growing Trie Tree mentioned below, and being stuck closer to the root sucks; adding more data doesn't help the Trie Tree either, lol.

What GPT-2 needs to do is see more information in the same-sized dataset, better/smarter. Do you understand what I mean, korrelan? Does anyone here? Semantics lets you do it: you use "cats" when prompted with "dogs _?_" to help prediction, because they share contexts/predictions. Without tricks like semantics (and many others no one talks about), GPT-2 needs a LOT more data, so much that it's nearly impossible to give a number; the alternative is storing a monster Trie Tree that has "seen all" 40-word sentences, many times over and over again.
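Here's a toy sketch of the monster Trie Tree I mean (an illustration of the storage blow-up, not GPT's actual mechanism): every distinct word sequence needs its own path, so raw counts can never cover the 40-word search space, and shared features/semantics are the only way out.

Code
import java.util.HashMap;
import java.util.Map;

// Count exact word sequences in a trie. Each distinct context grows a new
// path, so coverage of long phrases explodes combinatorially with length.
class TrieNode {
    final Map<String, TrieNode> next = new HashMap<>();
    int count = 0; // how many times this exact sequence was seen

    void insert(String[] words, int i) {
        if (i == words.length) { count++; return; }
        next.computeIfAbsent(words[i], w -> new TrieNode()).insert(words, i + 1);
    }
}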


@Ivan: and maybe America can bounce back like that, like a seed, where people still know how to do what they do and still have the main machine tools.
Title: Re: GPT-3
Post by: Don Patrick on June 02, 2020, 08:46:07 am
Quote
What the authors are saying is that building a neural network that just predicts probabilities of the next word in any sentence or phrase may have its limits. Just making it ever-more-powerful and stuffing it with ever-more-text may not yield better results. That's a significant acknowledgement within a paper that is mostly celebrating the achievement of throwing more computing horsepower at a problem.
Indeed. The reason I appreciate this paper more than previous efforts is that this time they've actually taken a critical look at it instead of celebrating wishful thinking. They acknowledge things like test data having been present in the training data. Even though they tried to compensate for it, it adds a much-needed grain of salt with which to take the earlier results.
Title: Re: GPT-3
Post by: Korrelan on June 02, 2020, 09:34:33 am
Yup, and even OpenAI are starting to realise/admit the futility of an ungrounded pure language system.

 :)
Title: Re: GPT-3
Post by: Don Patrick on June 02, 2020, 12:40:21 pm
Quote
From the paper:  "whereas ultimately, useful language systems (for example virtual assistants) might be better thought of as taking goal-directed actions rather than just making predictions."
This also sounds similar to discussions about forward and backward chaining in inference systems in the 80s. If you want to end up with a specific result, backward chaining is more fruitful, but if you want to explore all possible results, forward chaining does the trick. So it's like they have been trying to restrain a forward-chaining algorithm to do the task of a backward-chaining algorithm. It's silly that the shortsighted notion persists that there is only one correct method.
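A minimal sketch of the difference, with a toy rule base (hypothetical code; the names are just for illustration):

Code
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class Rule {
    final Set<String> premises;
    final String conclusion;
    Rule(Set<String> premises, String conclusion) {
        this.premises = premises;
        this.conclusion = conclusion;
    }
}

class Chaining {
    // Forward chaining: fire every applicable rule until nothing new is
    // derivable -- it explores ALL possible results.
    static Set<String> forward(Set<String> facts, List<Rule> rules) {
        Set<String> known = new HashSet<>(facts);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule r : rules) {
                if (known.containsAll(r.premises) && known.add(r.conclusion)) {
                    changed = true;
                }
            }
        }
        return known;
    }

    // Backward chaining: start from ONE goal and recurse only through the
    // rules that could support it (visited guards against rule cycles).
    static boolean backward(String goal, Set<String> facts, List<Rule> rules,
                            Set<String> visited) {
        if (facts.contains(goal)) return true;
        if (!visited.add(goal)) return false;
        for (Rule r : rules) {
            if (r.conclusion.equals(goal)) {
                boolean allHold = true;
                for (String p : r.premises) {
                    if (!backward(p, facts, rules, visited)) { allHold = false; break; }
                }
                if (allHold) return true;
            }
        }
        return false;
    }
}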
Title: Re: GPT-3
Post by: LOCKSUIT on June 02, 2020, 02:37:19 pm
Quote
What the authors are saying is that building a neural network that just predicts probabilities of the next word in any sentence or phrase may have its limits. Just making it ever-more-powerful and stuffing it with ever-more-text may not yield better results. That's a significant acknowledgement within a paper that is mostly celebrating the achievement of throwing more computing horsepower at a problem.

Indeed. The reason I appreciate this paper more than previous efforts is that this time they've actually taken a critical look at it instead of celebrating wishful thinking. They acknowledge things like test data having been present in the training data. Even though they tried to compensate for it, it adds a much-needed grain of salt with which to take the earlier results.

We've been through this already... I tested it thoroughly, giving it unique prompts and getting novel yet amazing completions. It's not perfect, but it's pretty stunning and creative and can combine lots of topics. The GPT-2 online, being the full model and/or perhaps his settings, seems to be not as good as it was when I did most of my testing. Still, I got some good ones just now :)

https://www.youtube.com/watch?v=rO6JBBoAb3s

Title: Re: GPT-3
Post by: infurl on July 24, 2020, 10:52:42 pm
https://thenextweb.com/neural/2020/07/23/openais-new-gpt-3-language-explained-in-under-3-minutes-syndication/

Here's a pretty good summary of what's happening with GPT-3
Title: Re: GPT-3
Post by: infurl on July 31, 2020, 04:12:22 am
https://www.theverge.com/21346343/gpt-3-explainer-openai-examples-errors-agi-potential

Here's another cogent article about GPT-3.

All those impressive demos that you're seeing? They're cherry-picked out of the numerous attempts that failed miserably. GPT-3 is no closer to anything that's actually useful than anything else that has been tried. However, it did waste a lot more electricity than anything ever before.
Title: Re: GPT-3
Post by: LOCKSUIT on July 31, 2020, 04:38:30 am
I didn't realize until last week that GPT actually looks at text through different attention views. It's really closer to AGI than I thought. I thought it was just highlighting which words were similar =) "[Jen Cages] has a mom named [Beth Cages]". That can certainly cause it to repeat, but below I show what else those windows do.

For example, instead of predicting the next word by finding a similar match to, e.g., the last 10 words:
"and that [ape was sitting all alone in a cage and he] _"
it can predict the next word successfully by looking at just particular words, when the scenario only needs those:
"[Jen Cages] is a girl who [has a mom] named [Beth] *Cages*"
Notice the [] are only needed on a few words... it works on many more sentences because of this.
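Here's a sketch of the weighting trick I mean (scaled dot-product attention in miniature; hypothetical code, not GPT's actual implementation): score each context word against a query, softmax the scores, and most of the weight can land on "[Jen Cages]" and "[Beth]" while the filler words get almost none.

Code
// Score each context word's key vector against the query, then softmax the
// scores into weights that sum to 1. A head "looks at particular words" by
// putting nearly all of its weight on a few of them.
static double[] attend(double[] query, double[][] keys) {
    int d = query.length;
    double[] w = new double[keys.length];
    double max = Double.NEGATIVE_INFINITY;
    for (int i = 0; i < keys.length; i++) {
        for (int j = 0; j < d; j++) w[i] += query[j] * keys[i][j];
        w[i] /= Math.sqrt(d);                 // scaled dot product
        max = Math.max(max, w[i]);
    }
    double sum = 0.0;                         // softmax, max-shifted for stability
    for (int i = 0; i < w.length; i++) { w[i] = Math.exp(w[i] - max); sum += w[i]; }
    for (int i = 0; i < w.length; i++) w[i] /= sum;
    return w;                                 // one attention weight per word
}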

It also seems the new GPT-3 is a tad more accurate, being trained on more data of course, and it can be asked to do a specific task more commandingly ("on-demand fine-tuning").
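In practice that "commanding" is just few-shot prompting: you show the task in the prompt itself and let the model continue. A hypothetical prompt in the style of the paper's English-correction example (the second input line is my own invention):

Code
// Build a few-shot prompt: a solved example, then the new case to complete.
String prompt =
        "Poor English input: I have tried to hit ball with bat, but my swing is has miss.\n"
      + "Good English output: I tried to hit the ball with the bat, but my swing missed.\n"
      + "Poor English input: Yesterday we goed to park and eated sandwich.\n"
      + "Good English output:"; // the model is expected to complete this line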
Title: Re: GPT-3
Post by: Don Patrick on July 31, 2020, 09:47:59 am
I wonder if "predicting the next word" shouldn't be called "recalling the next word" from its training data, or where we can draw a line between merely swapping subjects and generalising. I think there is a difference, but they are similar in utility.

The other day someone posted a video of GPT-3 producing two SQL queries, with no further context. I thought it was suspect that they only showed two, and after locating the original tweet, I found the user saying that it made a lot of mistakes, but they didn't show those. Cherry-picking indeed. That is not to say it might not be useful to some degree as an autofill, but it does mean users would have to double-check its code suggestions or risk glossing over mistakes they wouldn't have made themselves.
https://twitter.com/FaraazNishtar/status/1285934622891667457

At the end of the day, I don't look at the results but at the underlying mechanism to tell me its inherent limits. I don't think it's possible for purely associative processes to reliably reproduce rule-based systems like math or time or physics. It seems every GPT version just memorises more data to obscure its inherent incapabilities, raising accuracy without addressing the root problems. The problem I have with that is that this will always leave some edge cases, no matter how rare, that go completely off the rails. It's the difference between an AI that recommends a cure that works in 95% of all cases but kills the rest in spectacular fashion, and traditional doctors who recommend a cure that only works in 70% of all cases but causes only mild inconvenience for the rest.
Title: Re: GPT-3
Post by: LOCKSUIT on July 31, 2020, 06:56:48 pm
It is far from just recalling; the way these things work, they don't just recall, they mix and use other methods. GPT is very advanced. I won't say it is complex, though; it can be explained easily if you know your stuff.

GPT is only a step forward; calm down, everyone. The next AIs will add what it lacks.
Title: Re: GPT-3
Post by: Don Patrick on August 01, 2020, 05:18:17 pm
I am calm. You're the energetic one on this forum.

Here's an interesting test where someone tried to teach GPT-3 to distinguish nonsense. I think I see what metric it's using, but that metric would produce too many false positives on rare questions to be of use to me.
https://arr.am/2020/07/25/gpt-3-uncertainty-prompts