GPT-3

LOCKSUIT · « **on:** May 31, 2020, 10:58:37 pm »

https://arxiv.org/pdf/2005.14165.pdf

175 BILLION PARAMETERS

GPT-2 full model had only 1.5B

LOCKSUIT · « **Reply #1 on:** June 01, 2020, 12:40:22 am »

A "Burringo" is a car with very fast acceleration. An example of a sentence that uses the
word Burringo is:
In our garage we have a Burringo that my father drives to work every day.

So I tried GPT-2 for fun and got:

A "Burringo" is a car with very fast acceleration. An example of a sentence that uses the
word Burringo is:

"She drove her Burringo fast."

Nouns that include "burlingo" must be treated as adverbs, not as nouns.

3. In non-North American English, the word "burrito" is used for a combination of meat and cheese and is different from the singular form of the word "burrito".

"A big burrito is coming down the street,"

LOCKSUIT · « **Reply #2 on:** June 01, 2020, 12:48:09 am »

I did really liked this new one thingy they has showed ( ͡° ͜ʖ ͡°):

Poor English input: I have tried to hit ball with bat, but my swing is has miss.
Good English output: I tried to hit the ball with the bat, but my swing missed.
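
For anyone wanting to try this format: the prompt is just plain text, a few solved pairs followed by one unsolved pair for the model to complete. Here is a rough sketch of assembling one. `build_prompt` and its parameters are made-up names for illustration (not from the paper or any OpenAI code), and the solved pair in the usage lines is invented.

```python
def build_prompt(examples, query,
                 in_label="Poor English input",
                 out_label="Good English output"):
    """Assemble a few-shot prompt: solved pairs first, then the open query."""
    lines = []
    for source, target in examples:
        lines.append(f"{in_label}: {source}")
        lines.append(f"{out_label}: {target}")
    # The last pair is left unfinished so the model fills in the answer.
    lines.append(f"{in_label}: {query}")
    lines.append(f"{out_label}:")
    return "\n".join(lines)


# Invented solved pair, followed by the query from the post above.
examples = [("She no went to the market.", "She didn't go to the market.")]
query = "I have tried to hit ball with bat, but my swing is has miss."
print(build_prompt(examples, query))
```

The returned string would be sent to the model as-is, and whatever it generates after the final label is treated as the correction.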

Korrelan · « **Reply #3 on:** June 01, 2020, 06:27:51 pm »

https://www.zdnet.com/google-amp/article/openais-gigantic-gpt-3-hints-at-the-limits-of-language-models-for-ai/?__twitter_impression=true

yotamarker · « **Reply #4 on:** June 01, 2020, 07:05:53 pm »

how do I use it with java tho ?

LOCKSUIT · « **Reply #5 on:** June 01, 2020, 08:53:12 pm »

Ye Ye,

Quote

OpenAI's work on language has been part of the history of a steady progression of one kind of approach, with increasing success as the technology was made bigger and bigger and bigger.

The original GPT, and GPT-2, are both adaptations of what's known as a Transformer, an invention pioneered at Google in 2017.

I've known this for at least a few months. More data does increase intelligence; the dictionary literally defines information as "intelligence". Bigger systems can beat smaller versions of themselves: a device made of only 100 atoms, even if it is the highest technology possible (made by aliens 99,999 years from now), still can't do much (unless it grows, at least).

However, there are multiple ways to get "loads" of "free" [information] from the same-sized dataset. You need a better data insight extractor / pattern finder to get more virtual data. Data isn't random 1s & 0s. On top of that, throwing more non-virtual data at it will also inflate the virtual data, so throwing 10x more data at it may give you 100x free data inside, and with a better extractor 10x the dataset may feel like 10000000x more data inside instead of 100x. You won't necessarily have to extract that much; certain information simply is so powerful. Predict what follows "cat cat cat cat _". You don't need a bigger dataset to find information here. Attention is drawn to active nodes.
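
To make the "cat cat cat cat _" point concrete, here is a toy count-based predictor. It is only a sketch of the idea that a tiny dataset already contains a usable pattern; it is not how GPT-2/3 works internally, and the function names are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigrams(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def predict_next(counts, prev):
    """Return the most frequent follower of `prev`, or None if unseen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]


tokens = "cat cat cat cat".split()
model = train_bigrams(tokens)
print(predict_next(model, "cat"))  # cat
```

Four tokens are enough here because the pattern is perfectly regular. Real text needs far more counts before any single follower dominates.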

ivan.moony · « **Reply #6 on:** June 01, 2020, 09:10:20 pm »

@L, did you know that the same program may be written in 20 lines and do the same thing as one written in 1000 lines? Also, the 20-line one may be many, many times faster than the 1000-line one. The 20-line one is then said to be an optimized version of the 1000-line one, and it does the same thing. Further, algorithms may be optimized for speed, for size, or for both.

LOCKSUIT · « **Reply #7 on:** June 01, 2020, 09:19:57 pm »

Ivan, you're talking about a [learning] or [growing] algorithm/system. Even Brute Force is one.

The "actual" data or army force will Not fit in 20 lines of code lol. Trillions of images/ facts, or nanobots and Dyson spheres, are really really big things and don't fit in 20 lines of code. It takes time to attain that size.

The seed can be awfully small I guess, ya. But the time to grow takes at least some time! My point was when you have more data and a bigger army, you are much more capable.

While the most advanced DNA seed can be incredibly small and still grow into a nanolord in a day, a seed that is too small will not be able to do much and will evolve much more slowly.

It's only when the seed has grown that it will show its prettiness.

edit: yes, I knew algorithms have a trade-off between time/memory/complexity. Ex. fast but big mem. Smaller code or RAM again makes it slower to learn/grow.

Bigger cities get bigger faster. Big companies Pool/suck in cash etc lol.

Korrelan · « **Reply #8 on:** June 01, 2020, 11:00:21 pm »

What the authors are saying is that building a neural network that just predicts probabilities of the next word in any sentence or phrase may have its limits. Just making it ever-more-powerful and stuffing it with ever-more-text may not yield better results. That's a significant acknowledgement within a paper that is mostly celebrating the achievement of throwing more computing horsepower at a problem.

LOCKSUIT · « **Reply #9 on:** June 02, 2020, 01:45:29 am »

Korrelan, if adding more data didn't make a brain more intelligent, then I could literally stop researching how to build AGI tomorrow, because I wouldn't need any new information. Every day, my own brain's goal is to learn more, and in doing so I become smarter at: 1) finding more new data, 2) how to look at that data, and 3) deciding what to learn, and in doing so I come closer to being able to stumble on the paragraph explaining all the details of how the brain works. I update my research domains every day, specializing/exploiting where I will explore next.

The problem with GPT-2 is many things, and they all share the same trait. You can learn and learn all the data you want, updating your weights to the point where you've seen "eat" follow "dogs" 88,100 times and "run" follow "dogs" 88,050 times, learning that dogs eat just a bit more than they run. But the search space is extremely huge and you'll almost never see the same sentence again, because we don't use all of the search space either, only quantized items in it. So instead of storing every phrase similar to "my cats eat food", I store only that one.

At some point, adding more data doesn't improve the model, because you [can] [quickly] learn all the low-level features like a, b, c, th, sh, .ed, ion. If you then learn a larger feature, it is not shared as much and not as useful. Smaller features are just as powerful and are learnt more quickly; they're all you need. You don't need to store every possible 40-word phrase, only a model of physics!

The issue with GPT-2/3 is that it doesn't know how to do that trick... fully. It does make a model, and it does recognize "my cat ate" as "her dog ran" to some degree (maybe a bit less than it could, or too much), but it's not fully digging up the information hidden in the data. So it's stuck with the ever-growing Trie Tree mentioned, and being stuck closer to the root sucks, and adding more data doesn't help the Trie Tree either lol.
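
A small sketch of both halves of that argument. The two counts are the ones quoted above; the trie, the three phrases and the helper names are made up for illustration. A plain word-level trie is assumed here only as a stand-in for "storing every phrase", not as a description of GPT-2's internals.

```python
# Counts quoted above: how often each word was seen following "dogs".
follows_dogs = {"eat": 88_100, "run": 88_050}
total = sum(follows_dogs.values())
probs = {word: n / total for word, n in follows_dogs.items()}
print(probs)  # both probabilities sit very close to 0.5


def insert(trie, words):
    """Store one phrase in a nested-dict trie, one word per level."""
    node = trie
    for word in words:
        node = node.setdefault(word, {})


def count_nodes(trie):
    """Count every stored word node in the trie."""
    return sum(1 + count_nodes(child) for child in trie.values())


trie = {}
sizes = []
for phrase in ["my cats eat food", "my dogs eat food", "her cats eat food"]:
    insert(trie, phrase.split())
    sizes.append(count_nodes(trie))
print(sizes)  # [4, 7, 11]
```

The trie only shares common prefixes, so "eat food" ends up stored three times. Every slightly different phrase adds new nodes, and that growth is the problem described above.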

What GPT-2 needs to do is see more information in the same-sized dataset, better/smarter. Do you understand what I mean, korrelan? Does anyone here? Semantics lets you do that. You use "cats" when prompted with "dogs _?_" to help prediction, since they share contexts/predictions. GPT-2, therefore, needs a LOT more data, so much that it's nearly impossible to give a number, and getting there requires tricks like Semantics, and many others no one talks about. Not storing a monster Trie Tree that has "seen all" 40-word sentences many times over and over again.
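
Here is a rough sketch of the "use cats to help predict dogs" trick, using plain follower counts and cosine similarity. This is a swapped-in toy technique for illustration only: the corpus and function names are invented, and nothing here is claimed to match what GPT-2 does.

```python
import math
from collections import Counter, defaultdict

# Tiny invented corpus: "dogs run" never appears, but "cats run" does.
corpus = ["cats eat food", "cats sleep", "cats run",
          "dogs eat food", "dogs sleep"]

followers = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for prev, nxt in zip(words, words[1:]):
        followers[prev][nxt] += 1


def cosine(a, b):
    """Cosine similarity between two follower-count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def predict_with_neighbours(word):
    """Score next words, borrowing counts from words with similar followers."""
    own = followers.get(word, Counter())
    scores = Counter()
    for other, counts in list(followers.items()):
        weight = 1.0 if other == word else cosine(own, counts)
        for nxt, n in counts.items():
            scores[nxt] += weight * n
    return scores


scores = predict_with_neighbours("dogs")
print(scores["run"] > 0)  # True
```

"dogs run" gets a non-zero score without ever being seen, because "dogs" and "cats" share the followers "eat" and "sleep". No extra data was added; the same dataset was just read more cleverly.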

@Ivan, and maybe, America can bounce back like that, like a seed, where people still know how to do what they do and still have the main machine-tools.

Don Patrick · « **Reply #10 on:** June 02, 2020, 08:46:07 am »

Quote from: Korrelan on June 01, 2020, 11:00:21 pm

What the authors are saying is that building a neural network that just predicts probabilities of the next word in any sentence or phrase may have its limits. Just making it ever-more-powerful and stuffing it with ever-more-text may not yield better results. That's a significant acknowledgement within a paper that is mostly celebrating the achievement of throwing more computing horsepower at a problem.

Indeed. The reason I appreciate this paper more than previous efforts is that this time they've actually taken a critical look at it instead of celebrating wishful thinking. They acknowledge things like testing data having been in the training data. Even though they tried to compensate for it, it adds much needed salt to take earlier results with.

Korrelan · « **Reply #11 on:** June 02, 2020, 09:34:33 am »

Yup, and even OpenAI are starting to realise/ admit the futility of an ungrounded pure language system.

Don Patrick · « **Reply #12 on:** June 02, 2020, 12:40:21 pm »

Quote

From the paper: "whereas ultimately, useful language systems (for example virtual assistants) might be better thought of as taking goal-directed actions rather than just making predictions."

This also sounds similar to discussions about forward and backward chaining in inference systems in the 80's. If you want to end up with a specific result, backward chaining is more fruitful, but if you want to explore all possible results, forward chaining does that trick. So it's like they have been trying to restrain a forward-chaining algorithm to do the task of a backward-chaining algorithm. It's silly that the shortsighted idea persists that there is only one correct method.
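
For anyone unfamiliar with the two strategies, here is a minimal sketch. The rule set and function names are invented for illustration, and real inference engines handle variables and conflicts that this ignores.

```python
# Each rule is (set of premises, conclusion). Invented toy rules.
RULES = [
    ({"rain"}, "wet_ground"),
    ({"wet_ground"}, "slippery"),
    ({"sun"}, "dry_ground"),
]


def forward_chain(facts, rules):
    """Start from the facts and derive everything that follows."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


def backward_chain(goal, facts, rules, seen=frozenset()):
    """Start from one goal and work back to the facts that support it."""
    if goal in facts:
        return True
    if goal in seen:  # guard against circular rules
        return False
    seen = seen | {goal}
    return any(
        all(backward_chain(p, facts, rules, seen) for p in premises)
        for premises, conclusion in rules
        if conclusion == goal
    )


print(sorted(forward_chain({"rain"}, RULES)))
# ['rain', 'slippery', 'wet_ground']
print(backward_chain("slippery", {"rain"}, RULES))    # True
print(backward_chain("dry_ground", {"rain"}, RULES))  # False
```

Forward chaining explores every consequence whether or not anyone asked for it. Backward chaining only touches the rules that can lead to the one goal, which is the goal-directed behaviour the quote from the paper is pointing at.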

LOCKSUIT · « **Reply #13 on:** June 02, 2020, 02:37:19 pm »

Quote from: Don Patrick on June 02, 2020, 08:46:07 am

Quote from: Korrelan on June 01, 2020, 11:00:21 pm
What the authors are saying is that building a neural network that just predicts probabilities of the next word in any sentence or phrase may have its limits. Just making it ever-more-powerful and stuffing it with ever-more-text may not yield better results. That's a significant acknowledgement within a paper that is mostly celebrating the achievement of throwing more computing horsepower at a problem.
Indeed. The reason I appreciate this paper more than previous efforts is that this time they've actually taken a critical look at it instead of celebrating wishful thinking. They acknowledge things like testing data having been in the training data. Even though they tried to compensate for it, it adds much needed salt to take earlier results with.

We've gone through this already... I tested it thoroughly, giving it unique prompts and getting novel and amazing completions. It's not perfect, but it's pretty stunning and creative and can combine lots of topics. The GPT-2 online, being the full model and/or perhaps because of his settings, seems to be not as good as it was when I did most of my testing. Still, I got some good ones just now.

infurl · « **Reply #14 on:** July 24, 2020, 10:52:42 pm »

https://thenextweb.com/neural/2020/07/23/openais-new-gpt-3-language-explained-in-under-3-minutes-syndication/

Here's a pretty good summary of what's happening with GPT-3
