So I know lots of people are working on different parts of AI, but let's look at the ones below, because while they are just a few AIs, they pack a lot of power. As you know I have tried JUKEBOX and GPT-3, which feel very close to human-level text completion and audio completion (ex. techno), in that you can give them any context and they will react/predict what to do next like we would. I have not yet tried NUWA, which does text-to-video and more. Below I try multi-modal AI, and it too is close to human level. It is amazing: these AIs train on roughly as much data as a human adult brain has "seen" in its lifetime, and they are using multiple senses! And they are using sparse attention, Byte Pair Encoding, relational embeddings, and more! DeepMind is also looking into scanning a corpus for matches to copy from, like a painter looking at the model to make sure they paint the truth onto the canvas as a guide. We might get AGI by 2029!
Note: Download any image below if you want to see its full resolution. Most are close to full size; however, any image that is ex. 16 images stuck together left-to-right got badly shrunken, so do download those ones if you're interested in them.
Note: small GLIDE, made by OpenAI, used below, was only trained on roughly 67-147 million text-image pairs, not the 250M used for the full GLIDE, and has ~10x fewer parameters ("neurons"): 300 million.
Here is a no-text prompt; all I fed in was half an image:
https://ibb.co/dQHhc0F

Using text prompts and choosing which completion I liked, I made this by stitching them together (it could only be fed a square image, but it still came out good!):

original image:
https://ibb.co/88XLykd
elongated (scroll down page):
https://ibb.co/Rz9L03X

More ("text prompt" + input image = result):

"tiger and tree" + https://ibb.co/txsWY9h = https://ibb.co/SnqWYr4
"tigers and forest" + https://ibb.co/XLGbHdw = https://ibb.co/9GZ2s6p
"tigers in river and forest" + above = https://ibb.co/zGw6kQY
"circuit board" + https://ibb.co/P6vnpwK = https://ibb.co/61ySX7H
"wildlife giraffe" + https://ibb.co/d4C3cH1 = https://ibb.co/zXSTF3N
"bathroom" + https://ibb.co/KzGqtFz = https://ibb.co/9H1YqWz
"laboratory machine" + https://ibb.co/cTyXzTG = https://ibb.co/6NjsJDK
"pikachu" + image = https://ibb.co/3zJgWPw
"humanoid robot body android" + https://ibb.co/XWbN42K = https://ibb.co/pQWZ6Vd
"bedroom" + https://ibb.co/41y0Q4q = https://ibb.co/2Y0wSPd
"sci fi alien laboratory" + https://ibb.co/7JnH6wB = https://ibb.co/kBtDjQc
"factory pipes lava" + https://ibb.co/88ZqdX9 = https://ibb.co/B2X1bn3
"factory pipes lava" + https://ibb.co/hcxmHN0 = https://ibb.co/wwSxtVM
"toy store aisle" + https://ibb.co/h9PdRQQ = https://ibb.co/DwGz4zx
"fancy complex detailed royal wall gold gold gold gold" = https://ibb.co/BGGT9Zx
"gold gates on clouds shining laboratory" = https://ibb.co/qjdcPcR
"gold dragons" = https://ibb.co/L5qkmFS

It generates the rest of an image based on the upper half, or the left side, or what's around the hole you made (in-painting), and based on the text prompt provided. You can use only a text prompt, or only an image prompt.
The use cases are many: you can generate the rest of an artwork, or a diagram, or a blueprint, or a short animation (4 frames stuck together as 1 image), or figure out what a criminal, or a loved one, looks like.
You can tell it to generate a penguin's head for the missing top, but with a cigar in its mouth, lol, and so on. Or you can give it just a text request and it'll pop out such an image.
Ya, with these AIs you can get more of your favorite content easily, and if it had a slider, you could easily elongate the image or song and backtrack on anything it made that you didn't like (choosing which completion to keep, ex. for the next 5 seconds). A rough sketch of that elongation loop is below.
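Here's a minimal sketch of that stitch-to-elongate idea, assuming a square inpainting model like GLIDE. The glide_inpaint(image, mask, prompt) helper is hypothetical; it just stands in for whatever sampling call your notebook exposes:

import numpy as np

def elongate(canvas, prompt, steps, glide_inpaint):
    # canvas: H x W x 3 image array; grown rightward by half a tile per step.
    size = canvas.shape[0]                     # square tile size, ex. 256
    half = size // 2
    for _ in range(steps):
        tile = np.zeros((size, size, 3), dtype=canvas.dtype)
        tile[:, :half] = canvas[:, -half:]     # seed: the last half of the canvas
        mask = np.ones((size, size), dtype=bool)
        mask[:, half:] = False                 # False = region for the model to fill
        completed = glide_inpaint(tile, mask, prompt)   # hypothetical helper
        canvas = np.concatenate([canvas, completed[:, half:]], axis=1)
    return canvas

Each pass you would look at a few candidate completions and keep the one you like; that's the backtracking part.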
GLIDE also works with no text prompt; it does fine, just maybe 2x worse.
--no text prompts--
https://ibb.co/TqPC18x
https://ibb.co/XynhTWS
https://ibb.co/ScFLrpk
https://ibb.co/s9jhrvb
https://ibb.co/Bcr3WXr
https://ibb.co/chJBkTJ
https://ibb.co/PFJkKFw
https://ibb.co/GF2HwXP

To use GLIDE, search Google for "github glide openai". I use it in Kaggle, as it's faster than Colab for sure. You must make an account, then verify your phone number, then open the notebook, and only then can you see the settings panel on the right side, where you need to turn on GPU and internet. Upload images via Upload at the top right, and then in the part of the code that reads the image (it says ex. grass.png), you simply put your path there, ex. see, I have:
# Source image we are inpainting
source_image_256 = read_image('../input/123456/tiger2.png', size=256)
source_image_64 = read_image('../input/123456/tiger2.png', size=64)
To control the mask, change the 40: slice to ex. 30 or 44; that moves the boundary of the grey box up or down. To control the mask sideways, you add one more slice index on the end, ex. mask[:, :, :30, :30] or similar (going off memory here, haha). Apparently you can also add more than one mask (grey box) by doing several of those mask[...] = 0 assignments in a row.
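Here's roughly what all that looks like, building on the source_image_64 from above and following the conventions of OpenAI's public inpainting notebook (the exact variable names and numbers in your Kaggle copy may differ a bit):

import torch as th

# The mask has shape [batch, 1, height, width]: 1 = keep the pixel, 0 = let GLIDE generate it.
source_mask_64 = th.ones_like(source_image_64)[:, :1]

# The "40:" part: hide everything from row 40 down, so GLIDE fills in the bottom.
source_mask_64[:, :, 40:] = 0

# To mask sideways, slice the fourth (width) axis instead:
# source_mask_64[:, :, :, 32:] = 0      # hides the right side

# More than one assignment punches more than one grey box:
# source_mask_64[:, :, 10:20, 10:20] = 0
# source_mask_64[:, :, 40:50, 40:50] = 0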
Batch size sets the number of images to generate.
Once it's done, click Console to get the image, and right-click it to save it.
Here's mine for the open-source minDALL-E (this one allowed no image prompt, so just text):
minDALL-E was only trained on 14 million text-image pairs; OpenAI's was trained on 250M. And the model is only 1.5 billion parameters, ~10x smaller. This means we almost certainly HAVE IT! Anyway, look how good these are compared to the samples in OpenAI.com's post!!
"a white robot standing on a red carpet, in a white room. the robot is glowing. an orange robotic arm near the robot is injecting the robot's brain with red fuel rods. a robot arm is placing red rods into the robot brain."
https://ibb.co/0VL8Rvx"3 pikachu standng on red blocks lined up on the road under the sun, holding umbrellas, surrounded by electric towers"
https://ibb.co/xDQT3f6 "box cover art for the video game mario adventures 15. mario is jumping into a tall black pipe next to a system of pipes. the game case is red."
https://ibb.co/VBXVWsn an illustration of a baby capybara in a christmas sweater staring at its reflection in a mirror
https://ibb.co/5WZWLT4an armchair in the shape of an avocado. an armchair imitating an avocado.
https://ibb.co/nwwf1v4an illustration of an avocado in a suit walking a dog
https://ibb.co/bvfPkxfpikachu riding a wave under clouds inside of a large jar on a table
https://ibb.co/jHjV7mfa living room with 2 white armchairs and a painting of a mushroom. the painting of a mushroom is mounted above a modern fireplace.
https://ibb.co/VmKqbHka living room with 2 white armchairs and a painting of the collosseum. the painting is mounted above a modern fireplace.
https://ibb.co/K5fPkvjpikachu sitting on an armchair in the shape of an avocado. pikachu sitting on an armchair imitating an avocado.
https://ibb.co/XLJV4Hban illustration of pikachu in a suit staring at its reflection in a mirror
https://ibb.co/nMQRccf"a cute pikachu shaped armchair in a living room. a cute armchair imitating pikachu. a cute armchair in the shape of pikachu"
https://ibb.co/dbJ1Ks6To use it, go to this link below, make a kaggle account, verify phone number, then in this link below, click edit it, then go to setting panel at right and turn on GPU and internet. Then replace the code below, it's nearly same but makes it print more images. If you don't, it doesn't seem to work good.
https://www.kaggle.com/annas82362/mindall-e

import math
import matplotlib.pyplot as plt

images = images[rank]        # candidates re-ordered by the notebook's CLIP ranking
n = num_candidates
grid = int(math.sqrt(n))     # lay the best grid*grid candidates out in a square
fig = plt.figure(figsize=(6*grid, 6*grid))
for i in range(grid * grid):
    ax = fig.add_subplot(grid, grid, i + 1)
    ax.imshow(images[i])
    ax.set_axis_off()
plt.tight_layout()
plt.show()
NUWA - just wow
https://github.com/microsoft/NUWA
OpenAI is working on solving math problems, if you look at their website openAI.com. The ways to do this IMO (learning to carry over numbers) are either trial and error, or reading how to do it, or quickly coming to the idea using "good ideas". This carrying-over method is what can allow it to solve any math problem like 5457457*35346=?. If OpenAI can do this, they can do other reasoning problems!! A toy sketch of carrying is below.
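To make "learning to carry over numbers" concrete, here's a tiny sketch of the grade-school carrying procedure (my own illustration of the method, not OpenAI's code):

def long_multiply(a: str, b: str) -> str:
    # Digit-by-digit multiplication with explicit carrying.
    result = [0] * (len(a) + len(b))
    for i, da in enumerate(reversed(a)):
        carry = 0
        for j, db in enumerate(reversed(b)):
            total = result[i + j] + int(da) * int(db) + carry
            result[i + j] = total % 10     # keep the ones digit
            carry = total // 10            # carry the rest to the next column
        result[i + len(b)] += carry
    digits = ''.join(map(str, reversed(result))).lstrip('0')
    return digits or '0'

print(long_multiply('5457457', '35346'))   # 192899275122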
Facebook's Blender chatbot version 2 is also a good development; it looks like the way forward.
There's one other thing they may need to make AGI but I'm still researching it so I'll just keep it to myself in the meantime.
I know these AIs, while they look amazing and seem like magic, feel like play-doh, i.e. they are not actually AGI or conscious. But we are close. Yes, the human brain just predicts the rest of a dream using memories that are similar to a new problem; the new problem is new, but it's similar! It is this ability that would allow an AI to be given a goal, and solve it.
The cerebellum is said to be solved once the rest of the brain (or was it the neocortex?) is solved. People with their cerebellum removed shake when they move and miss targets when grasping objects. IMO this is because our brains pop out predictions at, say, 4 frames per second, so what our cerebellum does is watch the input "in background mode", listen to the originally predicted video/image, and decide whether to drive the motors the same way again, but readjusted this time. For example: one predicts seeing their hand move onto a computer mouse, but the cerebellum has time to see that the hand is thrown over to the left, and so it will now predict the hand moving down+rightwards to reach the mouse as a 'video prediction', instead of just downwards as originally predicted. Now, why don't our brains just run this fast all the time? Maybe because it would be too fast or cost too much energy. I mean, why don't our brains think 10x faster anyway? There are limits, but the cerebellum has maybe found a way to use or gain that speed. Below is a toy sketch of this predict-then-correct loop.
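Just to make that loop concrete, here's a toy sketch. Everything in it (the names, the gains, the noise) is an illustrative assumption, not a neuroscience model:

import numpy as np

def reach(hand, target, frames=60, gain=0.1, correct_gain=0.5):
    # Slow "cortical" plan: step a fraction of the way toward the target each frame.
    planned = hand.copy()
    rng = np.random.default_rng(0)
    for _ in range(frames):
        step = gain * (target - planned)             # coarse predicted motion
        planned = planned + step                     # where the brain thinks the hand goes
        hand = hand + step + rng.normal(0, 0.02, 2)  # noisy actual motion
        error = planned - hand                       # mismatch the "cerebellum" watches
        hand = hand + correct_gain * error           # fast online readjustment
    return hand

hand = reach(np.array([0.0, 0.0]), np.array([1.0, 0.0]))
print(hand)   # ends close to the target; set correct_gain=0 to see it drift and shake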