Releasing full AGI/evolution research

ivan.moony · « **Reply #75 on:** May 14, 2020, 09:33:06 am »

Quote from: LOCKSUIT on May 14, 2020, 08:45:08 am

The new AGI vid did take 36 mins and is more summarized key points, that is possible. But still, 5 years worth of research and a partially complete AGI blueprint, in 36 minutes and a text to go with it, and you want it in 10 minutes !

There is a technique of gradual summarizing I like very much. First explain everything generally in 20 seconds (1). Then explain it again, but in 5 minutes (2). Then explain it thoroughly in whatever time you need, but by repeating (1) and (2) on each sub-section of the thorough explanation.

Five years is a lot of time, and it deserves careful thought about presentation. I'm not saying this or that way is the best, but I think freezing one still image for 30 minutes does not instill confidence in something that might turn into a serious advance. My advice is to at least put some thematic, textually structured slides behind the speech, and please, make an effort to write the algorithm's pseudocode. That way you'll see more clearly the strengths and weaknesses of your approach compared to others.

I'm just trying to help, but it's your project after all, and you probably know best how to handle it responsibly. I can only wish you well.

LOCKSUIT · « **Reply #76 on:** May 14, 2020, 09:51:18 am »

Quote

"There is a technique of gradual summarizing I like very much. First explain everything generally in 20 seconds (1). Then explain it again, but in 5 minutes (2). Then explain it thoroughly in whatever time you need,"

I literally was just thinking about that on my own, cool eh? True.

LOCKSUIT · « **Reply #77 on:** May 15, 2020, 09:54:31 am »

Reminder: my big last posts on previous page(s), don't miss!

Something I posted on OpenAI's group chat on Slack with the Clarity team (ROFL):

I'm still trying to find a good angle to share/ intake, as I'm sure I have insights and sure yous know things I don't too. Circuits's goal is Clarity in understanding existing vision networks, and ultimately AGI in the long run, because they want to improve the design and therefore need more answers to how the brain works. Let's try this question for now: Do the existing vision nets yous are focusing on use Backpropagation? I'm sure it results in a similar weighting that a no-backprop approach would create! So why not a more natural/efficient way of Learning? In my design for AGI I stick to wiring together features recently activated ex. 'h' and 'i' to get the new node 'hi' and I use node accesses to update the connection weights so that if 'hi' was seen 8 times - the connections would each have 8 strength. This very idea runs my simple trie-based Predictor. Further, when 2 features like dog and cat both have snow around them, this would cause dog to light up when cat lights up, hence wiring cat to dog and updating the connection as well. Nodes with few frequencies get pruned/ forgotten or blended with other nodes ex. 'well hi there' and 'hello my friend' becomes 'hi friend my', or 'hi' and 'hello' become 'hielo'. Sometimes (I think) weights are rightfully unevenly stronger and some letters trigger it more than others ex. 'hiElO'.

BTW did everyone know this Deep Learning trick to prune weights?
http://news.mit.edu/2020/foolproof-way-shrink-deep-learning-models-0430 (edited)

LOCKSUIT · « **Reply #78 on:** May 17, 2020, 10:26:37 am »

Intelligence just rose again;

Convolution, Pooling, and RELU can be done in my design by time delay of individual nodes firing - if "the cat ate food" is heard it may activate features "the cat" and "ate food" which may activate the node "the cat I saw ate food". Little parts get matched, sometimes not fully as shown because the time delay/ order is different, and in turn these partially activated nodes activate yet higher nodes partially too! Basically it's ok if it's not an exact match, the order of parts of parts of parts ends up similar. For Pooling, we can reuse an already existing mechanism - when prediction candidates are activated in my hierarchy design, usually only 1 top prediction is heard/spoken, meaning even though the others do have energy, they get a lot less weight. And RELU is exactly that.

&

1) All neural networks compress themselves to learn the latent salient features, they prune low frequency nodes/ connections, blend nodes, store features only once and increment frequencies (strengthen axon weight), trigger related nodes (translation), etc. Once a network is compressed/ learns a good model of the data fed to it, it can predict/ generate True data from the same distribution by using top k predictions softmaxed.
2) Lossless Data Compression is another thing, which is the best Evaluation method for testing Predictors, a neural Predictor is really good at guessing the next letter/ word and can store a separate file that stores error correction (steering the top k predictions to the correct one) and that compresses a file.
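
To make point 2 concrete, here is a minimal sketch of using compression as an evaluation score. It is not the predictor described in this thread: it assumes a simple adaptive order-0 byte model with add-one smoothing, chosen only for illustration, and it reports the ideal code length that an arithmetic coder would approach (the sum of -log2 p for each byte actually seen). The function name `ideal_compressed_bits` is hypothetical.

```python
import math
from collections import Counter


def ideal_compressed_bits(data: bytes) -> float:
    """Ideal code length, in bits, under an adaptive order-0 byte model.

    Assumption: add-one smoothing over the 256 possible byte values.
    The model is updated only after each byte is scored, so a decoder
    could rebuild the same probabilities and recover the data losslessly.
    """
    counts = Counter()
    seen = 0
    bits = 0.0
    for byte in data:
        p = (counts[byte] + 1) / (seen + 256)  # predicted probability of this byte
        bits += -math.log2(p)                  # cost of coding it
        counts[byte] += 1                      # online update after scoring
        seen += 1
    return bits


if __name__ == "__main__":
    repetitive = b"a" * 1000
    varied = bytes(range(256)) * 4
    print(ideal_compressed_bits(repetitive) / 8, "bytes for 1000 repeated bytes")
    print(ideal_compressed_bits(varied) / 8, "bytes for 1024 varied bytes")
```

A better predictor assigns higher probability to what actually comes next, so the total goes down. That is why compressed size and prediction quality are two views of the same number.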

Object recognition is used in vision, and text neural networks. The goal is to work with strings of objects; sentences in time. AGI is all about updating where to collect or generate data from, it uses large context in a model to make decisions/ the future state of Earth:
https://www.reddit.com/r/agi/comments/gcln3p/thought_experiment_what_does_a_body_do_for_a_brain/

Elaborate? Maybe summarize haha. In my movies you see my network has a lot of things for Prediction; frequencies, translation, robustness to typos etc, activation functions, energy remaining, etc. Prediction is truth. It models the data distributions. Trust is based on context; truth. The 2nd major thing in my network is Reward: deciding what website or what mental thought to look into or which motor action tweaks to try is a recursively updating process of where to collect/generate data from, and it evolves its own goals to achieve the root goal Survival. This goal finding steers the prediction to a path that it wants to meet so that it knows "How" to get what it "Wants". The brain modifies long term memory nodes that exist etc, short term working memory (temporary energy), and permanent energy (reward). It updates all 3, to reach the reward answers.

LOCKSUIT · « **Reply #79 on:** May 18, 2020, 06:08:25 pm »

May 18 2020
LOCKSUIT posts on OpenAI's Slack channel:

The article mentions Nick looking for curves in images and couldn't just choose 1 of 4 classifications for a given image, and says "We were surprised when we saw that activations fell naturally into different levels of activation.". Really? I knew that. I actually know something deeper. Check out this image: https://ibb.co/x5J7s9k

If my hierarchy schema reads "cat", it activates the "cat" node including other nodes ex. "cattle" and "tac" and of course parts of itself ex. "at" and "t" to variable degrees. Each node has predictions of what comes next. They combine predictions by shared parent nodes. I have a real algorithm that does this. Energy flows rightwards only, you can't repeat the alphabet backwards naturally. Only how it was stored in order.

As shown in my image, the "cat" node activated will activate neighboring context nodes, and through these local channels and only through these channels will trigger/discover nodes like "dog" that share the same contexts - cats and dogs both eat, sleep, run, lick, etc. The cat, cattle, dog, etc nodes are activated by variable amounts and all mix predictions. It can recognize unseen sentences plus use many matches/judges for prediction.

My hierarchy schema can also discover/trigger my=your if it stores "my dog" and "your cat", because the shared contexts are, while not exact, similar, hence leaking energy still! Further, if both rabbits and horses are dogs, animals, 4 legged, cute, and have 2 eyes, then rabbits triggers horses. This is getting deep now, very parallel, each node leaks energy and each of these then does too...

LOCKSUIT · « **Reply #80 on:** May 19, 2020, 12:40:19 pm »

A natural and explainable brain? Another visit to my design.

New pictures are provided too.

I have been working on AGI full-time for 5 years now (I don't get paid), mostly on text sensory, but it turned out to be very very insightful. I have a very large design with many of the "bells and whistles", while the architecture itself that can run it all is very simple and has made some of my friends cringe. Below I lay down a good portion of my work. I really hope you can help my direction, or that I help you.

Nearly every part of my design/ architecture can be found in the AI field. Hierarchies, yup. Weights, yup. Rewards, yup. Reward Update, yup. Activation Function, yup. Word2Vec, yup. Seq2Seq, yup. Energy, yup. Pruning, yup. Online Learning, yup. Pooling, yup. Mixing Predictions, yup. Etc. It's when I unify them together I start getting a new view that no one else shares. I'm able to look into my net and understand everything and how they all work together, that's why I was able to learn so much about the architecture.

I've coded my own Letter Predictor and it compresses 100MB to 21.8MB; the world record is 14.8MB. Mine updates frequencies Online and mixes predictions, and more. I still have tons to add to it, I will likely come close to the world record easily. How it works is in the supplementary attached file.

So I'm going to present below a lot of my design, showing how I unify things together in a single net. And you can tell me if there's a more natural way or not. I've tested other people's algorithms like GPT-2 and they can accomplish what I present, but the natural way to do it is not shown in an image or ever explained like I explain it, they just stack black boxes on each other.

See this image to get a basic view of my architecture. It's a toy example. https://ibb.co/p22LNrN It's a hierarchy of features that get re-used/ shared to build larger memories. The brain only stores a word or phrase once and links all sentences ever heard to it. That makes for an extremely powerful breeding ground. Note the brain doesn't store a complete pyramid like I show in my image, just bits n parts; a collection of small hierarchies. So think of my image as a razor tooth saw, not a single very tall pyramid triangle. https://ibb.co/d4JVm55

Notice all nodes are too "perfectly" clear? Well nodes can be merged "whalkinge" and have variable weights "wALkinG" and be pruned "my are cute" to get a "compressed" fuzzy-like network but we can for now keep a clean hierarchy so we can easily see what is going on!

I have a working algorithm (trie/tree-based) that updates the connection weights in the tree when it accesses a feature (in the same order in time ex. a>b>c, cba is a different feature), so it knows how many times it has seen 'z' or 'hello' or 'hi there' in its life so far! Frequencies! This is my Online Training for weights. Adding more data always improves my predictor/model, guaranteed. I tested using not Perplexity but Lossless Compression to Evaluate my model's predictions. So now you can imagine my razor tooth hierarchies with counts (weights) placed on connections. Good so far. Starting to look like a real network and can function like one too! https://ibb.co/hC8gkFC
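
For readers who want to see the counting idea in code, here is a minimal sketch of a trie whose connections carry access counts. It is an illustration under assumptions, not the code from the attached file: the class names `TrieNode` and `CountingTrie` and the `max_order` limit are made up for this example.

```python
class TrieNode:
    """One feature in the hierarchy; count is how often it has been accessed."""

    def __init__(self):
        self.count = 0
        self.children = {}


class CountingTrie:
    """Stores every ordered substring up to max_order characters, with counts."""

    def __init__(self, max_order=3):
        self.root = TrieNode()
        self.max_order = max_order  # assumed cap on feature length

    def update(self, text):
        """Online training: walk each window of text and bump counts on the path."""
        for start in range(len(text)):
            node = self.root
            for ch in text[start:start + self.max_order]:
                node = node.children.setdefault(ch, TrieNode())
                node.count += 1

    def count(self, feature):
        """How many times this exact ordered feature has been seen so far."""
        node = self.root
        for ch in feature:
            node = node.children.get(ch)
            if node is None:
                return 0
        return node.count


if __name__ == "__main__":
    trie = CountingTrie(max_order=3)
    for _ in range(8):
        trie.update("hi")
    print(trie.count("hi"), trie.count("ih"))
```

After seeing "hi" eight times the path h>i holds a count of 8, while "ih" stays at 0 because order matters, matching the a>b>c versus cba point above.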

Now for the cool part I want to slap on here. I hope you know Word2Vec or Seq2Seq. It translates by discovering cat=dog based on shared contexts. The key question we need/ will focus on here now is how does the brain find out cat=dog using the same network hierarchy? Here's my answer below and I want to know if you knew this or if you have a more natural way.

https://ibb.co/F4BL1Ys Notice I highlighted the cats and dogs nodes? The brain may see "my cats eat food" 5 times and then, tomorrow, may see "my dogs eat food" 6 times. Only through their shared contexts will energy leak and trigger cats from dogs. There's no other realistic way this would occur other than this. The brain is finding out cats is similar to dogs on its own by shared strengthened paths leaking energy. So next time it sees "dogs" in an unseen sentence like "dogs play", it will activate both dogs and cats nodes by some amount.

We ignore common words like "the" or "I" because they will be around most words, it doesn't mean cats=boat. High frequency nodes are ignored.
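
The shared-context idea can be sketched without any vectors. The code below is only an illustration: it assumes a fixed window of neighbouring words and a hand-written set of common words to ignore, and the names `context_counts` and `shared_context_score` are hypothetical. It scores two words by how much counted context they have in common.

```python
from collections import Counter, defaultdict


def context_counts(sentences, window=2, ignore=frozenset()):
    """For each word, count the words seen within `window` positions of it.

    Assumption: very common words are passed in explicitly via `ignore`
    rather than being detected by frequency.
    """
    contexts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i, word in enumerate(words):
            lo = max(0, i - window)
            hi = min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i and words[j] not in ignore:
                    contexts[word][words[j]] += 1
    return contexts


def shared_context_score(contexts, a, b):
    """Energy that can leak between a and b: overlap of their context counts."""
    ca = contexts.get(a, Counter())
    cb = contexts.get(b, Counter())
    return sum(min(ca[w], cb[w]) for w in ca.keys() & cb.keys())


if __name__ == "__main__":
    data = ["my cats eat food"] * 5 + ["my dogs eat food"] * 6 + ["the boat sails far"]
    ctx = context_counts(data, window=2, ignore={"my", "the"})
    print(shared_context_score(ctx, "cats", "dogs"))  # shared "eat" and "food"
    print(shared_context_score(ctx, "cats", "boat"))  # nothing shared
```

With the 5 and 6 sightings from the example, cats and dogs score 10 (the smaller count of 5 for each of "eat" and "food"), and cats and boat score 0.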

Word2Vec or the similar can look at both sides around a word to translate, use long windows, skip-gram windows, closer words in time have more impact, and especially the more times seen (frequencies). My hierarchy can naturally do all that. Word2Vec also uses Negative Sampling, and my design can also use inhibition for both next word and translation Prediction.

Word2Vec uses vectors to store words in many dimensions and then compare which are closer in the space. Whereas my design just triggers related nodes by how many local contexts are shared. No vectors are stored in the brain... Nor do we need Backprop to update connections. We increment and prune low frequency nodes or merge them etc, we don't need Backprop to "find" this out, we just need to know how/ why we update weights!

There's such a thing as contextual word vectors. Say we see "a beaver was near the bank", here we disambiguate "bank". In my design, it triggers river or wood more than TD Trust or Financial building. Because although "near the bank and the building" and "near the bank with wood" both share bank, the beaver in my sentence input triggers the latter sentence more than the financial one.

Word2Vec can do the "king is to queen as man is to what?" by misusing dimensions from king that man doesn't have to find where queen is dimensionally without the king dimensions in man to land up at woman. Or USA is to Canada as China is to India, because instead of them lacking a context they both share it here but the location is slightly off in number. But the brain doesn't do this naturally, just try cakes are to toast as chicken is to what? Naturally the brain picks a word with all 3 properties.

To do the king woman thing we need to see the only difference is man isn't royal, so queen is related to woman most but not royal, hence woman. This involves a NOT operation, somehow.

Ok so, when my architecture is presented with "walking down the" it activates multiple nodes like "alking down the" and "lking....." and "king...." ..... and "down the" and "the" and also skip-gram nodes ex. "walking the", as well as related nodes ex. "running up that" and "walking along the". My code BTW does this but not related or skip-gram nodes yet! What occurs now is all activated nodes have shared parent predictions on the right-hand side to predict the next letter or word. So "down the" and "the" and "up this" all leak energy forward to "street". This Mixing (see the Hutter Prize or PPM) improves Prediction. You can only repeat the alphabet forward because it was stored that way. Our nodes have now mixed their predictions to decide a better set of predictions. https://ibb.co/Zz91jQQ
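
Here is a minimal sketch of the Mixing step, for exact-match suffix contexts only (no skip-gram or related nodes). It is not PPM and not the code described above: it assumes a plain linear mix where a match of length n gets weight n, and the names `train` and `mixed_prediction` are invented for this example.

```python
from collections import Counter, defaultdict


def train(tokens, max_order=3):
    """Count what follows every context of 1..max_order tokens."""
    table = defaultdict(Counter)
    for i in range(len(tokens)):
        for n in range(1, max_order + 1):
            if i - n < 0:
                break
            table[tuple(tokens[i - n:i])][tokens[i]] += 1
    return table


def mixed_prediction(table, context, max_order=3):
    """Mix the next-token distributions of every matching suffix of context.

    Assumption: a suffix of length n contributes with weight n, so longer
    matches count for more. Returns a normalized dict of probabilities.
    """
    scores = Counter()
    for n in range(1, min(max_order, len(context)) + 1):
        followers = table.get(tuple(context[-n:]))
        if not followers:
            continue
        total = sum(followers.values())
        for token, count in followers.items():
            scores[token] += n * count / total
    norm = sum(scores.values())
    return {token: s / norm for token, s in scores.items()} if norm else {}


if __name__ == "__main__":
    text = ("walking down the street then running down the road "
            "then walking down the street").split()
    model = train(text)
    print(mixed_prediction(model, ["walking", "down", "the"]))
```

On this toy text "the" and "down the" each vote street 2/3 and road 1/3, while "walking down the" votes street only, so the mix ends up at street 5/6 and road 1/6.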

My design is therefore recognizing nodes despite typos or related words. It can also handle rearranged words like "down walking the" by time delay from children nodes. Our "matches" in the hierarchy are many, and we have many forward predictions now, we can take the top 10 predicted words now. We usually pick the top prediction, mutation makes it not perfect on purpose, it's important.
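
Picking from the top predictions with a little mutation could look like the sketch below. The `mutation_rate` value and the function name `pick_prediction` are assumptions made for illustration, and the probabilities are assumed to be positive.

```python
import random


def pick_prediction(probs, k=10, mutation_rate=0.1, rng=None):
    """Usually return the top prediction, sometimes another of the top k.

    probs: dict mapping candidate word to a positive probability.
    mutation_rate: assumed chance of deliberately not taking the top one.
    """
    rng = rng or random.Random()
    top = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]
    if not top:
        raise ValueError("no candidates to pick from")
    if len(top) == 1 or rng.random() >= mutation_rate:
        return top[0][0]
    # Mutation: sample one of the runners-up in proportion to its probability.
    words = [word for word, _ in top[1:]]
    weights = [p for _, p in top[1:]]
    return rng.choices(words, weights=weights, k=1)[0]


if __name__ == "__main__":
    candidates = {"street": 0.6, "road": 0.3, "lane": 0.1}
    print(pick_prediction(candidates, mutation_rate=0.0))
    print(pick_prediction(candidates, mutation_rate=1.0, rng=random.Random(0)))
```

With `mutation_rate=0.0` this is plain top-1 selection; raising it trades accuracy for variety.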

You may wonder, why does thinking in the brain only hear 1 of the top 10 predictions? All 10 nodes are activated, and so are recently heard nodes kept Active! If they were heard, you'd hear them in your mind, surely? If you imagine video in your brain, it'd be very odd to predict the next frame as a dog, cat, horse, and sheep, it would be all blended like a monster. The brain needs precision. So Pooling, as done in CNNs, is used in picking from top 10 predictions! Other nodes and predictions still are activated, just not as much.

Also, Pooling in my architecture can be done for every node's output! Not just the final high layer. Pooling helps recognition by focusing. Pooling can be controlled indirectly to make the network Summarize or Elaborate or keep Stable. It simply says or doesn't say more or less important nodes, based on the probability of being said. Like you may ignore all the "the" or you may say a lot of filler content that isn't even rewarding like talking about food (see below).

When given a prompt ex. "What do you want to eat? What?" you may first parrot exactly the start, and some may be said in your own loved words I, fries, etc. Or you may just say the entail. You might just say what they said and stop energy forward flow. And you might just say fries in place of "What?". Why!? Because their words, and your loved words fries, I, etc are pre-active.

One more thing I'll go through is Temporary Energy and Permanent Energy in my architecture. You can see Facebook's new chatbot Blender is like GPT-2 but it has a Dialog Persona that makes it always say certain words/ nodes. So if it likes food or communism, it will bring it up somehow in everything. Just look at what I'm writing, it's all AI related! Check out the latter half of this guy's video: https://www.youtube.com/watch?v=wTIPGoHLw_8

In my design, positive and inhibitory reward is installed on just a few nodes at birth, and it can transfer reward to related nodes to update its goals. It may see contextually food=money, so now it starts talking about money. Artificial rewards are changeable, root goal is not modifiable as much.

For Temporarily Active nodes, you can remember a password is car and forget it, but of course you retain car node. This is a different forgetting than pruning weak weights forever. GPT-2 is probably using the last 1,000 words for prediction by this very mechanism. The brain already has to keep in memory the last 10 words, so any predicted nodes that are pre-active from being held in memory get a boost. If you read "the cat and cat saw cats cat then a cute" you predict cat, and the cat node is already activated 4 times just recently. You're holding the words in your hierarchy nodes, not on paper anymore. So yes energy is retained for a while and affects the Probabilities predicted!
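
A small sketch of how temporary energy from recently heard words might boost predictions. Everything specific here is an assumption for illustration: the exponential `decay`, the multiplicative `strength`, and the name `boost_with_recency`. It also matches words exactly, so "cats" does not add energy to "cat" the way related nodes would in the design above.

```python
def boost_with_recency(base_probs, recent_words, decay=0.8, strength=1.0):
    """Raise the probability of candidates that were heard recently.

    Each past occurrence of a word leaves energy decay**age, where age is
    how many words ago it was heard (0 = most recent). The energy scales
    the base probability, then everything is renormalized.
    """
    energy = {}
    for age, word in enumerate(reversed(recent_words)):
        energy[word] = energy.get(word, 0.0) + decay ** age
    boosted = {
        word: p * (1.0 + strength * energy.get(word, 0.0))
        for word, p in base_probs.items()
    }
    total = sum(boosted.values())
    if total == 0:
        return {}
    return {word: p / total for word, p in boosted.items()}


if __name__ == "__main__":
    recent = "the cat and cat saw cats cat then a cute".split()
    base = {"cat": 0.25, "dog": 0.25, "car": 0.5}
    print(boost_with_recency(base, recent))
```

Because the energy decays rather than changing any stored counts, this kind of forgetting is separate from pruning: the cat node stays, only its temporary boost fades.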

I once played Pikmin for half the day, and when I went in the kitchen things looked like Pikmin faces, or I saw them faintly but still somewhat vividly running around things. Dreams come from the same thing: more random predictions from the top 10 or 100 predictions. The predictions in dreams aren't really good ones.

You can see how this helps. Say you only read 100,000 bytes of data so far, and you now read "the tree leaves fell on the root of the tree and the", you have little data trained on so far, but you can predict well the next word is Probably a related word to tree, leaves, etc, so leaf, tree, branch, twig all get boosted by related words from recently read words. And it's really powerful, I've done tests in this area as well. The Hutter Prize has a slew of variants I presented. Like looking at the last 1,000 letters to boost the likeliest next letter. That's good but not as commonly accurate or flexible as word prediction using related words, instead of Exact letters! Big difference.

I look forward to your thoughts, I hope I provided some insight into my design and tests. I hope you can help me if there is something I'm missing, as my design does do a lot in a single architecture. I don't see why it's a good idea to study it as a stack of black boxes without fully understanding how it makes decisions that improve Evaluation (prediction). While my design may be inefficient it may be the natural way it all fits together using the same nodes.

To learn more, I have a video and a different but similar run through my design in this file (and how my code works exactly): https://workupload.com/file/Y4XhZPYHzqy

LOCKSUIT · « **Reply #81 on:** May 22, 2020, 10:14:05 am »

Earth formed about 4,500,000,000 years ago. The first duplicating cell on Earth emerged about 3,500,000,000 years ago. Humans, capable of improving their own tools and own design, emerged on Earth just 6,000,000 years ago. Over the last couple of thousand years on Earth we have been radically improving our vocal/ phone/ computer communications, data storage/ computation, and transportation technologies to name a few big ones. We now have huge skyscrapers and every year or so a better iPhone or AI. The data exchange/ combination and mutation and specialization leads to the ability to do it again, but faster. It's all going to go down now (for Earth) in the next ~100 years. We will invent an artificially duplicating hardware/computer that is efficient and programmable, even adaptive. Simple nanobots. It will allow us to improve the nanobots further and make bigger faster computers and collect more diverse data. All of Earth will become a grey goo nanobot swarm that can predict well (especially being a formatted terraformed planet being a fractal now, knowing where, when, and who all is using least data to know so is most efficient), regenerate super fast, create or become anything at whim, and will continue to grow in size by eating planets (although it will have to be not so dense or else it becomes a star/uranium and explodes with radiation). We already can see Earth becoming a fractal, stores are built near stores, homes lined up...

The trend? Evolution is evolving longer living structures. Longer Lifespans, is the way Evolution works and prefers. Humans already seek longer lives.

I'm therefore not working on the wrong technology. AGI is the next species in Evolution. I have found hundreds of capabilities they will have that put them way ahead of us easily. Humans were smarter than apes by far, and AIs will be exponentially way more than us.

And my approach to AGI is not wrong. AGI needs not just more existing data/compute but a smarter discovery Extractor/Generator to create NEW desired data to its held questions (duplicating old data with mutations). The output of AGI is only to either implement plans or update where to collect data from, those silly RL walker robots do this and GPT-2 should if we improve it to do so. It is specializing in where to collect new data from, which question, which source. I don't really need a body for my AGI therefore. Output is just for implementation or data collection specialization updates.

For example, my algorithm I made from scratch, compresses the dataset enwik8 (100MB) to 21.8MB, which means it predicts pretty ok, and my net predicts better the more data it sees, for example if I used the dataset enwik2 (100 bytes lol) it'd compress it to only ex. 70 bytes. Get it?

SO, with the same dataset enwik8 of 100MB, how can I predict better if I don't have more data? Add more data. WHAT!? Yeah. Let me show you. When you find discoveries in the enwik8 dataset ex. cat=dog by shared contexts, you can recognize longer unseen sentences more robustly, and more! The world's best compressor can get enwik8 to 14.8MB. See?

To give a clearer example: If I window the last word of my context, to predict the next letter or word, ex. "the [cat] ?_?" > "the cat ate", I know, from up to 100MB, with experience, what follows it. BUT, if I look at it like "the [horse] ?_?" and "the [dog] ?_?" etc, these words usually share the same entailing words so they will surely be helpful. And THAT, gives me more data/insight. Patterns are in the enwik8 dataset, some words are inter-exchangeable!
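
A tiny sketch of that "more data from the same data" step: pool what follows "cat" with what follows words already found to be related. The list of related words and the `related_weight` of 0.5 are assumptions for this example, as is the name `pooled_followers`.

```python
from collections import Counter


def pooled_followers(bigrams, word, related, related_weight=0.5):
    """Combine follower counts of `word` with discounted counts of related words.

    bigrams: dict mapping a word to a dict of {next_word: count}.
    related: words assumed interchangeable with `word` (e.g. found by
             shared contexts); their evidence counts for less.
    """
    scores = Counter()
    for nxt, count in bigrams.get(word, {}).items():
        scores[nxt] += count
    for other in related:
        for nxt, count in bigrams.get(other, {}).items():
            scores[nxt] += related_weight * count
    return scores


if __name__ == "__main__":
    seen = {
        "cat": {"ate": 2},
        "dog": {"ate": 3, "barked": 1},
        "horse": {"ran": 4},
    }
    print(pooled_followers(seen, "cat", ["dog", "horse"]).most_common())
```

Here "cat" was only ever followed by "ate", but after pooling it also has some evidence for "ran" and "barked", which is the extra insight the paragraph above describes.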

Oh, so here we see now: Basically, evolution moves faster near the end because of more data mutation. Hence, more storage, communication, and compute improve immortality of data lifespans. And not just more compute/data, but virtual/extracted insights, is where you get the most data. Hence, AGI and a faster/bigger computer both advance evolution! But AGI is much more potent at doing so.

LOCKSUIT · « **Reply #82 on:** May 22, 2020, 08:49:46 pm »

https://www.youtube.com/watch?v=TF5cJqXBwhc

ivan.moony · « **Reply #83 on:** May 22, 2020, 10:26:06 pm »

If you ask me, it's better than before, but try to pick a real example of compressing, like a short meaningful sentence, showing what happens to which variables/arrays on each loop step. Like a kind of showing what the algorithm does, step by step, accompanied by clearly shown input position/variable states/generated output. Just a suggestion.

LOCKSUIT · « **Reply #84 on:** May 23, 2020, 08:08:51 pm »

From another forum:

Quote

Intelligence requires that there be multiple possible futures, otherwise we would simply be mechanically unfolding a pre-determined destiny.

First I'll start with the obvious side of the coin. We do know our universe is at least somewhat predictable, that's why we can repeat the same lab experiments around Earth, and build neural models of the world to learn patterns. The laws of physics make our world at least partially unfold along a deterministic path. A computer simulation or calculator is also replay-able - it's predictable. We are able to understand things because they are not random.

On the other hand, the word "random", by definition, means an outcome/result that varies maximally. So instead of a computer algorithm spitting out the number 5 predictably, you may get 1, then 8 next time, then 3, 0, 8, 6.... If it wasn't [maximally] random you'd get outputs like 6, 4, 5, 5, 6, 4, 4, 5. So random just means a wider variation. It doesn't mean it disobeys the laws of physics. It just means the view we look at something is ignoring fine details. For example, a woman can write down which color dresses she'll show you, so she knows the order of colors, but you don't, so to you it appears unpredictable, but to her, maybe the dress order is down pat and she remembers it like the alphabet. Another example, you fill a glass with milk until it pours over the rim, but the side that leaks first is different each time. Why? The glass is perfectly flat at the top let's say. It's because of the direction the human poured the milk in that caused it to pour over different sides. The human didn't stand in the same spot each time! The definition of random, that I gave here, basically equates to: you don't know something, so you output the wrong answer. But someone else can know what will happen! Lol. In other words, in the physics/laws we have, you can get an algorithm that outputs 5 each time, or outputs 4, 6, 5, 5, 6, 4, 4, or outputs 7, 1, 3, 0, 8, 4, 2, 8. And there's an actual reason behind it. Not magic. The definition of "random" I gave here, therefore, is when you lack information. You don't know what will occur. But once you learn what will occur, you know in the future what would result. This is assuming you can look in a computer or brain to see the stored algorithm. If you can't know what's inside, then you don't know if it will output 5 every time.
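
The "random means you lack information" point is easy to see with a seeded pseudo-random generator. In this sketch (the helper name `draw` is made up), the digits look unpredictable to anyone who doesn't know the seed, yet anyone who does know it gets the same sequence every time.

```python
import random


def draw(seed, n=8):
    """Return n digits that look random but are fully determined by the seed."""
    rng = random.Random(seed)
    return [rng.randint(0, 9) for _ in range(n)]


if __name__ == "__main__":
    print(draw(42))
    print(draw(42))  # same seed, same "random" digits
```

This only illustrates the first definition of random given above; it says nothing either way about whether True Randomness exists in physics.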

So, do we have another definition for the word "random"? Yes. I call it True Random. It would need to break the laws of physics. For example, an atom or particle would be shot into space, travelling, and after 45 minutes, decides to change its direction! There's no reason it should have, though. Nothing touched the system and nothing left the system. Now, we already know our world is at least 50% not True Random, but predictable. And in computers there's a thing called redundancy that stops errors from popping up. You can run a car simulation perfectly each time, the same way each time! You could run a human simulation, with no True Randomness! Unless it makes us act the way we do. So, True Randomness may exist, and it may be helpful in making more robust predictors that handle uncertainty. You could just make your world/borg garage larger. Larger systems can avoid errors and damage more than brittle delicate small systems. It takes longer for the errors to show up. So the borgs could more easily predict where things are at the high level. Now, one could argue that if particles acted truly random 50% of the time, it would show up in computer car simulations! But it doesn't. So the real reason we get errors is because there are faults at low levels we don't know about. That's all. Not True Randomness. Now, can we solve this? Yes. We already are. Humans produce babies without the DNA information disappearing. We can repair cars indefinitely. But we can't know where every particle in our system is, for to do so would require knowing where the particles (that make up our knowing) are, which is impossible. You could make everything into solid cubes, but you still can't model your world perfectly, only approximately.

The 4th side of the coin is magic orbs from God herself. Unfortunately, if you were hoping for this to be a valid thing, you are mistaken. Magic has no place, magic has to be either True Randomness, Randomness, or Laws of physics. There's no, such, thing, as magic. Either a particle moves as expected based on its and/or other surrounding context/conditions -OR- it pops into/out of existence some "move" or "particle" or "law" that truly is random. Say we had a genie ghost waving its hand with Free Will, granting wishes. The way it works is not by an existing predictive mechanism, but by popping into existence stuff, and must be non-random stuff. But why non-random? Because the genie would not exist, it'd be illogical soup. But what sort of "dimensional ether" is remembering or directing non-random creation in real time? We need something already existing to do this. A designer who creates a designer who... So it's impossible.

Quote

Your compression thingy will basically produce something that spews language, gibberish actually because there is no world model or understanding behind it. Much more importantly,there is no path to general problem solving, or even generalized language gibberish spewing, just a specific language.

"Your compression thingy"

This shows you lack understanding. Gosh. Lossless Compression is just an Evaluation for my neural net predictor I made. I could use Perplexity. Same algorithm, just different test of how good my algorithm is at predicting data in the distribution.

"will basically produce something that spews language"
"gibberish actually because there is no world model or understanding behind it."

Again, you're lacking here. Neural networks learn a model of DATA. Be it text or vision. - Both are language. Which means they CAN learn PATTERNS. Patterns mean frequency, because in a dataset you may see the letter 'z' or word 'grommet' appears not too often! Maybe nothing re-occurs! Maybe the whole dataset is tttttttttt. So you can predict/generate the likely future, being the letter 'e' or word 'the'. Now, because of these re-occurring letters or words, words like cat & dog can be found to share the same contexts. Dogs eat, dogs jump, cats eat, cats jump. Thank god the word "jump" appears at least twice lol. Else no semantics! SO: A neural model can learn the letter 'e' appears very frequently, 'z' appears infrequently, 'cat' is very contextually similar to 'dog', and 'cat' is very different than 'jog'. Neural Models help organisms to survive longer in Evolution. Even if you don't believe text data mirrors human vision_thoughts data, you can still trust the algorithm can work on ANY dataset by "finding" patterns. In FACT, the Transformer architecture used in GPT-2, works on vision and music datasets.

"or even generalized language gibberish spewing, just a specific language."

First of all, the algorithm I already coded from scratch can predict the next letter of any language/ generate other languages too, like Hindi, French, etc. You just feed it such a dataset and it learns the patterns. Currently I use enwik8. Now, my future algorithm, and the already existing GPT-2 made by OpenAI, can already learn cat=dog semantically by shared contexts, cat/dog are interchangeable and it can recognize unseen sentences. This helps it know what entails a given word or phrase by looking at many many similar situations from past experience. As well, it can learn hello=bonjour, if it is fed diverse data that has enough French words! This works for vision too. And if you use text + vision you will need to associate them at the same time they were shown.

"Much more importantly,there is no path to general problem solving"

You've literally just asked me how to create AGI. AGI needs to solve many different types of Hard Problems. To do so, it needs a large/diverse model, not just so it can solve various domains, but so it can use all sorts of domains when solving a problem in a given domain. It needs to know frequencies, or IOW Cause > Effect probabilities of our physics (dogs usually breathe, not eat), to logically think about paths it COULD take. And it must take a path it desires too, to reach the desired outcome. It must wait at steps until they are completed. It must update goals through induction/semantics: Food = money = jobs = truck = wrenches. It will ask new questions and seek new data from specialized sources or questions. It may need to search/mutate answers before it mentally generates a good, well-backed/aligned answer.

It needs to be told that when it looks at 2+2=, the answer must be precise, not 8, even though 8 kind of answers the question. It needs to be told that when it is unsure of its prediction for 2+2=, it must look at it a different way or collect more specific data. If it is unsure about 573+481=, it can look at it a different way (assuming it is sure of 2+2=4 etc.). It is told to look at [5]73[+][4]81[=] so that all it hears is 5+4=9, then to carry over numbers and stack the 4 results together (so it must hold onto them) to get 573+481=1054. A good challenge is taking requirements and translating them into Python code.

Basically AGI needs to look at CERTAIN context, hold onto it or forget it (ignore/Attention), and combine data or probabilities, like a Turing Tape. To do AND, OR, NOR requires enough energy to activate a node in a binary yes/no way. These nodes can be made in the brain, like rules, by talking to the AGI. AGI is basically a net holding onto energies, triggering semantics or syntactics, developing "rules" for when to fire nodes or what features to look for or look at, ex. word or letter level: [567] or [5]67.
You could look at everything I just wrote and figure out why I typed it all. Maybe I'm a GPT-2?
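The 573+481 example can be written out as a rule that only ever looks at one digit column at a time and holds onto the partial results. A minimal sketch, assuming non-negative whole numbers given as strings; `add_by_columns` is a name made up for illustration.

```python
def add_by_columns(a: str, b: str) -> str:
    """Add two non-negative decimal strings one digit column at a time."""
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)   # pad so the columns line up
    carry = 0
    digits = []                             # partial results held until the end
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry   # only small sums, like 5+4 (+ carry)
        digits.append(str(total % 10))
        carry = total // 10
    if carry:
        digits.append(str(carry))           # the extra leading digit
    return "".join(reversed(digits))

result = add_by_columns("573", "481")
print(result)  # 1054
```

Each step only needs to be sure of single-digit sums; the carry and the list of digits are the "hold onto them" part.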

LOCKSUIT · « **Reply #85 on:** May 26, 2020, 02:14:59 pm »

Little update. I'm using all chars now in my predictive model; before, a few letters in my input all came out as ? no matter which special char *should* have been there. My code is approx. 114 lines of Python. I could make it a tad smaller, and a Python pro could make it smaller still. It's made of 4 parts: tree storage, tree searching, mixing/weighting predictions, and arithmetic coding (evaluation of the algorithm).
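For anyone who can't picture the first two parts (tree storage and tree searching), here is a minimal sketch, assuming a plain character trie where every node counts how often it was visited. The names `Node`, `insert` and `next_char_counts` are illustrative only, and this is not the 114-line code itself.

```python
class Node:
    """One trie node: how often it was reached, plus its child letters."""
    __slots__ = ("count", "children")

    def __init__(self):
        self.count = 0
        self.children = {}

def insert(root, text, max_len):
    """Tree storage: add every substring of up to max_len chars, counting visits."""
    for start in range(len(text)):
        node = root
        for ch in text[start:start + max_len]:
            node = node.children.setdefault(ch, Node())
            node.count += 1

def next_char_counts(root, context):
    """Tree searching: walk the context down the tree, return {next_char: count}."""
    node = root
    for ch in context:
        node = node.children.get(ch)
        if node is None:
            return {}          # context never seen
    return {ch: child.count for ch, child in node.children.items()}

root = Node()
insert(root, "the cat ran. the cat sat.", max_len=5)
after_cat = next_char_counts(root, "cat ")
print(after_cat)  # {'r': 1, 's': 1}
```

The counts on the children are what later get turned into probabilities and mixed across context lengths.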

Reference: world's best compressor gets 14.8MB for 100MB (enwik8 dataset)

1,000,000 bytes in
Shelwien's Green --- 256,602 compressed
My algorithm: --- 252,591 compressed

Green on the full 100MB: 21,819,822 bytes
So at the same ratio I should reach at least: 21,478,751 bytes

update:
NEW: 251,699
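The 100MB estimate appears to be a straight proportional scaling of the 1MB results. A small sketch of that arithmetic, assuming the 1MB ratio between the two compressors carries over unchanged to the full 100MB (which is only an assumption):

```python
green_1mb = 256_602       # Green, first 1,000,000 bytes of enwik8
mine_1mb = 252_591        # my algorithm, same input
green_100mb = 21_819_822  # Green on the full 100MB

ratio = mine_1mb / green_1mb            # how much smaller my output is than Green's
estimate_100mb = green_100mb * ratio    # scale Green's 100MB result by that ratio
print(int(estimate_100mb))
```

The same line with the updated 251,699 figure in place of `mine_1mb` gives the corresponding estimate for the newer result.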

LOCKSUIT · « **Reply #86 on:** May 26, 2020, 09:37:21 pm »

Managed to remove 6 lines of code.

LOCKSUIT · « **Reply #87 on:** May 26, 2020, 10:18:08 pm »

How to use my code:
https://www.youtube.com/watch?v=3wTOSLOA9GM

My code:
https://workupload.com/file/Sbx7a5q77r3
The hardcode can be minimized, it has patterns.

LOCKSUIT · « **Reply #88 on:** May 28, 2020, 10:21:13 pm »

Confused? Someone was.

Now you feel the frontier of hard work not presented properly. Don't repeat this. Most do this.
At the start of the video I'm on the number that sets how many letters to compress; you can adjust it, mine's set to 10,000.
I change the yes/no at the top to switch between compress and decompress.
I change the thing at the bottom to print input2, which gives either the compressed encoding or the decompressed output. You plug the encoding in at the top where shown.
My dataset is in the folder shown. Enwik8. Well, part of the start of it.
At that part of the video I was showing that out.txt was the same as the input file.
BTW to decompress you need to modify the input file as shown so only 16 - 3 letters are in it... up to the x letter shown lol.
At the end I do a little calculation to get the actual bytes being compressed.

LOCKSUIT · « **Reply #89 on:** May 29, 2020, 12:37:38 pm »

mostly same but clearer rewrite:

Here is my [latest] Python code I made from scratch.
And a new 1min video of me using it.

https://workupload.com/file/9x5Ft5EfBfn

https://www.youtube.com/watch?v=q0m-v9192o4

The hardcoded part in it can be minimized; it has patterns, I just left it for now. Most of the prediction accuracy/compression is not done by the 3 long middle columns, actually.

Mine can compress 100MB to approx. 21.4MB. World record is 14.8MB.

My code is explained in my book. Basically, the more text my code sees, the better it can predict which letter follows the last, ex., 8 letters, and I mix up to 17 such contexts, ex. "The cat ran_" "he cat ran_" "e cat ran_"....
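Here is a rough sketch of the mixing idea, assuming simple count tables per context length and a made-up weighting that trusts longer matches more. The `(order + 1) ** 2` weight, the tiny probability floor and the function names are all assumptions for illustration; the real weighting is in the code linked above.

```python
from collections import Counter, defaultdict

def train(text, max_order):
    """counts[context][next_char] for every context of length 0..max_order."""
    counts = defaultdict(Counter)
    for i, ch in enumerate(text):
        for order in range(max_order + 1):
            if i - order < 0:
                break
            counts[text[i - order:i]][ch] += 1
    return counts

def predict(counts, history, max_order, alphabet):
    """Mix the per-order predictions into one next-letter distribution."""
    mixed = {ch: 1e-6 for ch in alphabet}      # tiny floor: nothing gets exactly 0
    for order in range(min(max_order, len(history)) + 1):
        context = history[len(history) - order:]
        seen = counts.get(context)
        if not seen:
            continue                           # this context length was never seen
        total = sum(seen.values())
        weight = (order + 1) ** 2              # hypothetical: longer match, more say
        for ch, n in seen.items():
            mixed[ch] += weight * n / total
    norm = sum(mixed.values())
    return {ch: p / norm for ch, p in mixed.items()}

text = "the cat ran. the cat sat. the cat ran."
counts = train(text, max_order=8)
probs = predict(counts, "the cat ", max_order=8, alphabet=set(text))
best = max(probs, key=probs.get)
print(best)  # r
```

The shorter contexts still vote (a space is often followed by 'c' or 't'), but the long "the cat " match outweighs them, which is the point of mixing instead of using one context length.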

If my code found matches robustly with similar/rearranged position words, I could mix 8000 matched experiences instead of 17. It would help compression a ton. Maybe I'd get 17MB.

And if I held onto recently activated words, they would be pre-active and I'd need less data to predict what word follows "cat cat cat cat", just by using the already activated nodes themselves! Similar words also become pre-active. Candidate nodes are already pre-active in the brain because the brain holds onto words it heard, so it's a natural thing; they are pre-selected/triggered for the softmax output. I examined how GPT-2 does this, and I once wrote code that worked this way; it helped a lot.
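A minimal sketch of the "pre-active" idea, assuming it can be approximated by blending the model's next-word distribution with counts of recently seen words. The function name `boost_with_recency`, the `cache_weight` value and the toy numbers are all made up for illustration; it is not what GPT-2 does internally and not my earlier code.

```python
from collections import Counter

def boost_with_recency(base_probs, recent_words, cache_weight=0.3):
    """Blend a base next-word distribution with counts of recently seen words."""
    recent = Counter(recent_words)
    total_recent = sum(recent.values())
    if total_recent == 0:
        return dict(base_probs)                # nothing held onto yet
    vocab = set(base_probs) | set(recent)
    return {
        w: (1 - cache_weight) * base_probs.get(w, 0.0)
           + cache_weight * recent[w] / total_recent
        for w in vocab
    }

# Toy base distribution (an assumption), then the "cat cat cat cat" history.
base = {"cat": 0.1, "dog": 0.1, "the": 0.8}
boosted = boost_with_recency(base, ["cat", "cat", "cat", "cat"])
print(boosted["cat"], base["cat"])
```

Pre-activating similar words too (dog when cat was heard) would need a similarity table on top of this, which the sketch leaves out.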

And I also know yet more ways to get the compression down / prediction more accurate.

In my book I also link Shelwien's version in C++; I basically followed how his works. His compression is 21.8MB.
