Releasing full AGI/evolution research

Yervelcome · « **Reply #60 on:** May 02, 2020, 10:52:50 pm »

I'd like to give that robot DM the knowledge of what it feels like to walk through a forest killing goblins.

LOCKSUIT · « **Reply #61 on:** May 03, 2020, 04:47:56 am »

Iv'e been coding my own text predictors and researching them hard and I notice the key is frequencies of what usually comes next after your context [th] ex. you see [e] has high count observations. But you want the past window to be very long and grab your entail frequencies from those recognized features ex. [my dog ate the][k]. Which allows good letter prediction. GPT-2 predicts BPE words. You actually mix multiple windows because longer context matches are rarer but you can at least get some frequencies of what's seen to follow them. To help find longer matches you'd want to "translate" words cat=dog, accept different position appearance, and focus on rare words to summarize the context, so that when you look at the past 30 words you can find multiple matches in memory even though there is of course no experience exacting matching it - the alternative words, position, and filler content is in there but is similar or doesn't matter. So in the end, frequencies is running it, and even the recognition cat=dog is based on discovered shared contexts, based on frequencies. Probabilities run it and if a match is not exact then it's predictions will all get less weight.

Yes, what I show in the video appears to helps prediction by making the predictions more "similar" to the *rare story words (especially more recent words), it can look at ALL the past context. The main prediction in these algorithms however is from looking at the past ex. 3 or 20 words to get multiple "similar matches" to see what usually follows the matched contexts. You can look farther back if you 1) attend to only rare words and ignore ex. 'the', 2) can use similar words 'cat/dog', and 3) use similar position.

When you know "Paris is the capital of France" and see a new prompt "The capital of France is " you predict Paris with o-k accuracy because the context mostly matches this (and a few other things in your brain), and the 2 words that are switched around exist but with similar positions.

A good question is, do story words actually take their own vote on the prediction candidates? Or do we only use context matches to see what does come next? Well, if I keep adding the word 'cat' to the start of my prompt, it makes 'cat' more probable, inch by inch the probability rises, which would be unlikely that matches are finding this is what commonly follows. Below is a new video testing it out to see if the prediction is influenced from context matches solely or if it does actually use as well all story words to mindlessly vote on the next word (if the input is all cat, it's likely it will continue saying 'cat' or the similar).

https://www.youtube.com/watch?v=kF8U2FD9JXc&feature=youtu.be

I could try in Allen these inputs: 'bat' or 'bat bat' or 'bat bat bat', or, 'wind' or 'wind wind' or 'wind wind wind'....and no matter the word used, it will predict the same word, with more probability the more times it occurs. In the dataset it trained on is only briefly similar phrases, and I don't think they predict the same word that occurs in them. Yes my input matches them more and more because of similar words and hence the prediction will be similar, but, I don't feel out of 40GB there is enough "matches" to achieve that.

Keep in mind it predicts the *same* word, you'd think 'bat bat bat bat bat bat' would match things like 'my bat saw a bird on a bat but bats fly in bat caves' etc and would often predict only similar words like cave or bird....how many matches could you get that incrementally improve the prediction of 'bat'!? Impossible.

LOCKSUIT · « **Reply #62 on:** May 03, 2020, 07:03:16 am »

Thought Experiment - What does a body do for a brain?

In this thought experiment, I take a look into how the brain uses input and output from its body.

Let's imagine we have a brain in a body, and is stationed in a large laboratory that enhances its research and exploration. The reason it is not stationed in a normal home is because technological advances come from experiments and data processing, not sleeping eating and sitting on home furniture.

The lab and body won't intelligently move on its own, so let's direct focus to the brain now.

The brain cannot make any non-random output decisions yet, as it has had no input experiences yet. To do better output it needs input first. So the first thing we can do is either feed it lots of diverse image/text data, or let it try to walk forward and receive data about which motor actions got motion. So far so good, we just made it store its first experiences.

Up to now the brain has only received random input data (note the real world data isn't random but the source is) from all sorts of sources. It didn't decide where to collect data from, as it couldn't output any non-random decisions.

Now our brain can decide to tweak its walking skills to further improve its speed, or can decide to collect text/image data from certain sources such as particular websites or particular real life experiments. For example it may be trying to invent better storage devices and wants to see if it's predictions are correct or may want to collect data from there simply. Testing it's predictions is also data collection because it boosts it's already existing beliefs's probabilities.

The trend here seems to show that as it collects data, it is getting more precise where to collect it from. Without output from the brain, the brain could never collect data from specific areas. The brain is learning where to collect data from.

The 2 uses the brain has for output is 1) specific data collection, and 2) implementing solutions ex. a new product to market and seeing their mission completed (this is also data collection).

Our brain, if it had a solution to a big problem on Earth will all road-bumps covered, could just tell us it of course. It wouldn't absolutely require a body to implement a "plan".

The "coming up with a *new* plan" is done in the brain, it needs a lot of on topic data also. The output of the brain to the body is just to collect more desired data.

What is most interesting is when you have a lot of data, you can make a lot of connections in it and generate new data by using a method called Induction.

So what do yous think about the idea that we could make a powerful AGI without a body? I mean it still would have output to talk to us and input from various websites and experiments it asks us to do, but it wouldn't need a LOT of real life experiments if it has a lot of data because it can mostly generate its own data at that point and fill in gaps.

So most of its output in that case would be either a solution to a problem or a few requests of where to collect new data from if its solution isn't ready yet. Other than than it would be mostly doing induction internally using huge amounts of data. After all, experiments are only to collect data, we can give it lots even if not from precise tests.

My point here is AGI creates new data and is an induction engine, and works better with huge amounts of diverse data and on-topic data as well. That's all its input does - is provide data. The output is to collect certain data. But AGI *needs to generate *new data using all this data and/or find part of its solutions in data. For example finding a device in nature or an advanced civilization would be a solution that eliminates many sub goals. It could read about it in data too, if it trusts that data.

In that sense, AGI is about re-sorting existing features/data into new data/ devices or skills. To do that it needs a lot of data. AGI generates the future using a Lot of context.

What do yous think? Can we sort of get away without a body and just make AGI in the computer and talk to us using its text/vision thoughts? And can we get way with doing lots of specific experiments from the right locations and times and just use the slightly-more random big data? To me it appears to be yes we can. The AGI could still extract/form new answers from the large data. Like you know how Prediction works right? You can answer unseen questions or image hole fill in? So AGI can just 'know' answers to things using related data. And, what if it can just watch microscopic data and do all sorts of random experiments to see what "happens" and build a better model!? It is true though brute force is not computable in our case, but it's an idea.

HS · « **Reply #63 on:** May 03, 2020, 08:18:50 am »

To the degree and distance that things can be predicted, we should try to build a predictor AGI to do that. Its a visionary project with potentially great value to humans and sundry.
You are so devoted, and have invested so much time into this project that its probably difficult to talk to others about it in a productive way, because you are the only expert on this specific hypothetical technology. Its mostly still in your brain, as far as I can tell. I'll be curious to see how you will be able to manifest it for real, and how good such a large scale predicting machine will be able to get. Its definitely possible.

WriterOfMinds · « **Reply #64 on:** May 03, 2020, 04:41:24 pm »

@LOCKSUIT: The latest seems like one of your more coherent and easily understandable posts. More like this, please?

It's my opinion that AGI does not need to be embodied, though I know that there are many who would not share that opinion. However, I suspect that there's more to making an effective bodiless AGI than just "feed it a ton of data."

The prediction machine you're describing could easily be a Chinese Room. If you don't know what that is, I recommend reading up on it. For any input it is given, the Chinese Room produces reasonable output, but it doesn't really "understand" anything because it has no symbol grounding ... i.e. it can effectively manipulate data, but it has no idea what the data "means" to itself or anybody else. So it speaks coherently, but does not communicate; it has activity, but is not really an agent. A Chinese Room could be useful, but whether it qualifies as an intelligent entity is debatable.

Then again, given what your particular goals are, so long as it produces "discoveries" in its output you may not really care? However, providing your AI with some symbol grounding could still be a faster and more accurate way to achieve results than relying solely on pattern-finding in otherwise meaningless data.

LOCKSUIT · « **Reply #65 on:** May 04, 2020, 06:39:00 am »

Quote

@LOCKSUIT: The latest seems like one of your more coherent and easily understandable posts. More like this, please?

😃 Well... They kept deleting my posts on Reddit.... So I had to make it really really really simple to grasp so that 0 energy was needed to realize it... But sure.

But in text, there is meaningfulness/ grounding. Words share context, you'll find in big text data that cats and dogs both run, eat, sleep, have tails, etc. And you can describe what a word means by using Prediction or Translation ex. 'loop' is a cyclic iteration of some process repeated many times, or 'loop' = 'iteration' etc.

All sensory data can associate, all a body/output does for a brain is collect yet more data but from non-random sources. AGI is all about collecting/ creating new desired data from old states of Earth/ old data...self-modifying data...Earth evolves/ generates itself. Intelligence at its root is how long a machine can maintain its form for, hence it seeks to understand the universe and grow in size to create an army.

Art · « **Reply #66 on:** May 04, 2020, 04:23:33 pm »

"...So I had to make it really really really simple to grasp..."

To borrow a quote from Einstein: "Everything should be as simple as it can be but not simpler!"

ivan.moony · « **Reply #67 on:** May 12, 2020, 07:57:21 pm »

Hi Lock

Any new discoveries lately?

LOCKSUIT · « **Reply #68 on:** May 13, 2020, 07:44:03 am »

No more biggies yet, but will share more my current work soon...

I mostly have more to implement than need to discover...

The remaining puzzle pieces to "my AGI" won't exactly come from coding my current plan though, but rather more back to the drawing board.

It would be best if I had a real-time chat team that can "construct AGI now" and come to some sort of agreeance of an architecture, Person A would be like, no, because this makes more sense, person B would be like, oh, then you need this, etc, then we could make AGI already finally instead of our 1man missions.

If we don't hurry, we're all gonna die.... We have the chance to become nearly immortal young-again kings, let's do it...

ivan.moony · « **Reply #69 on:** May 13, 2020, 08:17:55 am »

Quote from: LOCKSUIT on May 13, 2020, 07:44:03 am

No more biggies yet, but will share more my current work soon...

May I propose to try to compose a bit shorter video this time (if that's your favorite form of expression). It would be nice if you structure your form of expression into chapters, sections, paragraphs... It helps understanding. Also, there are templates of expressing that researchers usually exhibit, like:

motivation of beginning to work
description of a problem you are trying to solve
possible solutions
solution comparation
potential benefits
conclusion, related work of other researchers, and future work

It doesn't have to be this exact template, but some form of structure would be desirable.

Quote from: LOCKSUIT on May 13, 2020, 07:44:03 am

I mostly have more to implement than need to discover...

The remaining puzzle pieces to "my AGI" won't exactly come from coding my current plan though, but rather more back to the drawing board.

Yeah, those are cycles I found in my work too. It exchanges between thinking out, implementing, then thinking more thorough, then reimplementing more again, ...

Quote from: LOCKSUIT on May 13, 2020, 07:44:03 am

It would be best if I had a real-time chat team that can "construct AGI now" and come to some sort of agreeance of an architecture, Person A would be like, no, because this makes more sense, person B would be like, oh, then you need this, etc, then we could make AGI already finally instead of our 1man missions.

It is very hard to gather a crew, and there are issues about being a leader, and being a follower, all the way down the command chain. I generally don't like being a part of that kind of structure. It's more like everyone has their own plan, and others have to abandon their plan to follow the leader. I don't want anyone to abandon their plan for me, and I don't want to abandon my plan either. So I choose to work alone.

But it might be a good idea to create a collaborative thinking platform software where people would have a structured chalk board and choose on their own whether they want to bring up a new issue-subissue, or help solving existing ones.

Quote from: LOCKSUIT on May 13, 2020, 07:44:03 am

If we don't hurry, we're all gonna die.... We have the chance to become nearly immortal young-again kings, let's do it...

I believe It is not that much bad. Dying could be a bit of an itch, but we won't be neither the first ones, neither the last ones to die. Anyway, somewhere in the future, someone might find a way to bring us back from the death, but it may be a case that it is better to be what we call dead, then to be alive. But it is worth of checking and being aware of the difference anyway. Personally, I would want to know what it's alike to be dead. My presumptions say that it is the only ethical form of existing, and that being alive is a gift we got from someone, but we would want to return it one sunny day to that someone. This is just a feeling, I might be wrong.

LOCKSUIT · « **Reply #70 on:** May 13, 2020, 08:53:00 am »

It is well know in Evolution that smarter machines emerge, but what makes them smart is that they can survive longer by modeling the world with inborn reflexes, schooling, and discovery. We evolved vocal cords and hands, to communicate ideas and build on them. Everyone's root goal is survival, shelter, food, and breeding, sure you can like race car boats and skydiving but a regenerative, tool-building, army is better at cooperating/ competing over limited resources, The ultimate technology will be very smart, it can instantly make you anything at whim, but it isn't just anything that goes on in physics, but things that maintain the form; duplication of information, mutations, food finding, radiation (waste). So, because humans can't live forever and are perplexed at how to achieve it, they criticize the ability to achieve it, even though they'd take it right up if they never had to age! It probably wouldn't even be a thought, no one would know ageing existed one day, they just carry on living. Machines can be programmed to erase memories and be happy always. Most humans say they want to live at the moment, some do say they don't but most won't. It's because of physics.

Art · « **Reply #71 on:** May 13, 2020, 02:30:24 pm »

...and to dust, you shall return...

LOCKSUIT · « **Reply #72 on:** May 13, 2020, 05:28:05 pm »

Had to re-watch my new video below to make sure it has everything most important in it as intended. Looks good. Do watch the full video, some things are further ahead/ is never a dull moment. First read the text below until you find the video, it's important...

I guess I'll give a little text speech below to go with it as well to solidify the presentation, just in case. Do note that below is also more many important paragraphs, this is just a presentation one for the video though:

As shown in the video, the brain is a collection of mini hierarchies, re-using shared feature nodes linked as contexts.

My working code uses a trie/tree and each letter and phrase seen up to now has frequencies stored based on how many times it (a node) was accessed. Larger phrases have fewer appearances. These are strengthened connections/ weights in the hierarchies!! They decide how much energy goes to a parent (how open the channel is). Multiple nodes in my code get activated and each has multiple parents that get activated too, and some are shared by other recognized nodes ex. 'the a' and 'he a' and 'e a' and ' a'.

You can see as well if It did recognize multiple similar nodes too (translation) they would as well share similar prediction candidates. All this works on its own. No backprop! > We increment node frequencies based on node accesses, and related nodes cat/dog are discovered/activated based on shared context leaking energy from cat to dog nodes on its own.

The prediction candidates that entail the ends also retain energy from prior activation, more if more recently, large proof is below BTW, and lesser if was a similar node that activated it. That's Temporary Energy, it can look at all the last 1,000 words!

Permanent Energy is always active memory, Rewards, look at Facebook's Blender chatbot that uses a Dialog Persona to make it talk about its agenda! It can have multiple goal nodes.

My design allows goal node updating by leaking reward to similar nodes ex. food=money or food=dinner, and now I will start predicting/ talking the next words about money now. Root goals are not as changeable, the artificial rewards are sub-goals.

The energy in the net defines itself over time, energy from multiple activated nodes leaks to a single candidate prediction (top k predictions, usually the most probable one, especially if very high probability than other candidates). My code stores Online the predicted Next Letter right now and that is what generates new discoveries that are somewhat true, depending on how confident the prediction probability is.

A good brain exploits and uses the likeliest nodes as prediction, not random data collection. But in dreams you can see we do not generate/ talk about our desired/probable nodes, but random ones, especially activated ones from the last day. It wants you to explore and generate more randomly by not using Permanent Rewarded goal nodes, and look around the last day's experiences to search for a while for discoveries.

The goal is to see how to predict/get to the desired outcome... It needs a lot of data to be sure it "made it" and isn't simply being told "aliens arrived, u can stop working no artificial organs now". It has to search sometimes for a while, and go through many sub goals, until it fits with the data... This part massively confuses me, how it actually knows how it reached the answer/ implemented in real life or has a solid discovery..... Or I mean how it knows which sub goals to make and which get met.... Pretty much we want it to make many desired discoveries, and listen to us if needs more data (either time to implement its idea or IOW feed it new data....to get new sub goal question rewards)

--------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------

(video) How can I improve my AGI architecture?

In the video below, I walk you through a fair amount of my AGI architecture I've been working on for 5 years. I'm looking for if I am missing something or if am on to something. The design is meant to be very very simple and explain a lot of how thinking occurs. Below is how my text predictor code works (100MB compresses to approx. 21.8MB), please read it twice before jumping into the video, you will learn some fundamental things all good text predictors are doing using frequency. Frequency is also used for discovering the word cat=dog. Note that that compression is for evaluation and is different than compressing a network to learn a better model. I should have also included in the video that Summarization, Translation, and Elaboration would be controlled by how much energy is allowed - you only say important features when you Summarize, not frequent or unrelated or unloved words.

How my text predictor/ compressor works (100MB>21.8MB):
My algorithm has a 17 letter long window step along the input file 1 letter (byte) at a time, updating a tree as it sees new data. The tree's branches are 17 nodes long because it adds a window to tree (after it finishes its search process described next), and updates node counts if passes any node. For each step the window takes, the algorithm searches the tree for 17 different searches each a letter longer. The children leafs (the final letter of a searched branch) are the predictions with counts seen so far in the file. Layer 1 nodes are children too and need no match. The tree is storing the frequency of all 1/2/3.../17 letters seen so far. The children are what allows you to predict/compress the next letter accurately. These 17 sets of predictions must be mixed because while the longest set is more accurate - we have less statistics, sometimes only 2 counts. We start with the longest found. Ex. 14 letter match in the tree. The 14th set of predictions may say it seen come next a=44, b=33, f=25, w=7. I sum a set's counts up to get a total of (in this case) 109, then I divide each count by the total to get %s that all add up to 1% ex. 0.404% 0.35%.... Now for all these predicted %s, we still have 13 sets to mix and must remove some % from them each. So what I do is I check the total counts of the set against a Wanted Roof ex. 109<>300 (maybe we don't even need to mix lower sets if we got enough stats), and so I cut each % of each prediction by about 1/3rd then in this case. And in this case we still desire 66% more stats. For the next set, if say we have 200<>300, I take away 2/3rds from the 66% - meaning we still desire 22%, not 66% - 2/3rds = 0%! I take away the % got OF the % still desired. A little bit of lower sets always leak in therefore, which is better because we can never be sure even if surpass Roof by lots. Besides, it gave better results. But Roof is decided by how many predicted symbols are in the set (total unique symbols being predicted), so if i have 2 then Roof may be 8 counts wanted. Also, while the Roof is based on how many different symbols are seen in the set, we get a slightly different Roof if we are on the ex. 5th set, i.e. if we have 4 letters in the set #14 then Roof is ex. 33, but if it is set #5 then Roof is ex. 26. Also, based on the Roof's size, a curve's bend is modified. This Activation Function curve/threshold gives small/large total counts in a set an even smaller/larger total (but it isn't used in the Arithmetic Coding, it's only used for deciding how much % this set gets in our mixer). This is meant to be a exponential activation. Finally a global weight is given to each set ex. the 14th set is always given 0.7% of the weight it was going to get lol. I hardcoded the numbers for now but the code isn't grossly large of course. If they were adaptive and were based on the data then the compression would be even better. I just noticed I do exit the mixing before reach lower sets if the Roof is ever surpassed, I'll have to test if this is useful. The Arithmetic Coder takes the combined sets i.e. the prediction %s are combined a, b, c + a, b, c + a, b, c ..... = a, b, c (softmaxed so all the predictions add up to 1% i.e. a, b, c = 1%), and the AC then takes a high and low bound 1-0 and takes the middle between the high and low, and starts misusing each % of the set, until matches the final letter in the window (same process whether compress or decompress). So say we stop once reach b in our set ex. a, *b*, c, we are in the float precision now of ex. 0.45-0.22. WE take middle again (0.23) and start misusing (once the window on the file takes another step. The encoding decimal keeps getting more precise, storing the whole file. To work in 16 byte float we need to carry away locked digits, meaning if the high and low are both now 0.457594-0.458988, we store '45' and get now 0.7594-0.8988, and we are going to be taking the middle of these 2 to make the decimal more precise then. This long decimal is then stored as a binary bin number ex. 6456453634636=10100011100111010011. I didn't implement the window to store the last few letter as branches i.e. the 17 letter window adds itself to tree but before predicting next it could add the 16, 15, 14, etc as shorter branches which would help just a 'bit' more. I didn't implement the removing same counts from lower sets that are just from the higher set, because it hurt compression, i.e. if there is 9 counts total in set 3 and 99 total in set 2, 9 of the counts in set 2 are the same observations and 'should' not help us reach Roof. I'll look into it more. Lastly, escape letters, my first set we mix is a dummy set that has super small weight and has every possible letter, in case we need to encode/decode one and hasn't yet seen it in the file, hence requires a small room in the AC high low bounds. I also hardcoded each probability in this dummy set, common letters get more weight. Compression/decompression takes 2 hours and 16 minutes for 10MB, but Python is slower. Ram is fairly big because I didn't implement the pruning. My algorithm handles incomplete/noisy information (uncertainty) unsupervised Online hence the mixing of window models. Better net or net compression and/or file compression and insight extraction (not decompression of FILE !), faster code and less RAM Working Memory used, all lead us closer to AGI, and smaller code does (a bit).

My code is in Python but for now I'm linking Shelwien's Green in C++, it's very similar. https://encode.su/threads/541-Simple-bytewise-context-mixing-demo

Video:

I think one key difference in ANNs for text is the network doesn't store nodes that can be displayed as solid letters and phrases as mine can, for example the lowest layer nodes a b and c may all point to the parent node 'abc', which has ex. a count of 5 times seen so far, but the 'a' that builds it has only 3 accesses seen so far. So instead of blended words or phrases, like 'cotg' made from cat/dog you might even get 'cOtG' where some children affect the node less. I'm unsure yet if that's useful.

From testing GPT-2 and making my own algorithm last year, I have strong evidence that nodes retain energy and the frequency predictions are helped out by already existing energy sitting in related nodes. As you know, when you hear something, it remains on your mind for quite some time. The last 80 words read are all energized, stored in order but as chunks, and are not on paper anymore but in your brain! They *need* to remain active in your brain. The more activated a similar phrase node - the more activated its prediction parents will be. But word nodes may also leak energy to other similar word nodes as well. The energy sitting around definitely will add to the prediction energies therefore, see? If 'leaf' is activated 40 words ago, and our prediction predict letters from word nodes, the leaf and grass etc nodes will also be pre-activated some bit. These energies eventually fade off your mind exponentially.

We can see Facebook's Blender uses also Permanent energies using a "Dialog" as they call it, making it *always talk/ask as if it has an agenda for being a communist. These nodes are hard reward coded from birth and *should update other related nodes to create new sub goals for the food node goal it will never change since is more reward hardcoded, you know you can't change the food node as its critical for survival.

My main points here is frequency in predictions runs my code, and recognizing similar phrases will increase counts (found using frequency, closest affect it most in delay time), using energy to boost related predictions helps a ton, and permanent reward does too. See how all that and more work in the hierarchies? What more can we do!? Can you add anything!?

I'm really excited if even just one of yous can advance the AGI design I'm at. I've seen a lot of ANN variants like variants of GANs, LSTMs, Autoencoders, etc etc, they seem to have things like residual connections, layer norm, convolution windows, many feedforward networks stacked, etc, while my design just sticks to a single collection of hierarchies. Of course you can get the same result by similar ways or by breaking it down into multiple tools with math tricks to get same result. But I'm looking for a more explainable architecture that unifies everything into the same general network, and can worry about the math tricks later. That's why I say in my work that to predict the next word, we ex. look at the last context (like GPT-2 does) and activate multiple similar phrase nodes in the hierarchy and see what entails them all, they are all little judges/hierarchies. I don't hear many people saying this, just RNN this, GAN that, no actual straightforward theory.

Transformers have been proven tangibly better than LSTMs in all areas (check out OpenAI and BERT etc), and the Attention Is All You Need papers says, well, it in the tittle. and was written by Google researchers. Transformers are parallel and can process much more faster than RNNs. And you don't need the recurrentness or LSTM schema, which is confusing.

I've read many articles on Transformers, they have a long process and many things used, and after reading them all there is no explanation how it actually works, anywhere, I'm the only one no Earth saying how GPT-2 works. There is some explanation if you look at Word2Vec or the Hutter Prize algorithms like PPM, but no one "knows" how GPT-2 works.

Energy remains in nodes and fades....see:
Improved edit:
Our brain is always dreaming, even when not dreaming or daydreaming. We actually recall stored features (especially energized or loved ones ex. when asleep from the last day, but we can do that in day too it just makes sure you explore, though you rarely exploit in dreams ex. work on AGI only) and recreate/create an experience, in the brain - we can't feel the real world.

Even more proof:

The proof that shows Temporarily Energized nodes do affect prediction is not just the fact that nodes recently heard must stay in memory active, but also if you had only a small dataset ex. 0KBs and were shown a prompt "the cat and dog cat saw a cat and the " - the next word is not going to be predicted well, much, but out of the 10 words in that prompt, 3 are "cat", so our probabilities can slap on 0.3 probability to predict "cat" next! This is much more powerful if discover cat=dog by shared context, we can see cat/dog appears 4 times in the prompt, often the past paragraph will talk about grass, leaves, trees etc if is about trees. Because all paragraphs will always contain "the" more than any other words, we ignore common words.

Also, Permanently Active nodes and Semi-Permanent Active nodes have reward on them which makes you talk about question goals you love/desire. So our "GPT-2" would talk about likely ways, to get what it wants. Mental RL. If nodes have Permanant activity which affects predictions, it's more likely that Temporarily Active nodes also affect prediction.

With my viz in the video (the image of the hierarchy), input goes up and activates nodes, and as well the predicted next word (parent nodes), but only the winner node (usually top candidate probability). The energy chaining the text in my design does not need to flow back down the net (generate output) to do this, because energy just leaks and keeps leaking, as you talk to yourself in your brain you hear the next word predicted and loops back into your net from bottom but maybe not as I just explained why...it would only loop back and activate the same node anyway. Another thing I said was you could duplicate the net and flip it so input goes up, output goes up out, not back down, but again, unneeded duplication and doesn't make it faster in this case.

Also, humans read text word by word level usually, not parallelly, hence far back nodes are losing energy, you must implement that in a parallel approach. Also as it talks to itself and humans it can only generate 1 word of the future and doesn't have it all yet. So for training on big data, you could do a parallel approach, but not for new data. Also the brain learns a a bi-directional context around a word feature, when it predicts the next word it only uses the left hand side past but its memory let's it see into the future before write next words, so in this sense learning a whole sentence fed in in parallel doesn't make the hierarchies any different, it increments frequencies (strength), adds nodes/connections, etc, same way as non-parallel approach, and the brain is predicting by looking ahead and is also using bi-directional network storage to recognize the feature it is looking at too.

So, learning data in parallel seems to work (storage-wise, all data/ relationships are/ can be captured), and prediction of new words/data is done in the net by leaking activity, bi-directional translation and future look ahead still work for prediction too.

ivan.moony · « **Reply #73 on:** May 14, 2020, 07:32:47 am »

Why only one static image through entire video? I'm trying to suggest some form of presentation turned into slides. Also,you might want to turn your algorithm into a pseudocode and show it in one of the slides. If it's too long, try to replace blocks of instructions with a single descriptive lines that you can develop on further slides. That would make an awesome improvement to your presentation. I think the goal is to describe what you want in the least number of words necessary. Simplicity and conciseness should be an advantage, and it is easier to achieve it through sound/vision than only through sound.

Also, planing of what is to be said/written could be outputted on paper as the first working version that you polish/shorten/optimize over days/weeks. Sometimes making a ten minutes video could take weeks if you are after perfection. As you approach perfection (you can come near, but you can't reach it), your final video should make a better overall impression, you would be taken more seriously, you would have more viewers, and mods should finally have a true insight in your work instead turning their head on the other side after the first minute of seeing the video.

Let me show you something:

Take a look here: https://github.com/e-teoria/Logos. I think I spent some three months just writing about it, carefully planning each chapter and sentence, and I'm not not counting the time to develop the theory itself. If you don't have a patience to read all of it, read only introduction.
Now take a look here: https://github.com/e-teoria/Esperas. I wrote it mostly in a burst of thoughts, not caring much about written words. Also, you can read only introduction to compare it to (1).

I hold that (1) gives a much better impression than (2), and I got much more positive reactions on Reddit, comparing those two. Some of us, less educated (including me) are not blessed with eloquency that people usually reach at university, so we have to spend more time to achieve a visible result.

Someone told me that people averagely spend overall 5 seconds on a web page once they visit it. So, when you are making a page, those five seconds are the most important to attract visitors to stay longer. Likewise, it could be a case that the first 10-20 seconds of video decide whether a viewer will click stop and go away, or continue viewing. So it might be worth to spend some extra time on those first 10-20 seconds.

LOCKSUIT · « **Reply #74 on:** May 14, 2020, 08:45:08 am »

Everyone is suggesting to be clearer. True... Still I hope yous tried reading/ watching....it took me a full 29 days of work to make sure a lot of things where in my AGI/universe book intuitively - that thing is honestly not something I'm going to re-write for a while.

The new AGI vid did take 36 mins and is more summarized key points, that is possible. But still, 5 years worth of research and a partially complete AGI blueprint, in 36 minutes and a text to go with it, and you want it in 10 minutes :) !

Can anyone say from my new video above what confuses them? Maybe I can see your reactions... Where (and what) do you say "ohp, I'm lost, why is this useful or how does it do x" ?

The below is a format for Papers but my work still presents the right things at he right times I feel....I summarize, elaborate, share related work, show code and results, and the whole thing is very intuitively explained. I really don't want to explain my work that way that I'm not comfortable with, my way is best :D When I bring up OpenAI's GPT-2, I bring it up.... All u have to do is collectively store it all, the order is there, it's not like I say the cat bed food ate slept in bed bowl then went to -_-

Title
Abstract.
Related work.
Your contribution (theory).
Experimental setup.
Results.
Conclusion.
References.
Optional appendix with supplemental details.

Releasing full AGI/evolution research

Yervelcome

Re: Releasing full AGI/evolution research

LOCKSUIT

Re: Releasing full AGI/evolution research

LOCKSUIT

Re: Releasing full AGI/evolution research

HS

Re: Releasing full AGI/evolution research

WriterOfMinds

Re: Releasing full AGI/evolution research

LOCKSUIT

Re: Releasing full AGI/evolution research

Art

Re: Releasing full AGI/evolution research

ivan.moony

Re: Releasing full AGI/evolution research

LOCKSUIT

Re: Releasing full AGI/evolution research

ivan.moony

Re: Releasing full AGI/evolution research

LOCKSUIT

Re: Releasing full AGI/evolution research

Art

Re: Releasing full AGI/evolution research

LOCKSUIT

Re: Releasing full AGI/evolution research

ivan.moony

Re: Releasing full AGI/evolution research

LOCKSUIT

Re: Releasing full AGI/evolution research

Recent Topics

Recent News

Users Online

Articles