I'm going to put in the effort to explain my code, and what AGI is, in this post. If members of my group don't connect sooner or later, they must leave for now, in hopes of narrowing down to more similar clones of myself.
Our questions for AGI are things like solving cancer; these are big problems that need lots of data and pattern finding. You could feed GPT-2 800 trillion different questions; you can't just program the correct response to each one to solve AI. GPT-2 is only 400 lines of code and can almost complete the rest of the sentence, image, or song correctly for any input fed in. It is general purpose, like the brain. Check out openAI.com.
The best algorithm on the Hutter Prize compresses 100MB down to 14.8MB. This is an evaluation you must use every time you run your AI: it tells you whether you implemented the next part of your algorithm correctly, or better. The better it predicts the next letter, the better it can losslessly compress the data; in that sense it understands that data.
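To make the prediction-equals-compression link concrete, here is a minimal sketch (standard information theory, not the Hutter Prize scorer itself) of how the probability a model assigns to each actual next letter turns into a compressed size in bits; the probabilities are made up for illustration:

```python
import math

# Probabilities the model assigned to each letter that actually occurred next
# (made-up numbers). Better predictions -> higher probabilities -> fewer bits.
predicted_probs = [0.61, 0.05, 0.90, 0.33, 0.75]

# An ideal entropy coder spends about -log2(p) bits per letter.
bits = sum(-math.log2(p) for p in predicted_probs)
print(f"{bits:.2f} bits to encode {len(predicted_probs)} letters")

# A model that assigned 1/256 to every letter would need 8 bits per letter,
# so anything below that is compression.
```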
My code above can compress 100MB to ~20.5MB, and I know fully how it works. It's only 100 lines of code too. I take a dataset of 100MB of text and start scanning it with a 16-letter-long window, storing every 16 letters of it in a trie. I don't store the same root of a branch twice: 'we love food' and 'we love beds' can share the same root branch. Brains don't store the same word twice either; they instead strengthen connections to represent the frequency of times seen. This strength fades and is eventually forgotten (for the permanent version, keep reading). As my 16-letter window scans and builds a tree, or hierarchy, I also have the tree/brain predict the next letter, to send to evaluation. If my input prompt is 'walking down the stree_?_', I search for an exact match in the tree and get the letters that came after it in the dataset. So after those 15 letters I may have seen the next letter be t 44 times, a 5 times, z 1 time, m 9 times, $ 1 time, ! 1 time, etc. This probability distribution is beginning to be learnt. Now, if there are only 2 possible next letters I saw come next and I have 77 observations, then I am sure I know the distribution.
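The post doesn't include the code itself, so here is a minimal sketch of that paragraph as I read it: slide a 16-letter window over the text, store each window in a trie (shared prefixes share one branch, counts stand in for connection strength), and look up an exact context to get the next-letter counts. Names like `build_trie` and the demo text are mine, not the author's:

```python
from collections import Counter

WINDOW = 16  # the 16-letter scanning window described above

def build_trie(text):
    """Store every 16-letter context in a trie; count the letter that follows."""
    root = {}
    for i in range(len(text) - WINDOW):
        context, next_letter = text[i:i + WINDOW], text[i + WINDOW]
        node = root
        for ch in context:                   # shared prefixes share one branch
            node = node.setdefault(ch, {})
        # 'counts' never collides with the single-letter child keys
        node.setdefault('counts', Counter())[next_letter] += 1
    return root

def predictions(root, context):
    """Exact-match lookup: counts of letters seen right after this context."""
    node = root
    for ch in context[-WINDOW:]:
        if ch not in node:
            return Counter()                 # no exact match in the tree
        node = node[ch]
    return node.get('counts', Counter())

trie = build_trie("we love food and we love beds and " * 40)
print(predictions(trie, "ood and we love "))   # -> Counter({'b': 40})
```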
Longer matches are better but rarer in a dataset. If you have 33 different letters that can come next and each was seen about 1 time, you still need many more observations, so my code resorts to shorter matches: I search the tree for 15-letter matches, then 14, 13... I get up to 16 sets of predictions, and I basically stop if, by e.g. the 4-letter match, I already have enough observations. So each set, especially the shorter matches, gets some weight, and I mix all of the up-to-16 sets of predictions.
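A sketch of the backoff just described: collect prediction sets from the longest match down to no context, stopping once enough observations have accumulated. For clarity this keeps one table per context length instead of the single trie the post describes, and the 'enough' threshold of 75 is a placeholder:

```python
from collections import Counter, defaultdict

MAX_CTX = 16
ENOUGH = 75   # placeholder for 'enough observations to trust the distribution'

def build_models(text, max_ctx=MAX_CTX):
    """One table per context length: context string -> counts of next letters."""
    models = [defaultdict(Counter) for _ in range(max_ctx + 1)]
    for i in range(1, len(text)):
        for n in range(0, min(max_ctx, i) + 1):
            models[n][text[i - n:i]][text[i]] += 1
    return models

def backoff_predictions(models, prompt, enough=ENOUGH):
    """Collect prediction sets from the longest match down, stopping early
    once enough observations have been gathered."""
    sets, total = [], 0
    for n in range(min(MAX_CTX, len(prompt)), -1, -1):
        counts = models[n].get(prompt[len(prompt) - n:], Counter())
        if counts:
            sets.append((n, dict(counts)))
            total += sum(counts.values())
        if total >= enough:          # sure enough about the distribution
            break
    return sets

models = build_models("walking down the street " * 20)
for n, counts in backoff_predictions(models, "down the stree"):
    print(n, counts)
```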
For the no-context set, say 'a' appeared 47455 times, 'b' 5644 times, 'z' 352, .... I divide each by the sum of all counts here to get a softmax-like score, e.g. 'a' 0.24 (24% of counts), 'b' 0.07, 'z' 0.03. Same for the contextual prediction sets. The sets from long matches get less weight, so e.g. if 'a' has 0.37 and I give that set 20% weight, then 0.37 * 0.2 = 0.074; if the set got 100% weight it'd be 0.37 * 1 = 0.37. So shorter matches' sets of predictions get more attention.
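A sketch of the normalize-and-mix step: each set of counts becomes a probability distribution, then the sets are blended with per-set weights. The counts and weights below are illustrative only, echoing the 0.37 * 0.2 = 0.074 example:

```python
from collections import Counter, defaultdict

def normalize(counts):
    """Counts -> probabilities, e.g. {'a': 47455, 'b': 5644} -> {'a': 0.89, ...}."""
    total = sum(counts.values())
    return {letter: c / total for letter, c in counts.items()}

def mix(sets_with_weights):
    """Blend several prediction sets; the weights should sum to 1."""
    mixed = defaultdict(float)
    for counts, weight in sets_with_weights:
        for letter, p in normalize(counts).items():
            mixed[letter] += weight * p
    return dict(mixed)

no_context = Counter({'a': 47455, 'b': 5644, 'z': 352})
short_match = Counter({'t': 44, 'a': 5, 'm': 9, 'z': 1})
long_match = Counter({'t': 3, 'o': 1})

# Illustrative weights only: the long, rare match gets 20% here.
final = mix([(no_context, 0.3), (short_match, 0.5), (long_match, 0.2)])
print(sorted(final.items(), key=lambda kv: -kv[1])[:3])
```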
So to sum it up: from many past experiences, given the context, and viewing the context in multiple ways, we know which next letter is probably going to appear most of the time, and when it would be z (once every 1,000 times we see the context hell_?_, it'll be z).
The more data it trains on, the more accurate this network is; it's so fun! It improves as expected if you plot it per every 100 bytes or so you feed it. 10MB is better than 1MB.
For a set of predictions, letters that have many counts get even more, but never reach perfect either; this is the exponential neuron threshold function. So with a 43646, b 45, d 76, e 3, z 2... a gets even more: it thinks a is 0.98 (98%) likely to be the next letter, but it won't go to 0.999999. The S curve shoots up fast, as if it thinks the answer is yes or no, but then levels flat before reaching the top of the box.
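The post doesn't give the exact threshold formula, so this is only a guess at the shape it describes: a logistic S-curve with a floor and a ceiling, applied to the normalized counts and renormalized, so a dominant letter stays near-certain but never hits 1.0 and rare letters never hit 0.0:

```python
import math

def squash(p, ceiling=0.995, floor=0.001, steepness=10.0):
    """S-curve from roughly `floor` to `ceiling`: shoots up quickly as evidence
    mounts, but flattens before ever reaching 1.0 (or dropping to 0.0)."""
    s = 1.0 / (1.0 + math.exp(-steepness * (p - 0.5)))
    return floor + (ceiling - floor) * s

counts = {'a': 43646, 'b': 45, 'd': 76, 'e': 3, 'z': 2}
total = sum(counts.values())
probs = {k: v / total for k, v in counts.items()}

squashed = {k: squash(p) for k, p in probs.items()}
norm = sum(squashed.values())
final = {k: round(v / norm, 4) for k, v in squashed.items()}
print(final)
# 'a' stays near-certain but never hits 1.0, and the rare letters keep a small
# nonzero share, which matters for lossless coding.
```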
I do this for layers too: if the 8-letter match has enough observations and need not include the 7-, 6-, 5-letter matches to get more predictions, and the 8-letter match is really sure, then I give more weight to its set of predictions.
I also set, manually for now, a global layer weight, so e.g. 5 letters gets 30% weight; I cut it in half, so to speak, to allow it to decide on its own whether the 5-letter set is sure enough or not.
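A sketch of the two weighting ideas above: each context length ('layer') has a hand-set global weight, which is then scaled by how sure that layer is. The weights, the 'cut it in half' factor, and the observation-count confidence measure are placeholders of mine, not values from the post:

```python
def layer_weight(context_len, observations, global_weights, half=0.5, enough=75):
    """Hand-set global weight per layer, scaled by that layer's own confidence."""
    base = global_weights.get(context_len, 0.0)
    confidence = min(1.0, observations / enough)      # crude 'sure enough' measure
    return base * (half + (1.0 - half) * confidence)  # never less than half its base

# Illustrative global weights per context length (e.g. the 5-letter layer = 30%);
# in practice the resulting weights would be renormalized to sum to 1.
GLOBAL = {8: 0.10, 5: 0.30, 3: 0.35, 0: 0.25}

for n, obs in [(8, 120), (5, 12), (3, 400), (0, 50000)]:
    print(n, round(layer_weight(n, obs, GLOBAL), 3))
```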
I use letter recency to improve next-letter prediction. I may have seen z appear 55 times in my life in this dataset, and a 46457 times, but if z was seen just 100 letters back, it feels like I saw z 5000 times. That boost fades fast back to 5 times, but it makes me expect the letter after zzzzz_?_ to be a or z. This is yet another pattern: it merges energy on neurons to boost them for now, like we merge counts on a connection and branches in a network and threshold-pool to yes or no. I do it for layers too: I take the last e.g. 4 letters, search the last 300 letters for those 4 letters, collect all the next letters, and boost those predictions; I include this in that layer's set. It helped lots.
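A sketch of the recency boost: re-find the last few letters inside the most recent ~300 letters, collect what followed them there, and return those as temporarily inflated counts to merge into that layer's set. The boost size is a placeholder:

```python
from collections import Counter

def recency_boost(history, ctx_len=4, window=300, boost=50):
    """Re-find the last `ctx_len` letters inside the recent window and count
    what followed them; these counts fade naturally as the window slides on."""
    recent = history[-window:]
    pattern = history[-ctx_len:]
    boosted = Counter()
    start = 0
    while True:
        hit = recent.find(pattern, start)
        if hit == -1 or hit + ctx_len >= len(recent):
            break
        boosted[recent[hit + ctx_len]] += boost   # inflated, short-lived count
        start = hit + 1
    return boosted

history = "the cat sat on the mat, the cat sat on the "
print(recency_boost(history, ctx_len=4))   # boosts letters recently seen after 'the '
```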
The evaluation makes a shorter 'long number' if it predicts better, e.g. 0.568654564745.... This corrects the predictions to the desired letter for lossless extraction of the dataset back again; you run the same code to decompress it letter by letter. It may predict e.g. p a lot, but the next letter is o, so you store the correction, and it's more costly the more its predictions are wrong. This long number, e.g. 0.7856856856, can be compressed further by turning it into binary: 8 bits like 01110100 can store 0-255, and 3 bytes up to 16,777,215, because 8 bits can hold 256 combinations. Then this binary number is stored as bytes and you get Ty$3!74hd54sHJ8$0)3df in the final compressed file.
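A toy sketch of the 'long number' idea, which is arithmetic coding: each letter narrows an interval [low, high) according to the predicted probabilities, and any number inside the final interval, written in binary, is the compressed file. Real coders use integer arithmetic and renormalization rather than Python floats, and a static three-letter distribution stands in for the predictor here:

```python
import math

# Toy static distribution over a 3-letter alphabet, laid out cumulatively;
# a real compressor would get these probabilities from the predictor, per letter.
CUMULATIVE = {'a': (0.0, 0.7), 'b': (0.7, 0.9), 'c': (0.9, 1.0)}

def encode(text):
    """Narrow [low, high) once per letter: a well-predicted letter barely
    shrinks the interval, a surprise shrinks it a lot (and costs more bits)."""
    low, high = 0.0, 1.0
    for letter in text:
        span = high - low
        lo_p, hi_p = CUMULATIVE[letter]
        low, high = low + span * lo_p, low + span * hi_p
    return low, high

low, high = encode("aabac")
bits = math.ceil(-math.log2(high - low)) + 1   # enough bits to pin a number inside
print(round(low, 6), round(high, 6), "-> about", bits, "bits, packed into bytes")
```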
As I said in my work but have not yet implemented, translation is also used, like in other AIs, to recognize the prompt and get more predictions. Cat and dog share the same predictions, so if you normalize this data you can see dog and cat are very similar: of all their predictions, they share 80% of the things they each predict. This allows you to recognize long questions and know what letter comes next, because you get lots of matches, e.g. 'I was eating ?' matches memories 'we were swallowing P', 'I was devouring P', 'we then ate P'.
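A sketch of the translation idea: normalize two words' next-word prediction sets and measure how much probability mass they share; heavy overlap (the 80% figure above) marks them as interchangeable. The counts are invented for illustration:

```python
def normalize(counts):
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def overlap(counts_a, counts_b):
    """Shared probability mass between two normalized prediction sets (0..1)."""
    a, b = normalize(counts_a), normalize(counts_b)
    return sum(min(a.get(w, 0.0), b.get(w, 0.0)) for w in set(a) | set(b))

# Invented next-word counts following 'cat' and 'dog' in some corpus.
after_cat = {'food': 40, 'sat': 25, 'ran': 20, 'meowed': 15}
after_dog = {'food': 35, 'sat': 20, 'ran': 30, 'barked': 15}

print(round(overlap(after_cat, after_dog), 2))   # high overlap -> near-synonyms
```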
Like Blender and PPLM etc., though I haven't yet implemented it, reward steers prediction: you can influence the prediction by some % to be the letter or word love, food, sex, kissing, AI, or immortality. Through translation it can leak reward chemical to related nodes to learn new, better predictions to achieve/see/predict the desired result. It's all based on cause > effect, statistics. Matching/merging is counting/math. The brain efficiently stores patterns and runs fast because of that.
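A sketch of reward steering in that spirit (not Blender's or PPLM's actual method): blend the model's prediction some percentage toward a hand-picked reward distribution over favoured words; the numbers are placeholders:

```python
def steer(model_probs, reward_probs, influence=0.2):
    """Blend the model's distribution `influence` of the way toward the reward."""
    words = set(model_probs) | set(reward_probs)
    return {w: (1 - influence) * model_probs.get(w, 0.0)
               + influence * reward_probs.get(w, 0.0)
            for w in words}

model = {'food': 0.5, 'work': 0.3, 'sleep': 0.2}
reward = {'food': 0.6, 'immortality': 0.4}     # hand-set favoured topics

print(steer(model, reward, influence=0.2))
```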
BTW, you recognize stretched/bigger/rotated objects because if each part of their lines is stretched or rotated by the same amount, there is no error. E.g. h--e--l--l--o is 2 letters off for each letter, totalling 8 error, but because each is off by the same 2, there is only a base error of 2; it's a pattern. If we had h------e-l--l-----------------------------------------o, this will not "look" like hello; there is no clear pattern that it is hello, it is random.
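A sketch of that 'same offset everywhere' test: find the template letters in the stretched string and measure the spread of the gaps between them; equal gaps mean zero spread (it still looks like hello), uneven gaps mean a large spread (it looks random). The spread measure is my own stand-in for the post's error score:

```python
from statistics import pstdev

def gaps(stretched, template="hello"):
    """Positions of the template's letters in the stretched string, as gaps."""
    positions, search_from = [], 0
    for ch in template:
        pos = stretched.index(ch, search_from)
        positions.append(pos)
        search_from = pos + 1
    return [b - a for a, b in zip(positions, positions[1:])]

uniform = "h--e--l--l--o"                    # each letter shifted equally
messy = "h------e-l--l----------------o"     # uneven stretching
for s in (uniform, messy):
    g = gaps(s)
    print(s, g, "spread:", round(pstdev(g), 2))
# Equal gaps -> spread 0.0, so it still 'looks like' hello; uneven gaps don't.
```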
And multi-sensory: a brain tries to efficiently capture the world's data. It can't brute-force simulate everything and every atom; that would work if it could (to find all passwords, one of which is the answer, or to make all possible dead bodies by trying every arrangement of particles in a simulation), but it is costly. So brains use vision, sound, etc. to capture snapshots of the world, to get a spectrum; more eyes and more diverse sensors capture the distribution faster. Same for more AGI agents.
There are many more patterns, but they are rare, e.g. "Tom Ron Clark has a mom named Jane Bane ? (CLARK!)" and "rat wind mice, home scarf building, loop frog tunnel, pizza sun food, ant gum ? (BUG!)". This is actually a triple-match translation and an energization of it, and it predicts a translation, haha. A brain can learn new patterns just BY reading data/patterns. Using my few big golden patterns above you can get IF-THEN rules, and it can therefore build other rules from context>result prediction, see? It's an if-then machine. It models reality.
We also make our homeworld into a fractal pattern so we know where, when, and what everything is. We organize things into merged patterns: everything is square or circular, no odd errors, homes lined up and square, aka stacking count occurrences, see? We group similar buildings together, like food stores and medical centers, and do the same for timing things. It allows us to predict more accurately, and therefore survive longer. All we do is seek life extension (a structure that repeats in pattern or lifetime; a statue, a metal block, or a cloned pattern) by means of food, sex, home, AI, cryonics, etc.; we clone ourselves and force our schooling/beliefs upon kids. AGIs will quickly clone their brain directly like cells, unlike atoms that emerge on their own. It's beneficial to clone your brain so you can help yourself do many things you wanted to do in parallel. We use patterns in the brain and the world to BE a pattern. Nothing else exists but patterns; a rock/toaster/human are just evolved machines. We seek immortality and lie that we are special SO as to extend our lifetime; we fight to live longer, rocks simply don't.