My HAl Rig

frankinstien · « **Reply #15 on:** November 03, 2021, 10:10:29 pm »

Quote

Looks like you want to implement something like hard disc streamed texture-mapping, for a bigger memory store.

I only use hard disks as a last-resort storage medium for things that aren't used too often. Everything else is on Gen3 or Gen4 NVMe NAND SSDs, which have much, much faster read and write speeds than hard disks. But Optane is one fast persistent medium: 170 - 360 nanoseconds of latency compared to DRAM's 90 nanoseconds. That is orders of magnitude faster than Gen4 NVMe SSDs!

MagnusWootton · « **Reply #16 on:** November 03, 2021, 10:42:24 pm »

Indeed, my knowledge of storage is pretty dated; it must be a completely different world these days.

LOCKSUIT · « **Reply #17 on:** November 04, 2021, 05:04:49 pm »

@frankinstien

wut...

What do you think is more efficient than a neural network or a trie/tree? How so? A database of plain text? That can't be good... There isn't a better way to compress the data you're using (the program, AKA neural network + code). How would you then store 100MB of text as memories? My own algorithm (once, or if, I finish coding it) needs to store word-offset snapshots like "[we walked fast down] the" and "we [walked fast down the]"; that is, every ~5-word window in the 100MB is needed, and it uses a trie/tree to do that. It will also store parts by pointing at near-root nodes, e.g. "we ate" > "so much", instead of storing those 2 words again. You can also sort of compress the words in the tree: if your vocab has 3 lengthy words, abcdfgghefg, thegfffhhfghfhfj and ewettwrfdhdh, then you store them as just 1, 2, 3, each 1 byte, and you can store sentences like 12111312 in the tree. You could compress the code or tree further, though it may not be worth it and would cost speed.

I'm not sure why you think storing a database of every ~5 words in 100MB of text would be more efficient RAM-wise; that would cost the most RAM.

Like, if we have 100MB of text and store every ~5 words of it in a tree, it would actually cost over a GB of RAM, but it improves speed by far, for a brain. Furthermore, if we fed GPT 40GB of text, it'd be at least 40GB in RAM then. It isn't though; it's 12GB of RAM on the GPU.
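
A minimal Python sketch of the storage scheme described in this post, assuming a plain dictionary-based trie: each word is replaced by a small integer ID, and every window of up to 5 consecutive IDs is inserted so that shared prefixes are stored only once. The names `TrieNode`, `build_vocab`, `insert_windows`, `follow`, and `count_nodes` are illustrative assumptions, not the poster's (unfinished) code.

```python
class TrieNode:
    """One node per word ID; children are keyed by the next word's ID."""

    def __init__(self):
        self.children = {}
        self.count = 0  # how many stored windows pass through this node


def build_vocab(words):
    """Map each distinct word to a small integer ID, in first-seen order."""
    vocab = {}
    for word in words:
        if word not in vocab:
            vocab[word] = len(vocab) + 1
    return vocab


def insert_windows(root, ids, window=5):
    """Insert every window of up to `window` consecutive word IDs."""
    for start in range(len(ids)):
        node = root
        for word_id in ids[start:start + window]:
            node = node.children.setdefault(word_id, TrieNode())
            node.count += 1


def follow(root, path):
    """Walk the trie along `path`; return the final node or None."""
    node = root
    for word_id in path:
        node = node.children.get(word_id)
        if node is None:
            return None
    return node


def count_nodes(node):
    """Total number of nodes in the subtree, including `node` itself."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


text = "we walked fast down the road and we walked fast down the hill"
words = text.split()
vocab = build_vocab(words)          # long words become small integers
ids = [vocab[w] for w in words]

root = TrieNode()
insert_windows(root, ids, window=5)
repeated = follow(root, ids[:5])    # "we walked fast down the"
```

Both occurrences of "we walked fast down the" share one path here, so the final node carries a count of 2 rather than the phrase being stored twice. The sketch does not measure real RAM use, which in Python would be dominated by per-object overhead.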

frankinstien · « **Reply #18 on:** November 05, 2021, 12:51:16 am »

Compressed code? I remember in 2010: Odyssey Two they were talking about HAL's memory as something called a non-linear worm thing or some kind of thingamajig, which was alluding to the idea that AI would need a non-linear memory to be functional. Clarke, I'm guessing, wasn't aware of hashcodes and how they can speed up lookups, but Clarke would definitely roll over in his grave if he knew that current AI has to iterate through the entire matrix of billions to get an output! I don't compress data or use binary trees, I use hashcodes to represent everything! So by breaking all pieces of data into hashcodes I can search for relations within a few computations and parallelize those as stimuli are inputted into the system. I can find partial features of words not as text but as definitions of classifications, functional processes, and/or episodic memory that relate to other data. Those concepts are learned instantly; it's not a re-training of 700GB of data. I use a descriptive model (which I've posted on this board several times) which can incorporate any kind of data and encode the processes that use that data as boolean or spike code logic (read my post on "Neurons are wrappers for digital processes") that direct things like workflows or can be used in recursive analysis. Also, the timing chunker (read my post on the timing chunker that is modeled after human brain time chunking) uses hashcodes as well. Also remember that I use referencing, meaning that once I find one piece of data, if it has relationships with other data, that data is referenced by the descriptor object (review my object-oriented data model), so there is no searching for it; the data is available instantly! So with this approach, one piece of data can have relationships with millions of other data points, similar to neurons, and that data is readily available with a simple hash lookup.

So which is faster: a room full of GPUs churning out iterations over billions of neurodes, or the calculation of a hashcode done within a few hundred nanoseconds? I can also incorporate SIMD and GPUs to do hundreds of millions of these kinds of calculations. But the point is that breaking stuff into nuggets of features that are computed into a hashcode will beat any ANN, even an army of ANNs in a warehouse of GPUs, when it comes to finding data.
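
A rough Python sketch of the kind of lookup described in this post, under stated assumptions: features are hashed to 32-bit codes, an index maps each code to the descriptor objects that carry the feature, and related data is reached through direct object references rather than a search. `feature_hash`, `Descriptor`, and `FeatureIndex` are hypothetical names; the poster's actual data model is not shown in the thread.

```python
import hashlib


def feature_hash(feature):
    """Stable 32-bit code: the first 4 bytes of the feature's SHA-1 digest."""
    digest = hashlib.sha1(feature.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class Descriptor:
    """A single shared instance describing one concept."""

    def __init__(self, name, features):
        self.name = name
        self.features = set(features)
        self.related = []  # direct references to other Descriptor objects


class FeatureIndex:
    """Maps a 32-bit feature code to the descriptors that have the feature."""

    def __init__(self):
        self._by_hash = {}

    def add(self, descriptor):
        for feature in descriptor.features:
            code = feature_hash(feature)
            self._by_hash.setdefault(code, set()).add(descriptor)

    def lookup(self, feature):
        candidates = self._by_hash.get(feature_hash(feature), set())
        # Re-check the feature itself to guard against 32-bit collisions.
        return {d for d in candidates if feature in d.features}


human = Descriptor("human", ["has_legs", "uses_language"])
dog = Descriptor("dog", ["has_legs", "barks"])
car = Descriptor("car", ["has_wheels"])
human.related.append(dog)  # a relationship stored as a reference

index = FeatureIndex()
for descriptor in (human, dog, car):
    index.add(descriptor)

with_legs = {d.name for d in index.lookup("has_legs")}
```

Each lookup touches one dictionary bucket regardless of how many descriptors exist, which is the property the post relies on. Whether that beats a neural network on a given task is a separate question this sketch does not address.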

MagnusWootton · « **Reply #19 on:** November 05, 2021, 04:35:15 pm »

Quote from: frankinstien on November 05, 2021, 12:51:16 am

Clarke would definitely roll over in his grave if he knew that current AI has to iterate through the entire matrix of billions to get an output!

There's one matrix entry per synapse of the ANN, so you have to run the synapses linearly. That's to be expected; it's not too much computation. In my mind, it's perfectly feasible.

The bit you can't compute is working out what the synapse weights are, and that's a search over 2^synapses possibilities! That's why AGI isn't here yet.

Quote from: frankinstien on November 05, 2021, 12:51:16 am

I don't compress data or use binary trees, I use hashcodes to represent everything! So by breaking all pieces of data into hashcodes I can search for relations within a few computations and parallelize those as stimuli are inputted into the system. I can find partial features of words not as text but as definitions of classifications, functional processes, and/or episodic memory that relate to other data. Those concepts are learned instantly; it's not a re-training of 700GB of data. I use a descriptive model (which I've posted on this board several times) which can incorporate any kind of data and encode the processes that use that data as boolean or spike code logic (read my post on "Neurons are wrappers for digital processes") that direct things like workflows or can be used in recursive analysis. Also, the timing chunker (read my post on the timing chunker that is modeled after human brain time chunking) uses hashcodes as well. Also remember that I use referencing, meaning that once I find one piece of data, if it has relationships with other data, that data is referenced by the descriptor object (review my object-oriented data model), so there is no searching for it; the data is available instantly! So with this approach, one piece of data can have relationships with millions of other data points, similar to neurons, and that data is readily available with a simple hash lookup.

So which is faster: a room full of GPUs churning out iterations over billions of neurodes, or the calculation of a hashcode done within a few hundred nanoseconds? I can also incorporate SIMD and GPUs to do hundreds of millions of these kinds of calculations. But the point is that breaking stuff into nuggets of features that are computed into a hashcode will beat any ANN, even an army of ANNs in a warehouse of GPUs, when it comes to finding data.

That sounds really cool! You might be onto something amazing!

frankinstien · « **Reply #20 on:** November 05, 2021, 10:59:24 pm »

Just wanted to add that there are hashcode strategies that can save quite a few CPU cycles, where one side of the comparison can be done with as little as two instructions, so the time to do a look-up can actually be in the tens of nanoseconds. Of course, mileage will vary according to coding approaches, OS environments, and CPUs or GPUs.

LOCKSUIT · « **Reply #21 on:** November 06, 2021, 04:47:00 am »

So you mean a sparse network then? Instead of computing the whole matrix. Ya, my AI would actually fit that bill really well, if I can finish it. My network can still come out small and fast, as explained in my last post. Now, you say hashes, hmm, like the word 'walking' is converted into its ord and that chooses the location in a list in Python code, hmm, yes, that's very fast... but I bet the RAM will suffer. Let's see: you have 100MB of text, and must store every ~4 words of it, and find them fast. To make the 100MB hashable, you need to store in a small list (small as in 'using the hash table method') the ~4 words + prediction entailment, which means that for 100MB you need to store 500MB, and for 40GB, 200GB. So this extra-large 40GB dataset of text that GPT has in RAM as 12GB would come to 200GB needing to be in RAM. Or does it need to be in RAM if it is hashable (fast find)? Anybody know? So for 10GB of text it would work OK; 50GB of RAM needed then.

But don't forget you need to find ALL matches of "[we [walked [down [the]]]] ?___?", and combine the predictions to get a set of predicted words for all 4 matches. Yup, so 'the' has nearly 800,000 matches in 100MB of wiki LOL, when they could all be put into a tree with a max 50K vocab. You also need a semantic web like word2vec and need to store those embeds or connections.

So it's big... and slow...

frankinstien · « **Reply #22 on:** November 06, 2021, 08:40:37 pm »

I don't know where you got your numbers from, but certainly not from an understanding of hashcodes. Take a word that has four characters: in simple ASCII that's 4 bytes, and a 32-bit hashcode is also only 4 bytes. A word like "pneumonoultramicroscopicsilicovolcanoconiosis" is 45 characters (45 bytes), and since each word is stored with its OL and other components there are even more bytes involved, yet all of that reduces to a 32-bit hashcode! The ontological component and the descriptor component provide feature or property states for each word, and there is only one instance of those structures for each word. I have something like 790,000 words stored in the OL database and it's only 391MB, while its hashcode store is only 3.2MB with 32-bit codes and 6.4MB with 64-bit codes! My older server has 128GB and the new system has 256GB. The 391MB plus the additional 3.2MB is but a drop in the bucket of all the RAM I have! The descriptor component is just starting out, but right now it averages 20,000 bytes per word; at 790,000 words that's 16GB to cache it, but its hashcode per word reduces to just 4 to 8 bytes!

OK, so you might argue that you have to index those features as well, and you're right. The current descriptor DB feature index averages 881 bytes per word, so 790,000 words would be just 700MB, where each feature is a single instance with a HashSet that stores a reference to the descriptor component. Again, a drop in the bucket of all the RAM I have! So even as the data grows while the system learns and makes those descriptor components more complex, there is plenty of room and the hashcode burden is trivial.
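
The sizes quoted in this post can be checked with a few lines of arithmetic. This sketch assumes decimal units (1MB = 1,000,000 bytes) and uses only the per-word figures given above; the variable names are illustrative.

```python
WORDS = 790_000  # word count quoted in the post

BYTES_PER_MB = 1_000_000
BYTES_PER_GB = 1_000_000_000

# Hashcode store: one 4-byte or 8-byte code per word.
hash32_mb = WORDS * 4 / BYTES_PER_MB          # about 3.16 MB
hash64_mb = WORDS * 8 / BYTES_PER_MB          # about 6.32 MB

# Descriptor cache at an average of 20,000 bytes per word.
descriptor_gb = WORDS * 20_000 / BYTES_PER_GB  # about 15.8 GB

# Feature index at an average of 881 bytes per word.
feature_index_mb = WORDS * 881 / BYTES_PER_MB  # about 696 MB

print(f"32-bit hash store: {hash32_mb:.2f} MB")
print(f"64-bit hash store: {hash64_mb:.2f} MB")
print(f"descriptor cache:  {descriptor_gb:.1f} GB")
print(f"feature index:     {feature_index_mb:.0f} MB")
```

These come out slightly under the rounded figures in the post (3.2MB, 6.4MB, 16GB, 700MB), which is consistent with "something like 790,000 words" being an approximate count.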

If you remember my post on the time chunking scheme, I ran out of 128GB in 32 minutes, but I later simply stored threshold deltas of stimuli, which kept everything manageable for days. Yes, eventually you'll need to manage the temporal resources by writing to disk, which in this case is an NVMe Gen3 or Gen4 SSD. The approach makes things pretty responsive when having to find data on the disk, which is indexed with hashcodes and inter-file locations.
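
One way to read "storing threshold deltas of stimuli" is to keep a sample only when it has moved by at least some threshold from the last sample kept. The Python sketch below illustrates that reading; it is an assumption about the technique, and `threshold_deltas` is a hypothetical name rather than part of the poster's time chunker.

```python
def threshold_deltas(samples, threshold):
    """Keep (index, value) pairs whose value differs from the last
    kept value by at least `threshold`; the first sample is always kept."""
    kept = []
    last = None
    for index, value in enumerate(samples):
        if last is None or abs(value - last) >= threshold:
            kept.append((index, value))
            last = value
    return kept


# A stimulus that mostly hovers around a level, with two real changes.
samples = [0.50, 0.51, 0.52, 0.80, 0.81, 0.79, 0.20, 0.21]
kept = threshold_deltas(samples, threshold=0.1)
```

Here 8 samples reduce to 3 stored entries. With a steady stimulus stream, that kind of reduction is what would stretch a buffer from minutes to days, at the cost of discarding changes smaller than the threshold.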

Now here's your problem with ANNs: you have to iterate through the entire network, and it doesn't really work in a way that can represent meaning as a point in memory. Your ANN distributes the description of words across the entire network, which is why you have to iterate through the entire matrix to get an output. My approach doesn't; the data is focused into structures that have single instances that can even change dynamically in real time, meaning the system can learn while it executes! As stimuli are entered into the system, they are converted into hashcode sets that look up the relevant data, which is associated with functions or processes to respond. So I don't have to iterate through the entire dataset as the ANNs do, and I can change the associations to those structures instantly, with no retraining of the entire system. Also, remember that other data relating to the descriptors is referenced, and those instances reference other data in turn. So, algorithmically, I can gain access to data to provide more capabilities without having to randomly search for it, since it's right there for the taking because of how relationships are linked/referenced. Again, that speeds up processing, and I don't have to iterate through billions of other neurodes that aren't really representative of what is needed but whose contribution to the output you have to calculate regardless.

With an ANN you can't just find the functional data points with a query, but with this approach you can, and that's why only a fraction of the computational horsepower is needed compared to an ANN. Here's another advantage: I can still use ANNs, but they are much, much smaller because they are focused on the semantic interpretations of a query that's initiated from stimuli, whose generalized states can be evaluated into patterns. Realize that the ANN is called only after the data is matched to the stimuli. So the problem domain is much smaller than what GPT-3 does, which tries to encode everything into one big ANN. Also, this approach isn't trapped into an ANN-only solution, so it opens up the framework to a universe of solutions, e.g. genetic algorithms, differential equations, Bayesian inference, etc.

MagnusWootton · « **Reply #23 on:** November 06, 2021, 10:10:34 pm »

Nice thinking, and it's cool watching you strike out onto the cutting edge.

But you can't defy the law of context. As in, if you lose context, you cannot get it back. Maybe you are allowed to lose it? But if you do, it can never return; it's permanent!

Read this article:
https://matt.might.net/articles/why-infinite-or-guaranteed-file-compression-is-impossible/

Don't give up though if you find it doesn't work, because your thinking is good; it's the same whether what you're coming up with works or not, and AGI does take a huge optimization like this!!! So don't give up, and keep looking for it. It is special, but it has to not defy context. It's kinda the same thing though, and works like magic for sure!

frankinstien · « **Reply #24 on:** November 06, 2021, 10:36:43 pm »

From a conceptual perspective, maybe not. Remember that when encoding things, the more understanding of a concept there is on the decoder's side, the more compression you can have. So let's say I use the set of symbols e=mc^2. That's only six characters, but because I understand some physics and math, the concepts of energy, mass, the speed of light and the mathematical function of squaring don't have to be included in the data encoding! So, too, with this approach: because the context is a concept, it resolves to a set of descriptors already stored, so any page, paragraph, sentence or sensory stimulus computes to a set of concepts already stored. New concepts or variances of concepts can be dealt with by adding attributes from existing descriptors or borrowing from existing descriptors through nesting. Such contextual artifacts can then be shared across many other data points! So we prevent duplication of data by ensuring single instances of concepts that can be referenced at any time.
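
The e=mc^2 example can be sketched in Python as a decoder that already holds the concepts, so the message only needs to carry short symbols. The `CODEBOOK` contents and the `decode` function are illustrative assumptions; the entry for "=" is added only so the whole string can be tokenized.

```python
# Decoder-side knowledge: each short symbol stands for a stored concept.
CODEBOOK = {
    "e": "energy",
    "m": "mass",
    "c": "speed of light",
    "^2": "squared",
    "=": "equals",
}


def decode(message, codebook):
    """Expand a message into concept names, matching the longest symbol first."""
    symbols = sorted(codebook, key=len, reverse=True)
    concepts = []
    position = 0
    while position < len(message):
        for symbol in symbols:
            if message.startswith(symbol, position):
                concepts.append(codebook[symbol])
                position += len(symbol)
                break
        else:
            raise ValueError(f"unknown symbol at position {position}")
    return concepts


message = "e=mc^2"
concepts = decode(message, CODEBOOK)
expanded_length = len(" ".join(concepts))
```

The six-character message expands to a much longer description only because the codebook exists on the receiving side. Anything the decoder does not already hold, such as an unknown symbol, cannot be recovered from the message alone, which is the limit the "law of context" objection points at.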

MagnusWootton · « **Reply #25 on:** November 06, 2021, 10:47:09 pm »

If you aren't losing context, I have no problem with what you're doing; it could definitely work. On the internet many of these so-called "quantum people" talk about "breaking math" all the time, and I disagree with it. Maths is invincible; it cannot be broken. Some would say the square root of -1 is "breaking math", but to me it isn't; it's how it is supposed to be. If anyone ever claims to defy the law of context, I doubt it to the fullest extent. If mathematics didn't make sense for us, our lives would be even more foolish than they already are.

So I doubt it's "breakable"; mathematics and the logic that makes it are unbreakable.

frankinstien · « **Reply #26 on:** November 07, 2021, 08:27:33 pm »

I have found favor from the AI GPU gods,

I actually picked up an NVIDIA GeForce RTX 3080 10GB for $750!

The RTX 3080 has a whopping 238 TOPS of INT8.

There's about a 13% difference between the RTX 3080 and RTX 3090. Now having both a Vega 56 and an RTX 3080 on the same machine should be interesting.

LOCKSUIT · « **Reply #27 on:** November 08, 2021, 03:04:51 am »

Let me try again: you have to understand that when it comes to GPT and my AI, if you want to attain the same level of results (unless you've found some more efficient way, implemented it, and can show it works; others tell me this, so I say it back: show me code!), then you basically need to store every ~4-word-long string in 40GB of text. This allows you to take a prompt like 'the>___' and predict the next word properly, knowing which word usually comes next. Blending methods like this brings all the magic; it is far from copying the dataset. But see, in 40GB of text, 'the' appears a lot; just 100MB has 800,000 occurrences. So you need to put those all into a trie, otherwise you'll be matching all of them every time you are given that sentence to complete. So you may now predict: the > cat/ home/ arm/ light/ throw/ moon... and maybe dog was seen more than moon, so you predict dog more heavily then.

frankinstien · « **Reply #28 on:** November 08, 2021, 06:49:05 am »

Quote

But see, in 40GBs of text, 'the' appears lots, just 100MB has 800,000 occurrences. So you need to put those all into a trie tree, otherwise, you'll be matching all of them every time you are given that sentence to complete. So you may predict now: the > cat/ home/ arm/ light/ throw/ moon.....and maybe dog was seen more than moon, so you more heavily predict dog then.

No, I don't use a trie. I have posted the descriptor concept many times on this forum, but I'll do it again to clarify some things. You appear to think that I duplicate data into tries for each sentence, but I don't. The word "the" is a determiner and is handled by the NLP; it does have a representation in the OL, but its state as a determiner suffices as a state in itself. While a word like "human" can appear many times in text, there is only one instance of "human" as a descriptor, along with other classifications that are coded in an ontological framework. Since only one instance of a word is allowed in the database, any contextual variances are considered concepts that are symbolically represented by a word or set of words as a term. Such concepts get associated with words through descriptor objects and OL hierarchies, as shown below:

The OL will have the concept of human across many groupings or hierarchies but there is only one instance of Human that all those headings reference.

Now, as I mentioned earlier in this thread, if something is already known then it doesn't have to be duplicated, so inheritance or nesting is used to remove redundant data, as shown below:

As you can see "Human" inherits from the concept of "Animal":

Animal has many other concepts such as Head, Neck, Torso, etc. Those concepts are nested into Animal, rather than inheriting from them, since those properties are parts of an animal, they need to be exposed as such. Those descriptors have vector definitions that are both numeric and text, where the enumerator is used in algorithms. Below is an image of a vector state of a word in a descriptor object.

Now, on my website blog I have a write-up on whether to predict or react, and many times the ability to react proves a far better strategy than predicting! Now, let's look at a sentence parsed for its grammar:

The parser groups the words into noun phrases, verb phrases, prepositional phrases, etc., and identifies their parts of speech. That information is volatile: once the sentence, paragraph, or page has been evaluated by other logic, it's disposed of! Well, sort of; its episodic representation is stored as text along with the logical interpretation of it. The reason for that involves using memory as humans do, where remembering events is actually re-evaluating them with new perceptions. Now the descriptors and OL hierarchies, along with the NLP's output, help correlate what a sentence means reactively, not through some kind of stepwise prediction that wastes CPU cycles on wrong predictions. Because the NLP segments the sentence, it's possible to compute relationships based on the concepts that those words relate to. This also allows the machine to learn by finding similarities and/or asking a mentor questions about a word, some segment of the sentence, or the entire sentence.

OK, so now you should be able to see that the text is turned into meaningful vectors without having to apply an ANN, and that those sentences are not stored. Now, to find the correlations mentioned above I could use an ANN, but it's a pared-down ANN where the OL hierarchy can constrain the problem domain, and I apply that network as needed instead of needlessly computing neurodes that contribute very little to the output but have to be computed nonetheless.
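
The inheritance-versus-nesting distinction from this post (Human inherits from Animal, while Head, Neck and Torso are nested into Animal as parts) can be sketched in Python as follows. The `Descriptor` class and its methods are hypothetical; the poster's object-oriented data model appeared only in images that are not part of this text.

```python
class Descriptor:
    """A single shared instance describing one concept."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # inheritance: this concept "is a" parent
        self.parts = {}       # nesting: this concept "has a" part

    def nest(self, part):
        """Expose another descriptor as a part of this one."""
        self.parts[part.name] = part

    def all_parts(self):
        """Own parts plus everything inherited from ancestors."""
        inherited = self.parent.all_parts() if self.parent else {}
        return {**inherited, **self.parts}

    def is_a(self, name):
        """True if this descriptor or any ancestor has the given name."""
        node = self
        while node is not None:
            if node.name == name:
                return True
            node = node.parent
        return False


animal = Descriptor("Animal")
head, neck, torso = Descriptor("Head"), Descriptor("Neck"), Descriptor("Torso")
for part in (head, neck, torso):
    animal.nest(part)

human = Descriptor("Human", parent=animal)  # inherits, stores no copies
```

`human.parts` stays empty, yet `human.all_parts()` still reaches the one shared `Head` instance through `Animal`, so nothing is duplicated.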

LOCKSUIT · « **Reply #29 on:** November 08, 2021, 06:43:19 pm »

But does it predict as well as GPT-1? Show me it running. Ideas don't mean much unless you have a theory I can grasp "quickly in 1 post".

Yes, I see your post. Reflexes are great, but prediction is greater and more general, as shown by GPT-3. I rarely use my primitive reflexes (the ~20 I was born with). And learning to walk won't solve cancer or help you think about 'a wide range of situations'. Capturing data skips 'body learning' and puts you into a simulation mentally, which is safer, faster, etc. It's way better than even a computer sim, because predicting like DALL-E does is cheaper than running a real sim with fluids etc.

I see you have a database that uses nesting and inheritance. You could probably let an AI learn those for you instead of writing them all in. It'd have to be vision, and use word2vec or such. So far so good. But the biggest problem is I don't think it can predict the rest of an image or sentence like GPT / Jukebox / DALL-E can. I was given lengthy, exactly human-level techno completions from Jukebox.

The reason a trie/tree network needs to store 'the', 'but the', 'and the', 'cat and the', 'move and the', 'wind move and the', etc., is that it is storing the different contexts in which 'the' appears. The more words in the context that match, and in the right order, the better for prediction. It then uses the longest matches to retrieve a prediction of the next word, and combines in shorter-match predictions when its long matches have few experiences (which is usually the case) of what word usually comes next. It can also be robust to holes and delays, e.g. 12r45>6, 12r345>6, 1 2 3 4 5 > 6.
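
A small Python sketch of the longest-match-plus-shorter-matches idea from this post, using a dictionary of context counts in place of a trie for brevity. The weighting (a context of length n gets weight n + 1) is an arbitrary assumption chosen for illustration, and the robustness to holes and delays mentioned above is not implemented. `train` and `predict` are hypothetical names.

```python
from collections import Counter, defaultdict


def train(words, max_context=3):
    """Count which word follows every context of 0..max_context words."""
    table = defaultdict(Counter)
    for i in range(len(words)):
        for n in range(max_context + 1):
            if i - n < 0:
                break
            context = tuple(words[i - n:i])
            table[context][words[i]] += 1
    return table


def predict(table, context, max_context=3):
    """Blend next-word distributions from every matching context suffix,
    giving longer matches more weight (assumed weight: length + 1)."""
    context = tuple(context[-max_context:])
    scores = Counter()
    for n in range(len(context), -1, -1):
        suffix = context[len(context) - n:]
        counts = table.get(suffix)
        if not counts:
            continue
        total = sum(counts.values())
        for word, count in counts.items():
            scores[word] += (n + 1) * count / total
    return scores


corpus = "the cat sat on the mat and the cat ran to the dog".split()
table = train(corpus)
scores = predict(table, ["the", "cat"])
best = scores.most_common(1)[0][0]
```

After "the cat" the corpus has seen "sat" once and "ran" once, so they tie for the top score, while words seen only in the unigram counts, such as "mat", trail far behind.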

NARS usage guide (short read): its input/output isn't natural language, but rather Narsese! Apparently they have, or are thinking about, a way to make it natural-language input/output, hence the whole Narsese (logic) thing must be just "GPT statistics" and not logic/rule-based AI.
https://www.google.com/url?sa=j&url=https%3A%2F%2Fcis.temple.edu%2F~pwang%2FImplementation%2FNAL%2FNAL-Guide.html&uct=1593766665&usg=Aq4qY4OLx-WK_HAKqmOG4qyU35k.

If you think logic-based AI like NARS makes sense because it tries to use less data and fewer resources and looks at the context more intelligently, with "constraints/rules" to predict the next word, you are mistaking some things... NARS [may] have some good ideas, but they could, and should, be added to GPT. How? I am not sure what NARS does, but I have a feeling it needs humans to write in properties of things, relationships, verbs, etc. (in Narsese and not natural language), which is too much work to be practical. Then it can complete "I bought a ___", where the blank can be anything, because when you say "bought", almost anything can follow. But when you say "a snake ", what follows can't really be just anything: "a snake bit me" is OK, "a snake gift sold" is a bit uncommon, "a snake car"... So "bought" allows more possible things to follow, while other words don't, and some require very similar matches. Look: a similar word to car is truck / van / vehicle / etc.; a different word than car is ---anything---, and that is how that mechanism works. It can 'learn' this, and predict that unseen words can probably go there too if most other words do, or if most other felines do.
