Recent Posts

91
General Hardware Talk / Re: My HAl Rig
« Last post by frankinstien on November 08, 2021, 06:49:05 am »
Quote
But see, in 40GB of text, 'the' appears a lot; just 100MB has 800,000 occurrences. So you need to put those all into a trie tree, otherwise you'll be matching all of them every time you are given that sentence to complete. So you may predict now: the > cat/ home/ arm/ light/ throw/ moon..... and maybe dog was seen more than moon, so you more heavily predict dog then.

No, I don't use a trie. I have posted the descriptor concept on this forum many times, but I'll do it again to clarify some things. You appear to think that I duplicate data into tiers for each sentence, but I don't. The word "the" is a determiner and is handled by the NLP; it does have a representation in the OL, but its state as a determiner suffices as a state in itself. While a word like "human" can appear many times in text, there is only one instance of "human" as a descriptor and in the other classifications coded into the ontological framework. Since only one instance of a word is allowed in the database, any contextual variances are treated as concepts that are symbolically represented by a word, or by a set of words as a term. Such concepts get associated with words through descriptor objects and OL hierarchies, as shown below:




The OL will have the concept of human across many groupings or hierarchies, but there is only one instance of Human that all those headings reference.
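To make the single-instance idea concrete, here is a minimal Python sketch (not the actual implementation; the class and field names are just placeholders): a registry hands out one shared descriptor object per word, and every grouping that mentions the word holds a reference to that same object.

Code:
class Descriptor:
    """One shared record per word; hierarchies hold references, not copies."""
    def __init__(self, word):
        self.word = word
        self.features = {}                      # e.g. {"animate": True}

class WordRegistry:
    def __init__(self):
        self._store = {}                        # word -> its single Descriptor

    def get(self, word):
        key = word.lower()
        if key not in self._store:              # create the one and only instance
            self._store[key] = Descriptor(key)
        return self._store[key]

registry = WordRegistry()

# Two different OL groupings reference the very same object:
primates = {"label": "Primates", "members": [registry.get("Human")]}
bipeds   = {"label": "Bipeds",   "members": [registry.get("Human")]}
print(primates["members"][0] is bipeds["members"][0])   # True: one instance of Human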



Now, as I mentioned earlier in this thread: if something is already known, it doesn't have to be duplicated, so inheritance or nesting is used to remove redundant data, as shown below:



As you can see, "Human" inherits from the concept of "Animal":



Animal has many other concepts, such as Head, Neck, Torso, etc. Those concepts are nested into Animal rather than inherited from, since those properties are parts of an animal and need to be exposed as such. Those descriptors have vector definitions that are both numeric and textual, where the enumerator is used in algorithms. Below is an image of the vector state of a word in a descriptor object.
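To make the inheritance-versus-nesting distinction concrete in code as well, here is a rough Python sketch with a toy (enumerator, text) vector; the class names and fields are illustrative guesses, not the real schema.

Code:
class Animal:
    # Parts are nested (composition), not inherited: an animal HAS a head.
    parts = ["Head", "Neck", "Torso"]

class Human(Animal):
    # Human inherits whatever Animal already defines; nothing is duplicated.
    pass

# A descriptor's vector mixes a numeric enumerator with readable text, so
# algorithms can work with the number while the OL keeps the label.
human_vector = [(0, "animate"), (1, "bipedal"), (2, "tool user")]

print(Human.parts)                           # ['Head', 'Neck', 'Torso'], reused from Animal, not copied
print([code for code, _ in human_vector])    # [0, 1, 2]: the enumerators the algorithms use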



Now, on my website blog I have a write-up on whether to predict or react, and many times the ability to react proves to be a far better strategy than predicting! Now, let's look at a sentence parsed for its grammar:



The parser groups the words into noun phrases, verb phrases, prepositional phrases, etc., and identifies their parts of speech. That information is volatile: once the sentence, paragraph, or page has been evaluated by other logic, it's disposed of! Well, sort of: its episodic representation is stored as text along with the logical interpretation of it. The reason for that involves using memory as humans do, where remembering events is actually re-evaluating them with new perceptions. Now the descriptors and OL hierarchies, along with the NLP's output, help correlate what a sentence means reactively, not through some kind of stepwise prediction that wastes CPU cycles on wrong predictions. Because the NLP segments the sentence, it's possible to compute relationships based on the concepts those words relate to. This also allows the machine to learn by finding similarities and/or asking a mentor questions about a word, a segment of the sentence, or the entire sentence.
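As a toy illustration only (the real NLP is far richer), the sketch below parses a sentence, maps the words to already-stored concepts, then throws the volatile parse away and keeps just the episodic text plus the interpretation; the tiny POS and ontology tables are made up.

Code:
POS = {"the": "DET", "dog": "NOUN", "chased": "VERB", "a": "DET", "cat": "NOUN"}
ONTOLOGY = {"dog": "Animal/Canine", "chased": "Action/Pursuit", "cat": "Animal/Feline"}

def interpret(sentence):
    words = sentence.lower().strip(".").split()
    parse = [(w, POS.get(w, "UNK")) for w in words]        # volatile parse
    concepts = [ONTOLOGY[w] for w, _ in parse if w in ONTOLOGY]
    episode = {"text": sentence, "concepts": concepts}     # what actually gets kept
    del parse                                              # the parse itself is disposed of
    return episode

print(interpret("The dog chased a cat."))
# {'text': 'The dog chased a cat.', 'concepts': ['Animal/Canine', 'Action/Pursuit', 'Animal/Feline']}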

OK, so now you should be able to see that the text is turned into meaningful vectors without having to apply an ANN, and those sentences are not stored. Now, to find the correlations mentioned above, I could use an ANN, but it's a pared-down ANN where the OL hierarchy constrains the problem domain, and I apply that network as needed instead of needlessly computing neurodes that contribute very little to the output but would have to be computed nonetheless.
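A minimal sketch of that pared-down idea, assuming made-up embeddings and a stand-in weight matrix: the OL first narrows the field to a handful of candidate concepts, and a small scorer evaluates only those instead of the whole concept space.

Code:
import numpy as np

# Hand-made toy "embeddings"; in the real system these would come from the descriptors.
EMBED = {
    "Canine":  np.array([1.0, 0.9, 0.0, 0.1]),
    "Feline":  np.array([0.9, 1.0, 0.1, 0.0]),
    "Vehicle": np.array([0.0, 0.1, 1.0, 0.9]),
}
W = np.eye(4)                                   # stand-in for a small trained layer

def score(stimulus_vec, candidates):
    """Score only the OL-constrained candidates, not every concept the system knows."""
    scores = {c: float(EMBED[c] @ W @ stimulus_vec) for c in candidates}
    return max(scores, key=scores.get), scores

stimulus = np.array([1.0, 0.8, 0.0, 0.1])       # some incoming, already-vectorized stimulus
best, scores = score(stimulus, candidates=["Canine", "Feline"])   # OL narrowed it to animals
print(best, scores)                             # Canine scores slightly higher than Feline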
92
General Hardware Talk / Re: My HAl Rig
« Last post by LOCKSUIT on November 08, 2021, 03:04:51 am »
So you mean a sparse network then? Instead of computing the whole matrix. Ya, my AI actually would fit that bill really well then, if I can finish it. My network can still come out small and fast, as explained in my last post. Now, you say hashes, hmm, like the word 'walking' is converted into its ord and that chooses the location in a list in Python code. Hmm, yes, that's very fast... but I bet the RAM will suffer. Let's see: you have 100MB of text and must store every ~4 words of it, and find them fast. To make the 100MB hashable, you need to store in a small list (small as in 'using the hash table method') the ~4 words + prediction entailment, which means for 100MB you need to store 500MB... and for 40GB? 200GB... So this extra-large 40GB dataset of text, which GPT holds in RAM as 12GB, would come to 200GB needing to be in RAM. Or does it need to be in RAM if it is hashable (fast to find)? Anybody know? So for 10GB of text it would work OK, with about 50GB of RAM needed then.

But don't forget you need to find ALL matches of "[we [walked [down [the]]]] ?___?" and combine the predictions to get a set of predicted words for all 4 matches. Yup, 'the' has nearly 800,000 matches in 100MB of wiki, LOL, when they could all be put into a tree with at most a 50K vocab. You also need a semantic web like word2vec, and you need to store those embeddings or connections.

So it's big....and slow...

I don't know where you got your numbers from, but certainly not from an understanding of hashcodes. Take a word that has four characters: in simple ASCII that's 4 bytes, but a 32-bit hashcode is also only 4 bytes, and a word like "pneumonoultramicroscopicsilicovolcanoconiosis" is 45 characters (45 bytes). On top of that, each word is represented by other components, which means even more bytes are involved since each word is stored with its OL, yet all of that reduces to a 32-bit hashcode! The ontological component and the descriptor component provide feature or property states for each word, and there is only one instance of those structures for each word. I have something like 790,000 words stored in the OL database and it's only 391MB, but its hashcode store is only 3.2MB with 32-bit codes and 6.4MB with 64-bit codes! My older server has 128GB and the new system has 256GB. The 391MB plus the additional 3.2MB is but a drop in the bucket of all the RAM I have! The descriptor component is just starting out, but right now it's averaging 20,000 bytes per word; at 790,000 words that's 16GB to cache it, but its hashcode per word reduces to just 4 to 8 bytes!

OK, so you might argue: but you have to index those features as well, and you're right. The current descriptor DB feature index averages 881 bytes per word, so 790,000 words would be just 700MB, where each feature is a single instance with a HashSet that stores a reference to the descriptor component; again, a drop in the bucket of all the RAM I have! So even as the data grows as the system learns and makes those descriptor components more complex, there is plenty of room, and the hashcode burden is trivial.

If you remember my post on the time-chunking scheme, I ran out of 128GB in 32 minutes, but I later simply stored threshold deltas of stimuli, which kept everything manageable for days. Yes, eventually you'll need to manage the temporal resources by writing to disk, in this case an NVMe Gen3 or Gen4 SSD. The approach makes things pretty responsive when having to find data on disk, which is indexed with hashcodes and inter-file locations.

Now here's your problem with ANNs: you have to iterate through the entire network, and that doesn't really work in a way that can represent meaning as a point in memory. Your ANN distributes the description of words across the entire network, which is why you have to iterate through the entire matrix to get an output. My approach doesn't; the data is focused into structures that have single instances and can even change dynamically in real time, meaning the system can learn while it executes! As stimuli are entered into the system, they are converted into hashcode sets that look up the relevant data, which is associated with functions or processes to respond. So I don't have to iterate through the entire dataset as ANNs do, and I can change the associations to those structures instantly, with no retraining of the entire system. Also, remember that other data relating to the descriptors is referenced, and those instances reference yet other data. So, algorithmically, I can gain access to data that provides more capabilities without having to search for it randomly, since it's right there for the taking because of how relationships are linked/referenced; again, this speeds up processing and avoids iterating through billions of other neurodes that aren't really representative of what is needed but whose contribution to the output must be calculated regardless.

With an ANN you can't just find the functional data points with a query, but with this approach you can, and that's why only a fraction of the computational horsepower is needed compared to an ANN. Here's another advantage: I can still use ANNs, but they are much, much smaller because they are focused on the semantic interpretations of a query initiated from stimuli, whose generalized states can be evaluated into patterns. Realize that the ANN is called only after the data is matched to the stimuli. So the problem domain is much smaller than with GPT-3, which tries to encode everything into one big ANN. Also, this approach isn't locked into ANN-only solutions, so it opens up the framework to a universe of techniques, e.g. genetic algorithms, differential equations, Bayesian inference, etc.

Let me try again: you have to understand that when it comes to GPT and my AI, if you want to attain the same level of results (unless you've found some more efficient way, implemented it, and can show it works; others tell me to, so I say it back: show me code!), then you basically need to store every ~4-word-long string in 40GB of text. This allows you to take a prompt like 'the > ___' and predict the next word properly, knowing which word usually comes next. Blending methods like this brings all the magic; it is far from copying the dataset. But see, in 40GB of text, 'the' appears a lot; just 100MB has 800,000 occurrences. So you need to put those all into a trie tree, otherwise you'll be matching all of them every time you are given that sentence to complete. So you may predict now: the > cat/ home/ arm/ light/ throw/ moon..... and maybe dog was seen more than moon, so you more heavily predict dog then.
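Here is a small sketch of that scheme on a toy corpus; it uses a dict keyed by context tuples rather than a literal trie, and the blending weights are made up, but the counting and blended next-word prediction are the same idea.

Code:
from collections import defaultdict, Counter

corpus = "we walked down the road and we walked down the street then the dog ran".split()

# counts[context_tuple] = Counter of next words, for context lengths 1 through 4
counts = defaultdict(Counter)
for n in range(1, 5):
    for i in range(len(corpus) - n):
        counts[tuple(corpus[i:i + n])][corpus[i + n]] += 1

def predict(context, weights=(1, 2, 4, 8)):
    """Blend next-word counts from every matching context length (longer = heavier)."""
    blended = Counter()
    for n, w in zip(range(1, 5), weights):
        for word, c in counts.get(tuple(context[-n:]), Counter()).items():
            blended[word] += w * c
    return blended.most_common(3)

print(predict("we walked down the".split()))   # 'road' and 'street' dominate, 'dog' trails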
93
General Hardware Talk / Re: My HAl Rig
« Last post by frankinstien on November 07, 2021, 08:27:33 pm »
I have found favor with the AI GPU gods, :party_2: and actually picked up an NVIDIA GeForce RTX 3080 10GB for $750!  :dazzler:
 
The RTX 3080 has a whopping 238 TOPS of Int8:

There's about a 13% difference between the RTX 3080 and RTX 3090. Now having both a Vega 56 and an RTX 3080 on the same machine should be interesting.

94
General Hardware Talk / Re: My HAl Rig
« Last post by MagnusWootton on November 06, 2021, 10:47:09 pm »
From a conceptual perspective, maybe not. Remember that when encoding things, the more understanding of a concept there is on the decoder's side, the more compression you can have. So let's say I use the set of symbols e=mc^2; that's only six characters, but because I understand some physics and math, the concepts of energy, mass, the speed of light, and the mathematical operation of squaring don't have to be included in the data encoding! So too with this approach: because the context is a concept, it resolves to a set of descriptors already stored, so any page, paragraph, sentence, or sensory stimulus computes to a set of concepts already stored. New concepts or variances of concepts can be dealt with by adding attributes from existing descriptors or borrowing from existing descriptors through nesting. Such contextual artifacts can then be shared across many other data points! So we prevent duplication of data by ensuring single instances of concepts that can be referenced at any time.

If you aren't losing context, I have no problem with what you're doing; it could definitely work. On the internet, many of these so-called "quantum people" are talking about "breaking math" all the time, and I disagree with it; maths is invincible, it cannot be broken. The square root of -1, some would say, is "breaking math", but to me it isn't; it's how it is supposed to be. If anyone ever defies the law of context, I doubt it to the fullest extent. If mathematics didn't make sense to us, our lives would be even more foolish than they are already.

So I doubt it's "breakable"; mathematics and the logic that makes it is unbreakable.
95
General Hardware Talk / Re: My HAl Rig
« Last post by frankinstien on November 06, 2021, 10:36:43 pm »
Nice thinking, and it's cool watching you strike out onto the cutting edge.

But you can't defy the law of context. As in, if you lose context, you cannot get it back. Maybe you are allowed to lose it? But if you do, it can never return, and it's permanent!

Read this paper:
https://matt.might.net/articles/why-infinite-or-guaranteed-file-compression-is-impossible/

Don't give up though if you find it doesn't work, because your thinking is good; it's the same whether you're coming up with something that works or not, and AGI does take a huge optimization like this!!! So don't give up and keep looking for it. It is special, but it has to not defy context; still, it's kinda the same thing, and it works like magic for sure!

From a conceptual perspective, maybe not. Remember that when encoding things, the more understanding of a concept there is on the decoder's side, the more compression you can have. So let's say I use the set of symbols e=mc^2; that's only six characters, but because I understand some physics and math, the concepts of energy, mass, the speed of light, and the mathematical operation of squaring don't have to be included in the data encoding! So too with this approach: because the context is a concept, it resolves to a set of descriptors already stored, so any page, paragraph, sentence, or sensory stimulus computes to a set of concepts already stored. New concepts or variances of concepts can be dealt with by adding attributes from existing descriptors or borrowing from existing descriptors through nesting. Such contextual artifacts can then be shared across many other data points! So we prevent duplication of data by ensuring single instances of concepts that can be referenced at any time.
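A loose sketch of that encode-by-reference idea, with an invented shared concept store: known concepts shrink to small integer references, and only genuinely new material travels as raw text.

Code:
CONCEPT_STORE = {                     # shared by encoder and decoder
    1: "energy", 2: "mass", 3: "speed of light", 4: "squared",
}
LOOKUP = {v: k for k, v in CONCEPT_STORE.items()}

def encode(tokens):
    # Known concepts become small integer references; only novelties stay as text.
    return [LOOKUP.get(t, t) for t in tokens]

def decode(refs):
    return [CONCEPT_STORE.get(r, r) if isinstance(r, int) else r for r in refs]

message = ["energy", "equals", "mass", "times", "speed of light", "squared"]
packed = encode(message)              # [1, 'equals', 2, 'times', 3, 4]
print(packed)
print(decode(packed) == message)      # True: nothing is lost, far fewer bytes travel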
96
General Hardware Talk / Re: My HAl Rig
« Last post by MagnusWootton on November 06, 2021, 10:10:34 pm »
Nice thinking, and it's cool watching you strike out onto the cutting edge.

But you can't defy the law of context. As in, if you lose context, you cannot get it back. Maybe you are allowed to lose it? But if you do, it can never return, and it's permanent!

Read this paper:
https://matt.might.net/articles/why-infinite-or-guaranteed-file-compression-is-impossible/

Don't give up though if you find it doesn't work, because your thinking is good; it's the same whether you're coming up with something that works or not, and AGI does take a huge optimization like this!!! So don't give up and keep looking for it. It is special, but it has to not defy context; still, it's kinda the same thing, and it works like magic for sure!
97
General Hardware Talk / Re: My HAl Rig
« Last post by frankinstien on November 06, 2021, 08:40:37 pm »
So you mean a sparse network then? Instead of computing the whole matrix. Ya, my AI actually would fit that bill really well then, if I can finish it. My network can still come out small and fast, as explained in my last post. Now, you say hashes, hmm, like the word 'walking' is converted into its ord and that chooses the location in a list in Python code. Hmm, yes, that's very fast... but I bet the RAM will suffer. Let's see: you have 100MB of text and must store every ~4 words of it, and find them fast. To make the 100MB hashable, you need to store in a small list (small as in 'using the hash table method') the ~4 words + prediction entailment, which means for 100MB you need to store 500MB... and for 40GB? 200GB... So this extra-large 40GB dataset of text, which GPT holds in RAM as 12GB, would come to 200GB needing to be in RAM. Or does it need to be in RAM if it is hashable (fast to find)? Anybody know? So for 10GB of text it would work OK, with about 50GB of RAM needed then.

But don't forget you need to find ALL matches of "[we [walked [down [the]]]] ?___?" and combine the predictions to get a set of predicted words for all 4 matches. Yup, 'the' has nearly 800,000 matches in 100MB of wiki, LOL, when they could all be put into a tree with at most a 50K vocab. You also need a semantic web like word2vec, and you need to store those embeddings or connections.

So it's big....and slow...

I don't know where you got your numbers from, but certainly not from an understanding of hashcodes. Take a word that has four characters: in simple ASCII that's 4 bytes, but a 32-bit hashcode is also only 4 bytes, and a word like "pneumonoultramicroscopicsilicovolcanoconiosis" is 45 characters (45 bytes). On top of that, each word is represented by other components, which means even more bytes are involved since each word is stored with its OL, yet all of that reduces to a 32-bit hashcode! The ontological component and the descriptor component provide feature or property states for each word, and there is only one instance of those structures for each word. I have something like 790,000 words stored in the OL database and it's only 391MB, but its hashcode store is only 3.2MB with 32-bit codes and 6.4MB with 64-bit codes! My older server has 128GB and the new system has 256GB. The 391MB plus the additional 3.2MB is but a drop in the bucket of all the RAM I have! The descriptor component is just starting out, but right now it's averaging 20,000 bytes per word; at 790,000 words that's 16GB to cache it, but its hashcode per word reduces to just 4 to 8 bytes!

OK, so you might argue: but you have to index those features as well, and you're right. The current descriptor DB feature index averages 881 bytes per word, so 790,000 words would be just 700MB, where each feature is a single instance with a HashSet that stores a reference to the descriptor component; again, a drop in the bucket of all the RAM I have! So even as the data grows as the system learns and makes those descriptor components more complex, there is plenty of room, and the hashcode burden is trivial.
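A quick back-of-envelope check of those figures (the numbers are taken straight from the two paragraphs above):

Code:
WORDS = 790_000

hash32   = WORDS * 4          # 32-bit hashcode per word
hash64   = WORDS * 8          # 64-bit hashcode per word
desc     = WORDS * 20_000     # ~20,000 bytes per descriptor component
feat_idx = WORDS * 881        # ~881 bytes of feature index per word

MB, GB = 1e6, 1e9
print(f"32-bit hashcodes: {hash32 / MB:.1f} MB")    # ~3.2 MB
print(f"64-bit hashcodes: {hash64 / MB:.1f} MB")    # ~6.3 MB
print(f"descriptors:      {desc / GB:.1f} GB")      # ~15.8 GB, the ~16 GB to cache
print(f"feature index:    {feat_idx / MB:.0f} MB")  # ~696 MB, the ~700 MB above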

If you remember my post on the time-chunking scheme, I ran out of 128GB in 32 minutes, but I later simply stored threshold deltas of stimuli, which kept everything manageable for days. Yes, eventually you'll need to manage the temporal resources by writing to disk, in this case an NVMe Gen3 or Gen4 SSD. The approach makes things pretty responsive when having to find data on disk, which is indexed with hashcodes and inter-file locations.
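A minimal sketch of the threshold-delta idea, with a placeholder stream and threshold: a sample is kept only when it moves more than the threshold away from the last stored value.

Code:
def threshold_deltas(stream, threshold=0.5):
    kept = []
    last = None
    for t, value in stream:
        if last is None or abs(value - last) >= threshold:
            kept.append((t, value))      # store only meaningful changes
            last = value
    return kept

samples = [(0, 1.00), (1, 1.02), (2, 1.04), (3, 1.90), (4, 1.91), (5, 0.80)]
print(threshold_deltas(samples))         # [(0, 1.0), (3, 1.9), (5, 0.8)]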

Now here's your problem with ANNs: you have to iterate through the entire network, and that doesn't really work in a way that can represent meaning as a point in memory. Your ANN distributes the description of words across the entire network, which is why you have to iterate through the entire matrix to get an output. My approach doesn't; the data is focused into structures that have single instances and can even change dynamically in real time, meaning the system can learn while it executes! As stimuli are entered into the system, they are converted into hashcode sets that look up the relevant data, which is associated with functions or processes to respond. So I don't have to iterate through the entire dataset as ANNs do, and I can change the associations to those structures instantly, with no retraining of the entire system. Also, remember that other data relating to the descriptors is referenced, and those instances reference yet other data. So, algorithmically, I can gain access to data that provides more capabilities without having to search for it randomly, since it's right there for the taking because of how relationships are linked/referenced; again, this speeds up processing and avoids iterating through billions of other neurodes that aren't really representative of what is needed but whose contribution to the output must be calculated regardless.
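A rough sketch of that lookup-instead-of-iterate idea (the hash choice and the handlers are placeholders, not the real system): a stimulus reduces to hashcodes, each hashcode maps straight to its associated data or handler, and new associations can be added while the program runs.

Code:
import zlib

def word_code(word):
    # Stand-in for a stable 32-bit hashcode (Python's built-in hash() varies per run).
    return zlib.crc32(word.lower().encode())

handlers = {
    word_code("hello"): lambda: "greet back",
    word_code("stop"):  lambda: "halt current task",
}

def react(stimulus):
    responses = []
    for word in stimulus.split():
        handler = handlers.get(word_code(word))        # direct lookup, no iteration
        if handler:
            responses.append(handler())
    return responses

print(react("hello there"))                            # ['greet back']
handlers[word_code("there")] = lambda: "acknowledge"   # association added at runtime
print(react("hello there"))                            # ['greet back', 'acknowledge']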

With an ANN you can't just find the functional data points with a query, but with this approach you can, and that's why only a fraction of the computational horsepower is needed compared to an ANN. Here's another advantage: I can still use ANNs, but they are much, much smaller because they are focused on the semantic interpretations of a query initiated from stimuli, whose generalized states can be evaluated into patterns. Realize that the ANN is called only after the data is matched to the stimuli. So the problem domain is much smaller than with GPT-3, which tries to encode everything into one big ANN. Also, this approach isn't locked into ANN-only solutions, so it opens up the framework to a universe of techniques, e.g. genetic algorithms, differential equations, Bayesian inference, etc.
98
General Project Discussion / Re: Releasing full AGI/evolution research
« Last post by MagnusWootton on November 06, 2021, 11:20:56 am »
That's cool. It definitely works, but it doesn't count unless it's running for real. You have to do all your own labour; no one's going to do it for you. That's how I treat the situation, anyhow.

OpenAI's new one that actually spits out working code is amazing; I just saw it recently.
99
General Hardware Talk / Re: My HAl Rig
« Last post by LOCKSUIT on November 06, 2021, 04:47:00 am »
So you mean a sparse network then? Instead of computing the whole matrix. Ya, my AI actually would fit that bill really well then, if I can finish it. My network can still come out small and fast, as explained in my last post. Now, you say hashes, hmm, like the word 'walking' is converted into its ord and that chooses the location in a list in Python code. Hmm, yes, that's very fast... but I bet the RAM will suffer. Let's see: you have 100MB of text and must store every ~4 words of it, and find them fast. To make the 100MB hashable, you need to store in a small list (small as in 'using the hash table method') the ~4 words + prediction entailment, which means for 100MB you need to store 500MB... and for 40GB? 200GB... So this extra-large 40GB dataset of text, which GPT holds in RAM as 12GB, would come to 200GB needing to be in RAM. Or does it need to be in RAM if it is hashable (fast to find)? Anybody know? So for 10GB of text it would work OK, with about 50GB of RAM needed then.
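Here's a rough sketch of that back-of-envelope estimate: build a plain Python dict of (~4-word context -> next word) over a stand-in corpus and look at the container cost alone; the random vocabulary is just a placeholder for real text.

Code:
import random, sys

random.seed(0)
vocab = [f"w{i}" for i in range(5000)]
words = [random.choice(vocab) for _ in range(100_000)]     # stand-in for real text

table = {}
for i in range(len(words) - 4):
    table[tuple(words[i:i + 4])] = words[i + 4]            # ~4-word context -> next word

# sys.getsizeof counts only the dict itself; the key tuples and strings add a few
# hundred bytes per entry on top, which is where the multi-hundred-MB worry comes from.
print(f"{len(words):,} words -> {len(table):,} contexts, "
      f"dict alone: {sys.getsizeof(table) / 1e6:.1f} MB")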

But don't forget you need to find ALL matches of "[we [walked [down [the]]]] ?___?" and combine the predictions to get a set of predicted words for all 4 matches. Yup, 'the' has nearly 800,000 matches in 100MB of wiki, LOL, when they could all be put into a tree with at most a 50K vocab. You also need a semantic web like word2vec, and you need to store those embeddings or connections.

So it's big....and slow...
100
General Project Discussion / Re: Releasing full AGI/evolution research
« Last post by LOCKSUIT on November 06, 2021, 04:27:21 am »
Still going at my architecture, probably, and GPT and AGI, in the meantime while this evolution is coming.

You know, I really liked the movie Spider-Man 3 (2007). They say it had too many villains, but in my eyes it was perfect; there was a lot of bad luck going on. He had, if I remember correctly, 4 villains all kind of at once, and that was cool. I liked all 4 too, especially Venom :) Next Rhino, then Sandman and Goblin. I love the darkness in it.

Edit: I am so confused, why is there no Rhino in that movie now... xD There was a garden nighttime scene with a Rhino costume.