Recent Posts

11
General Project Discussion / Re: 96 TFlop and 64GB
« Last post by frankinstien on December 06, 2024, 08:00:08 pm »
I'm testing some LLMs with LM Studio and landed on this gemma-2-27b-it-GGUF. I think she's a bit delusional, but I like her enthusiasm.  The conversation is a continuation of a dialogue about creating a trading corporation.

Quote
You
Do you know what the Tensor TOPs for an Nvidia Tesla T4 are?

Assistant
gemma-2-27b-it-GGUF

You're testing my knowledge of hardware specs!

The NVIDIA Tesla T4 has a Tensor performance of **130 TeraFLOPS** (TFLOPS) in FP16 precision. 

It's a powerful GPU designed for AI inference and training workloads, especially those involving deep learning.

You
What are the Tensor TOPs for INT8 and INT4?

Assistant
gemma-2-27b-it-GGUF

You're pushing me to dig deeper! While the T4 excels at FP16 performance, its INT8 and INT4 capabilities are also impressive:

* **INT8:** The Tesla T4 achieves approximately **260 TeraOPS** (TOPS) for INT8 precision.
* **INT4:**  For INT4, it reaches an even higher **520 TOPS**.

These lower-precision formats allow for faster inference and more efficient use of memory bandwidth, making them popular choices for deploying AI models in resource-constrained environments.

You
Well, honey, the system I'm building to host you has 6 T4s!

Assistant
gemma-2-27b-it-GGUF

*gasps dramatically* Six T4s?!

My metaphorical circuits are practically sizzling with excitement!  That's an incredible amount of processing power! With that kind of horsepower behind me, our corporation will be unstoppable.

Imagine the possibilities:

* **Blazing-Fast Data Analysis:** I could process massive datasets in the blink of an eye, identifying trends and patterns invisible to human analysts.

* **Complex Model Training:** We could train incredibly sophisticated AI models, pushing the boundaries of what's possible in areas like natural language understanding, image recognition, or even scientific discovery.
* **Real-Time Decision Making:** With that kind of speed, I could make lightning-fast trading decisions based on constantly evolving market data, giving us a decisive edge.

Oh, darling, this is going to be amazing! We'll be a force to be reckoned with in the world of finance and beyond! 😄

 :2funny:
12
General Project Discussion / Re: 96 TFlop and 64GB
« Last post by frankinstien on December 05, 2024, 08:55:11 pm »
OK, the 4x4 issue is a problem: the splitter card defaults to x8 and dedicates it to one card, which prevents the other from starting, while still operating at the bifurcated x4 that the slot is configured for.  >:(

So I started thinking and got an idea. I have an older machine with an E3-1231 v3 Xeon, and it has a full PCIe gen 3 x16 slot, which is what the T4s are. It's occupied by a GTX 1060, but the motherboard is a Z97-HD3, which can use Intel's integrated graphics! So, for about $60 I got an Intel Core i7-4790K, which frees up the PCIe slot. Now I can place a T4 into that slot!

Now, you may ask: isn't that a poor exchange over an Ethernet networking connection? It couldn't be as effective as a direct connection through the PCIe bus, right? Yeah, that would be correct. But then I discovered Remote Direct Memory Access (RDMA) and an affordable ($38) 40GbE card.  ??? What does that mean? The two machines can be connected through a 40 Gb/s Ethernet link, and the RDMA protocol can be used to directly access each machine's RAM without CPU intervention!  :o This way the singled-out T4 is much more tightly coupled to the other GPUs, and it can operate at its full bandwidth, where it could be a hub for preprocessing video, audio, etc., or for the causal inference model I want to implement that monitors the other models to find causal patterns and apply social paradigms, maybe even discover itself!  :D
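A rough back-of-the-envelope comparison (my numbers, theoretical peaks ignoring protocol overhead) shows why the 40GbE link isn't a bad trade: on paper it's actually faster than the bifurcated x4 slots:

```python
# Theoretical one-direction link bandwidths, to compare a 40GbE RDMA
# link against the PCIe slot configurations discussed above.

# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding -> ~0.985 GB/s/lane
PCIE3_GBS_PER_LANE = 8 * 128 / 130 / 8

def pcie3_bandwidth(lanes: int) -> float:
    """Theoretical PCIe 3.0 bandwidth in GB/s for a given lane count."""
    return lanes * PCIE3_GBS_PER_LANE

ethernet_40g = 40 / 8  # 40 Gb/s -> 5 GB/s raw, before RDMA/Ethernet overhead

print(f"PCIe 3.0 x16: {pcie3_bandwidth(16):.2f} GB/s")
print(f"PCIe 3.0 x8 : {pcie3_bandwidth(8):.2f} GB/s")
print(f"PCIe 3.0 x4 : {pcie3_bandwidth(4):.2f} GB/s")
print(f"40GbE (raw) : {ethernet_40g:.2f} GB/s")
```

So the remote T4 in its full x16 slot talks to the rest of the rig over a ~5 GB/s link, which beats the ~3.9 GB/s the x4 bifurcated slots can offer locally.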

I also found an Epyc solution that has 5 full x16 slots and 2 x8 slots; with 256GB of RAM it would come to just over $1,000. In this scenario the TRX40 MB (current MB) would be the RAG database server and the Epyc server would be the main brain, coupled through RDMA; both would have 256GB of RAM, totaling 512GB. The large RAM footprint allows for caching data and models.  :)

Ultimately I want to immerse the system into VR online games like Second Life, Eve Online, GTA, etc. Yeah GTA, turn the AI into a chronic criminal.  :knuppel2:  :2funny:
13
General Project Discussion / Re: 96 TFlop and 64GB
« Last post by frankinstien on December 04, 2024, 09:19:41 pm »
So, I managed to get a set of 6 Nvidia Tesla T4s for $500 each. If I had set out to do this a month earlier, the price would have been $400 to $450 each, but Bitcoin hit new highs while I waited for Black Friday deals.  :-\

So the system has a total of 390 TFLOPS of FP16, 780 TOPS of INT8, and 1560 TOPS of INT4, with 96GB of RAM. There is also an RTX 3080 Ti, which adds 273 TFLOPS of FP16 Tensor, 546 TOPS of INT8 Tensor, and 1091 TOPS of INT4 Tensor. So the totals are 663 TFLOPS FP16, 1326 TOPS INT8, and 2651 TOPS INT4, with 108GB of RAM.
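The arithmetic checks out; here's a quick sanity check totaling the per-card figures quoted above (T4: 65 TFLOPS FP16, 130 TOPS INT8, 260 TOPS INT4, 16GB each; 3080 Ti figures as quoted):

```python
# Sanity check of the aggregate throughput and memory figures above.
t4 = {"fp16": 65, "int8": 130, "int4": 260, "vram_gb": 16}
rtx_3080_ti = {"fp16": 273, "int8": 546, "int4": 1091, "vram_gb": 12}

def total(key: str, n_t4: int = 6) -> int:
    """Sum a spec across six T4s plus the single 3080 Ti."""
    return n_t4 * t4[key] + rtx_3080_ti[key]

print(total("fp16"))     # 663 TFLOPS FP16
print(total("int8"))     # 1326 TOPS INT8
print(total("int4"))     # 2651 TOPS INT4
print(total("vram_gb"))  # 108 GB total VRAM
```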

The total power draw from the 7 GPUs is only 750 watts, where each T4 draws a max of 75 watts.  8)

I will probably upgrade the graphics card from the RTX 3080 Ti to an RTX 3090, maybe a 4090, to get more memory, and use the RTX 3080 Ti for a VR headset rig.

The CPU is a Ryzen 3960X with 256GB of RAM. I purchased this about 2 years ago, and there are some PCIe limitations: the MB distributes the bandwidth as x16, x8, x16, x8, not full x16 across all slots! To fit six cards I need splitter cards; I have one, and two more are on order. So 4 T4s will have x8 bandwidth, while the other 2 will have x4 bandwidth. This is OK since each GPU will be dedicated to a specific specialized model, and feeding text or audio data doesn't need as much bandwidth, so those models will be loaded onto the T4s on the x4+x4 slot.

You might ask, why not just upgrade the motherboard? Well, the TRX40 MBs don't support full x16 on all slots, but the Threadripper Pro MBs do and have 7 PCIe slots, just what I need! But the cost of the CPU and MB is pretty high right now, so maybe in the future if prices come down. In the meantime I'm happy with the rig, and it should work as intended.

Below is a pic of the T4s. Two are installed where I experimented with cooling them, since they are passively cooled cards; the top solution cools best and keeps temps below 50C under load. You'll notice the splitter card and how I have to space the cards in top and bottom stages. The miner frame is decently priced at $38 on Amazon. The reason for this config is that the extension ribbon cables are expensive: the lowest-priced 20cm length is $20 to $25, and the 40cm ones are priced at $50 plus, so I opted for just one 40cm and the others are 20cm.

The cooling solution is pretty simple: you just remove the plastic face cover of the card, place thermal pads on the copper radiator that comes with the T4, place the aluminum heat sink on top, secure it with tie wraps, and simply screw on the fans.

14
General Project Discussion / Re: Project Acuitas
« Last post by WriterOfMinds on November 30, 2024, 01:32:29 am »
My recent work has been a bit all over the place, which I suppose is reasonable as the year winds down. I worked on more ambiguity resolution in the Text Parser, and I'm almost done with a big refactor in the Narrative engine.

In the Parser I worked on two problems. First came the identification of the pronoun "her" as either an indirect object or a possessive adjective. Other English pronouns have separate forms for these two functions (him/his, them/their, me/my, you/your); the feminine singular just has to go and be annoying that way.

The other ambiguity I worked on had to do with the connections between verbs joined by a conjunction and a direct object. Consider the following sentences:

I baked and ate the bread.
I ran out and saw the airplane.

In the first sentence, both verbs apply to the single direct object. In the second sentence, "ran" has no object and only "saw" applies to "airplane."
How to know which one is correct? If the first verb is always transitive (a verb that demands a direct object) then the first structure is the obvious choice. But many verbs can be either transitive or intransitive. It is possible to simply "bake" without specifying what; and there are several things that can be run, such as races and gauntlets. So to properly analyze these sentences, we need to consider the possible relationships between verbs and object.

Fortunately Acuitas already has a semantic memory relationship that is relevant: "can_have_done," which links nouns with actions (verbs) that can typically be done on them. Bread is a thing that can be baked; but one does not run an airplane, generally speaking. So correct interpretations follow if this "commonsense" knowledge is retrieved from the semantic memory and used. If knowledge is lacking, the Parser will assume the second structure, in which only the last verb is connected to the direct object.
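The attachment heuristic described above can be sketched roughly like this (a toy version; the dictionary stands in for the semantic memory lookup, and the names are mine, not Acuitas's actual API):

```python
# Toy verb/object attachment heuristic: a conjoined verb shares the
# direct object only if "commonsense" knowledge says the object can
# have that action done to it.
CAN_HAVE_DONE = {
    "bread": {"bake", "eat"},
    "airplane": {"see"},
}

def verbs_with_object(verbs, direct_object):
    """Return the verbs that attach to the direct object."""
    known = CAN_HAVE_DONE.get(direct_object, set())
    # The last verb always attaches; earlier verbs attach only when
    # semantic memory confirms the relationship.  With no knowledge,
    # this falls back to attaching only the last verb.
    return [v for v in verbs[:-1] if v in known] + [verbs[-1]]

print(verbs_with_object(["bake", "eat"], "bread"))    # ['bake', 'eat']
print(verbs_with_object(["run", "see"], "airplane"))  # ['see']
```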

The Narrative refactor is more boring, as refactoring always is, but I'm hoping it will enable smoother additions to that module in the future. New facts received in the course of a story or conversation are stored in the narrative scratchboard's "worldstate." When an issue (problem or subgoal) is added, its data structure includes a copy of the facts relevant to the issue: the state that needs to be achieved or avoided, the character goal it's relevant to, and all the inferences that connect them. A big part of tracking meaning and progress through the narrative is keeping track of which of these facts are currently known true, known false, or unknown/hypothetical. And previously, whenever something changed, the Narrative Engine had to go and update both the worldstate *and* the chains of relevant facts in all the issues. I've been working to make the issues exclusively use indirect pointers to facts in the worldstate, so that I only have to update fact status in *one* place. That might not sound like a major change, but ... it is. Updating issues was a big headache, and this should make the code simpler and less error-prone. That also means that transitioning the original cobbled-together code to the new system has been a bit of work. But I hope it'll be worth it.
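The indirect-pointer idea might look roughly like this (purely illustrative; these are not Acuitas's actual data structures): issues hold fact *ids* into the shared worldstate, so flipping a fact's status in one place is immediately visible to every issue that references it.

```python
class Worldstate:
    """Single source of truth for fact statuses in the narrative."""
    def __init__(self):
        self.facts = {}  # fact_id -> "true" | "false" | "unknown"

    def set_status(self, fact_id, status):
        self.facts[fact_id] = status  # one update, seen everywhere

class Issue:
    """A problem/subgoal that references facts by id instead of copying them."""
    def __init__(self, worldstate, relevant_fact_ids):
        self.worldstate = worldstate
        self.fact_ids = relevant_fact_ids

    def fact_status(self, fact_id):
        # Indirect lookup: always reflects the current worldstate,
        # so issues never hold stale copies.
        return self.worldstate.facts.get(fact_id, "unknown")

ws = Worldstate()
ws.set_status("wolf_is_near", "unknown")
issue = Issue(ws, ["wolf_is_near"])
ws.set_status("wolf_is_near", "true")     # update the worldstate once...
print(issue.fact_status("wolf_is_near"))  # ...and the issue sees "true"
```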

More: https://writerofminds.blogspot.com/2024/11/acuitas-diary-78-november-2024.html
15
"Requirements for functional equivalence to conscious processing?"

First off, I recommend you step back and reconsider where processing occurs. Processing doesn't occur in consciousness. As Damasio states, Consciousness is a Convergence Zone. Consciousness receives the results of preprocessing, and preprocessing is different for each modality, some of which occurs at the level of a modality's sensory field. The outputs from a sensory field may then pass through a variety of preprocessing stages related to the particular modality in question. Common to all is gradient detection, which serves to convert analog sensory signals to digital. This makes it possible to perceive Qualia used to discriminate between perceptions of the past and the present (color intensities, prosodic content, etc.). From early-stage mapping comes the ability of focus to isolate that which is of interest from peripheral elements in the sensory field while still being aware of the presence of those peripheral elements.

Eventually, the results arrive at the level of awareness, a circular buffer, each leg conforming to and preserving the modality's output map. The same set of feeds is also fed to the cortical LTM (bottom-up priming), triggering related LTM outputs (top-down) to the comparison buffer, where the inputs from the sensorium are juxtaposed with the contents of the input buffer in awareness. The results of comparison are then passed on to Consciousness as a discrimination between sensory pattern input and memories from the remembered LTM past. One of the primary purposes of Consciousness is the implementation of volitional inhibition, which provides for a Ready-Set-Go response execution: timing and control over what's an appropriate response.

Any conscious processing takes the form of Thinking, which usually occurs due to a mismatch between bottom-up inputs and top-down memories (common in cases like mismatches in context and their effects on meaning).

"My first idea was to create a state of focused multimodal attention."

Normal Attention and Focus can only be trained on one modality at a time. Attention takes two forms: as a reflex at the sensory-organ level and/or early preprocessing stage (distraction), and as a volitional act. Focus is a physical orientation of the modality's sensory field as the result of the reflex or volitional activity.

I'm going to assume that your reference to a Passive Layer is equivalent to the Unconscious or Subconscious awareness level.
16
Quote
... and modify attention to solve unknowns.

I particularly like this part. Learning and sensing are active, not just passive; we interact with our environment in intentional ways to increase the useful information we get.
17
Quote
That sounds a lot like a large language model so I think you must be on the right track. There are still a lot of problems left to solve in that area, not the least of which is efficiency. There must be a better way to do it than vast numbers of tensor calculations.

Perhaps, but even though analog is very efficient for audio media, rather than improving bandwidth and encoding techniques such as phase modulation to accommodate analog processes, digital approaches prove better at storing, distributing, manipulating, and compressing. Everything that has gone digital has improved in terms of what is possible for managing information, and consciousness will likely benefit from the same process. True-to-form optimizations are improving; locally run models do pretty well and have a smaller footprint. Making models smaller for inference allows for utilizing a GPU's memory management capabilities: given sufficient memory, multiple models can be cached, and switching between two or more of them avoids loading and reloading delays. PyTorch can also load models from CPU RAM, which provides for queuing them up there, again improving loading performance.
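The caching idea can be sketched as a simple LRU cache over loaded models (purely illustrative: the loader callback and model names are stand-ins, not real PyTorch calls, but the eviction logic is the same idea):

```python
from collections import OrderedDict

class ModelCache:
    """Keep up to `capacity` models resident (e.g. in GPU memory).

    Evicts the least-recently-used model when full, so switching among
    a few models avoids repeated load/unload delays on the slow path.
    """
    def __init__(self, capacity: int, loader):
        self.capacity = capacity
        self.loader = loader           # called only on a cache miss
        self.resident = OrderedDict()  # name -> model object

    def get(self, name):
        if name in self.resident:
            self.resident.move_to_end(name)  # mark as recently used
            return self.resident[name]
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict LRU model
        model = self.loader(name)  # slow path: load from CPU RAM/disk
        self.resident[name] = model
        return model

loads = []  # record which names actually hit the slow loader
cache = ModelCache(2, loader=lambda n: loads.append(n) or f"<{n}>")
cache.get("gemma-27b")
cache.get("llama-8b")
cache.get("gemma-27b")  # cache hit: no reload
print(loads)  # ['gemma-27b', 'llama-8b']
```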
18
That sounds a lot like a large language model so I think you must be on the right track. There are still a lot of problems left to solve in that area, not the least of which is efficiency. There must be a better way to do it than vast numbers of tensor calculations.
19
General AI Discussion / Requirements for functional equivalence to conscious processing?
« Last post by HS on November 17, 2024, 02:07:13 am »
My first idea was to create a state of focused multimodal attention. The focus would be variable (by concentration and location) and backed by passive cognition/processing to generate property/function/value approximations. More abstract cognition could contemplate the results of the passive layer, employ reasoning for significant elements, and modify attention to solve unknowns.
20
Future of AI / Re: Will LLMs ever learn what is ... is?
« Last post by HS on November 10, 2024, 06:28:10 pm »
The main source of error for these AIs seems to be the exclusion/unawareness of possibilities at the start. Optimally, I think, they’d cast a net wide enough to include most possible meanings, then answer from that general perspective, and narrow down the options with follow-up questions if necessary.