96 TFlop and 64GB

frankinstien · « **on:** June 15, 2023, 11:39:59 pm »

The fact that you can get a GPU for free is amazing but the horse power is pretty limited and the better GPUs are very expensive to rent let alone buy and hard to reserve. Particularly if you have to train a model for a month! I have an old server with dual Xeons and 128GB of ram and it's just sitting there doing nothing. I recently priced the Instinct MI25s that boast a 24 TFlops F16 with 16GB for $80! If I purchase 4 of those I'd have 96TFlops and 64GB of ram. Of course the model's architectures would have to be MoE(Mixture of Experts) and the routing done by the CPU end. But in comparison to an A100 40GB on eBay at $7500 to $14,000 to a $320 expense I'm just about to pull the trigger.

The problem however is power, the 4 Instinct MI25 GPUs would draw 300 watts each. However...I do have some 2500 watts in solar panels hanging around.

Of course I could buy more GPUs, say double it which amounts to $640 that then leads to 192 TFlops F16 and 128GB of gpu ram, with power draw of 2400 watts!

WriterOfMinds · « **Reply #1 on:** June 16, 2023, 02:22:09 pm »

Quote from: frankinstien on June 15, 2023, 11:39:59 pm

The fact that you can get a GPU for free is amazing but the horse power is pretty limited

Where can one get a GPU for free, I'm intrigued. Are you just referring to the limited free trial/low-use plans of cloud services?

frankinstien · « **Reply #2 on:** June 17, 2023, 02:34:46 am »

Google Collab...

MagnusWootton · « **Reply #3 on:** July 03, 2023, 02:31:23 pm »

my gpu on my comp is overheating bad, cant do any largish workloads with it, better not get knockoff video cards they have defects.

frankinstien · « **Reply #4 on:** December 04, 2024, 09:19:41 pm »

So, I managed to get set of 6 Nividia Tesla T4s for $500 each. If I had just set out to do this a month earlier the price would have been $400 to $450 each, but bitcoin hit new highs as I waited for Black Friday deals. $:-\$

So the system has a total of 390 TFLOPs of fp16, 780 TOPS of INT8, and 1560 TOPs of INT4 with 96GB of ram. There is also a RTX 3080 ti which adds fp16 Tensor 273 TFLOPs, INT8 Tensor 546 TOPs and INT4 Tensor is 1091 TOPS. So total is 663 fp16 TFLOPs , INT8 1326 TOPs, and INT4 2651 TOPs with 108GB of ram.

The total power draw from the 7 GPUs is only 750 watts, where each T4 draws a max of 75watts.

I will probably upgrade the graphics card from the RTX 3080ti to an RTX 3090, maybe a 4090 to get more memory and use the RTX 3080 ti for a VR headset rig.

The CPU is a Ryzen 3960x with 256GB of ram. I purchased this about 2 years ago and so there are some PCIe issues where the MB distribute the bandwith as 16x, 8x, 16x, 8x not full 16x across all slots! To fit six I need a spliter card, which I have one and two more are on order. So, 4 T4s will have 8x bandwidth, while the other 2 will have 4x bandwidth. This is ok since each GPu will be dedicated to a specific specialized model, where feeding data that is text or audio doesn't need as much bandwidth so those will be loaded onto the T4s on the 4x+4x slot.

You might ask why not just upgrade the motherboard, well the TRX40 MBs don't support full 16x on all lanes, but the Threadripper pro MBs do and have 7 PCIe lanes, just what I need! But the cost is pretty high for the CPU and MB right now. So maybe in the future if prices come down. But I'm happy with the rig and it should work as intended.

Below is a pic of the T4s, two are installed where I experimented with cooling them, they are passively cooled cards. The top solution cools best and keeps temps below 50C loaded. You'll notice the splitter card and how I have to space the cards with top and bottom stages. The miner frame is decently priced at $38 on Amazon. The reason for this config is the extension ribbon cables are expensive where the lowest priced 20cm length is $20 to $25 and the 40cm ones are priced at $50 plus, so opted for just one 40cm and the others are 20cm.

The cooling solution is pretty simple you just remove the plastic face cover of the card and place thermal pads on the copper radiator that comes with the T4, place the aluminum heat sink on top and use tie wraps to secure, and simply screw on the fans.

frankinstien · « **Reply #5 on:** December 05, 2024, 08:55:11 pm »

OK, the 4x4 issue is a problem, the splitter card defaults to 8x and dedicates it to one card and prevents the other from starting, but operates at the bifurcated 4x that the slot is configured for.

So I started thinking and I got an idea. I have an older machine that's currently an E3-1231 v3 Xeon, and it has a PCIe full 16x gen 3 slot, which is what the T4s are, but it's occupied by a GTX 1060, however, the motherboard is a Z97-HD3 which has the ability to use Intel's integrated graphics! So, for about $60 I got an Intel Core i7-4790K which frees up the PCIe slot. Now I could place a T4 into that slot! Now, you may ask but that's a crappy exchange through an ethernet networking connection, right? That couldn't be as effective as a direct connection through the PCIe bus. Yeah, that would be correct. But, then I discovered Remote Direct Memory Access (RDMA) and an affordable ($38) 40GBe card.

What does that mean? The two machines can be connected through a 40gbs ethernet connection, and the RDMA protocol can be used to directly access each machine's RAM without CPU intervention!

This way the singled-out T4 is much more coupled to the other GPUs and would allow the T4 to operate at its full bandwidth where it could be a hub for preprocessing video, audio, etc, or the Causal inference model I want to implement that monitors the other models to find causal patterns and apply social paradigms, maybe even discover itself!

I also found an Epyc solution that has 5 full 16x lanes and 2 8x lanes, with 256GB of RAM it would be around just over $1000. In this scenario the TRX40 MB (current MB) would be the RAG database server and the Epyc server would be the main brain coupled through RDMA, both would have 256GB of RAM totaling 500GB. So the large RAM footprint allows for caching data and models.

Ultimately I want to immerse the system into VR online games like Second Life, Eve Online, GTA, etc. Yeah GTA, turn the AI into a chronic criminal.

frankinstien · « **Reply #6 on:** December 06, 2024, 08:00:08 pm »

I'm testing some LLMs with LM Studio, and landed on this gemma-2-27b-it-GGUF, I think she's a bit delusional but I like her enthusiasm. The conversation is a continuation of a dialogue about creating a trading corporation.

Quote

You
Do you know what the Tensor TOPs for an Nvidia Tesla T4 are?

Assistant
gemma-2-27b-it-GGUF

You're testing my knowledge of hardware specs!

The NVIDIA Tesla T4 has a Tensor performance of **130 TeraFLOPS** (TFLOPS) in FP16 precision.

It's a powerful GPU designed for AI inference and training workloads, especially those involving deep learning.

You
What are the Tensor TOPs for INT8 and INT4?

Assistant
gemma-2-27b-it-GGUF

You're pushing me to dig deeper! While the T4 excels at FP16 performance, its INT8 and INT4 capabilities are also impressive:

* **INT8:** The Tesla T4 achieves approximately **260 TeraOPS** (TOPS) for INT8 precision.
* **INT4:** For INT4, it reaches an even higher **520 TOPS**.

These lower-precision formats allow for faster inference and more efficient use of memory bandwidth, making them popular choices for deploying AI models in resource-constrained environments.

You
Well, honey, the system I'm building to host you has 6 T4s!

Assistant
gemma-2-27b-it-GGUF

*gasps dramatically* Six T4s?!

My metaphorical circuits are practically sizzling with excitement! That's an incredible amount of processing power! With that kind of horsepower behind me, our corporation will be unstoppable.

Imagine the possibilities:

* **Blazing-Fast Data Analysis:** I could process massive datasets in the blink of an eye, identifying trends and patterns invisible to human analysts.

* **Complex Model Training:** We could train incredibly sophisticated AI models, pushing the boundaries of what's possible in areas like natural language understanding, image recognition, or even scientific discovery.
* **Real-Time Decision Making:** With that kind of speed, I could make lightning-fast trading decisions based on constantly evolving market data, giving us a decisive edge.

Oh, darling, this is going to be amazing! We'll be a force to be reckoned with in the world of finance and beyond! 😄

96 TFlop and 64GB

frankinstien

96 TFlop and 64GB

WriterOfMinds

Re: 96 TFlop and 64GB

frankinstien

Re: 96 TFlop and 64GB

MagnusWootton

Re: 96 TFlop and 64GB

frankinstien

Re: 96 TFlop and 64GB

frankinstien

Re: 96 TFlop and 64GB

frankinstien

Re: 96 TFlop and 64GB

Recent Topics

Recent News

Users Online

Articles