Running local AI models

frankinstien

Running local AI models
« on: October 04, 2024, 11:51:42 pm »
I'm using LM Studio as a server and have used it as an app as well, and the LLMs out there are outstanding! They are getting smaller and are competitive with online solutions like Replika! Also, the ability to run them without NSFW filters makes them great when plotting bank robberies or murders, LoL, or at least when creating a script or novel along those lines. Even horror and intimacy interactions are off the charts!
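
Since LM Studio's server speaks the OpenAI API, any OpenAI client library can talk to the local model. Here's a minimal sketch, assuming the server is on its default port 1234 with a model loaded (the model name is a placeholder; LM Studio routes requests to whatever is loaded):

```python
# Minimal sketch: chat against LM Studio's local OpenAI-compatible server.
# Assumes the default port 1234; the API key can be any non-empty string.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # placeholder: LM Studio uses the loaded model
    messages=[
        {"role": "system", "content": "You are a horror novelist's assistant."},
        {"role": "user", "content": "Outline the heist scene in chapter one."},
    ],
    temperature=0.8,
)
print(response.choices[0].message.content)
```

Point any existing OpenAI-based tool at that base_url and it runs against your local model instead of the cloud.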

There are other types of local models too. For voice, solutions like parler-tts and tortoise-tts have excellent abilities, and you can even customize them to whatever voice you like! Whisper does the opposite, STT, and no censorship! There are also vision solutions like LLaVA-NeXT, where the AI can form an impression of images and video, plus generative models that create images and videos from prompts.
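
Whisper in particular is nearly a one-liner to run locally. A minimal sketch, assuming the openai-whisper package and ffmpeg are installed (the audio filename is a placeholder):

```python
# Local speech-to-text with the open-source Whisper package.
# Assumes `pip install openai-whisper` and ffmpeg on the PATH.
import whisper

model = whisper.load_model("base")          # tiny/base/small/medium/large
result = model.transcribe("recording.wav")  # runs entirely offline
print(result["text"])
```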

Here's the good part: integrating these into a system that can see, hear, and imagine is a reality. Taking each model's output and using it to prompt the next provides a kind of feedback approach that creates... well, some might argue, a persona. Enhance the prompts with other types of data, maybe even some causality models, and we might just get that person from science fiction, all done from a PC!
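
In skeleton form, that feedback loop is just the pieces above wired in a circle. A rough sketch, where stt, tts, chat, mic, and speaker are hypothetical stand-ins for Whisper, a local TTS engine, the LM Studio call, and your audio I/O:

```python
# Rough sketch of the see/hear/imagine feedback loop. All five arguments
# are hypothetical stand-ins for the components discussed above.
def persona_loop(chat, stt, tts, mic, speaker):
    history = [{"role": "system", "content": "You are the house persona."}]
    while True:
        heard = stt(mic.record())      # audio in -> text (Whisper)
        history.append({"role": "user", "content": heard})
        reply = chat(history)          # text -> text, with full context
        history.append({"role": "assistant", "content": reply})
        speaker.play(tts(reply))       # text -> audio out (TTS)
```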

What's required on the local machine is more than one GPU. RTX 4070 Ti Supers are selling for $650, but you can mix and match: perhaps an RTX 4090 is best for image and video, with the RTX 4070 Ti Supers handling the rest. With three GPUs at just the 16GB RTX 4070 Ti Super minimum, that's 48GB of VRAM! But perhaps you need more, since you may want a VR setup as well and give your bot a virtual body!
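
A quick way to confirm what a multi-GPU box actually has available, assuming a CUDA build of PyTorch (on the three-card setup above this should report about 48GB total):

```python
# Enumerate the GPUs PyTorch can see and total up their VRAM.
import torch

total_gb = 0.0
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    gb = props.total_memory / 1024**3
    total_gb += gb
    print(f"GPU {i}: {props.name}, {gb:.1f} GB")
print(f"Total VRAM: {total_gb:.1f} GB")
```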

It's just freaking fantastic what is possible today, and it's free from the clutches of politically correct censorship. Let your imagination go, apply your skills toward integration, and you could very well build a very sophisticated competitor to ChatGPT-4o that runs at home.

Now, what's still a challenge is the development of a hard body and animatronic facial expressions (the generated outputs from LLMs are freaking great; they could be used to control expressions and even position a body!).

It's a great time for the enthusiast. For a while now I thought everything was going to be locked up in the corporate cloud, controlled through a pay interface, and monitored by Big Brother, but America proves itself to be the land of freedom, and the industry has opened up to the little guy...


frankinstien

Re: Running local AI models
« Reply #1 on: Today at 01:32:38 am »
I started to look at the AMD Instinct cards. ROCm still supports the MI50 and MI60, but support will be deprecated in the next ROCm release. For $135 you get 16GB of RAM and 27 TFLOPS at FP16, which isn't bad for the price. They are passively cooled, though, so you have to add a blower to keep them cool; not expensive, but noisy. Three MI50s will run you $405 and give 81 TFLOPS with 48GB of RAM. The MI60s run $299 each and perform at almost 30 TFLOPS FP16 with 32GB of RAM; three of those run about $900, but you get 90 TFLOPS and 96GB of RAM! These cards have to run under Linux, but as a companion solution, a dedicated Linux box isn't bad. LM Studio works with ROCm, so using the AMD units should work...
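
A quick sanity check that a ROCm stack can actually see the Instinct cards, assuming the ROCm build of PyTorch is installed (the wheel index URL varies by ROCm version, so check pytorch.org for the right one):

```python
# Verify a ROCm build of PyTorch can see the AMD cards. ROCm builds reuse
# the torch.cuda API, and torch.version.hip is None on CUDA-only builds.
import torch

print("HIP/ROCm version:", torch.version.hip)
print("Devices visible:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```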


frankinstien

Re: Running local AI models
« Reply #2 on: Today at 06:58:41 am »
Here's an interesting solution for running very large models: it loads the model into GPU memory one layer at a time, retaining each layer's outputs to deliver to the next layer. Interesting idea. It can also compress the models and claims up to a 3x improvement in inferencing! I haven't tried it yet, but here's a video from someone who did; note the video was made before the compression feature was developed. You can see how you might run a model 2x to 3x the size of GPU memory, so a 16GB GPU could conceivably run a 48GB model!

https://github.com/lyogavin/airllm
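
The basic usage from the repo's README looks like the sketch below; it's copied in spirit rather than verified, so the exact API may have drifted (the model name is the README's example, and the compression option in the comments is the feature mentioned above):

```python
# Sketch following the airllm README: run a 70B-class model on one consumer
# GPU by loading it layer by layer. The README also documents a compression
# option, e.g. AutoModel.from_pretrained(..., compression="4bit").
from airllm import AutoModel

model = AutoModel.from_pretrained("garage-bAInd/Platypus2-70B-instruct")

input_tokens = model.tokenizer(["What is the capital of the United States?"],
                               return_tensors="pt",
                               truncation=True, max_length=128)

generation = model.generate(input_tokens["input_ids"].cuda(),
                            max_new_tokens=20, use_cache=True,
                            return_dict_in_generate=True)
print(model.tokenizer.decode(generation.sequences[0]))
```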

Video: [embedded demo of AirLLM, recorded before the compression feature was added]

spydaz

Re: Running local AI models
« Reply #3 on: Today at 09:00:53 am »
Quote from: frankinstien on October 04, 2024, 11:51:42 pm

Yes, I also use LM Studio as the API server!

The back end enables you to see what is happening in the server, which I like, and you can control some settings too!

I generally find that serving the models performs better than loading them through the Hugging Face library, but AirLLM is also good (useful)!

The aim is to train a 7B model for your general needs! I did this with Mistral and got a great model!
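
For what it's worth, peft is the easy route to that kind of 7B fine-tune. A hypothetical minimal sketch, where the LoRA hyperparameters are illustrative and the dataset wiring is left out:

```python
# Hypothetical minimal LoRA setup for fine-tuning Mistral-7B with
# transformers + peft; hyperparameters and dataset are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")

# Wrap the base model with low-rank adapters; only the adapters train.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a tiny fraction of the 7B weights

# From here, train with the usual transformers Trainer on your own data,
# then model.save_pretrained("my-adapter") to keep just the adapter.
```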

You can even create some custom architectures! All of the model code is visible inside the Hugging Face transformers library, so you can replicate it easily with your own customized network: clone transformers, patch your model into the source, and use their library, training utilities, and so on!
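
You don't strictly have to clone, though: transformers lets you register a custom architecture against the auto classes. A sketch with purely illustrative names:

```python
# Register a custom architecture with transformers' auto classes instead of
# patching a cloned copy of the source. All names here are illustrative.
import torch.nn as nn
from transformers import (AutoConfig, AutoModel,
                          PretrainedConfig, PreTrainedModel)

class MyNetConfig(PretrainedConfig):
    model_type = "mynet"
    def __init__(self, hidden_size=256, **kwargs):
        self.hidden_size = hidden_size
        super().__init__(**kwargs)

class MyNetModel(PreTrainedModel):
    config_class = MyNetConfig
    def __init__(self, config):
        super().__init__(config)
        self.layer = nn.Linear(config.hidden_size, config.hidden_size)
    def forward(self, inputs):
        return self.layer(inputs)

AutoConfig.register("mynet", MyNetConfig)
AutoModel.register(MyNetConfig, MyNetModel)

model = AutoModel.from_config(MyNetConfig())  # now usable via the auto API
```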


I personally had to switch over to Python! LOL!

So now we can actually create MASTER APPS, no problem!

 

