ollama and llama3

infurl · « **on:** July 10, 2024, 02:17:07 am »

Access to generative artificial intelligence just changed radically, and for the better. Until recently, our options were either to use online services, which could be very expensive and were almost certainly heavily restricted, or to run open-source models locally, which required high-end hardware and produced disappointing, mediocre results at best.

Last year saw the release of ollama, which made it incredibly easy to run just about any large language model locally, no matter what platform you're on. You still needed a powerful system, but at least you didn't have to learn a lot of obscure methods to use it.

https://ollama.com/
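Besides the `ollama run` command line, ollama serves a local HTTP API, by default on port 11434, so your own programs can use the models too. Here is a minimal Python sketch using only the standard library. It assumes the server is running on that default port and that `llama3` has already been pulled with `ollama pull llama3`. The names `build_generate_request` and `generate` are made up for this example:

```python
import json
import urllib.request

# Assumption: ollama is serving on its default local address and port.
OLLAMA_GENERATE_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for ollama's /api/generate endpoint."""
    # "stream": False asks for one JSON object instead of a line-by-line stream.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_GENERATE_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 300.0) -> str:
    """Send a prompt and return the generated text (the "response" field)."""
    request = build_generate_request(model, prompt)
    # A generous timeout, since large models can take minutes to answer.
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        body = json.load(reply)
    return body["response"]
```

For example, `print(generate("llama3", "Why is the sky blue?"))`. Turning streaming off keeps the sketch simple, but you see nothing until the whole answer is ready.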

In April the open source large language model llama3 was released. It has proven to be as capable as models two hundred times its size and is so efficient you can run it on a Raspberry Pi 5 if you want to, though it might take some patience.

I've been experimenting with it, and it seems to be as good as any of the models I have used online. I am running it on a Linux system with 24 cores, 64 GB of RAM, and 16 GB of video RAM. The smaller 8-billion-parameter model responds to my queries almost instantly, while the larger 70-billion-parameter model can take a minute or two. For most purposes, the smaller model's results are good enough.
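To put numbers on "almost instantly" versus "a minute or two", the final JSON object from ollama's `/api/generate` includes `eval_count` (tokens generated) and `eval_duration` (in nanoseconds), from which tokens per second follow. This sketch assumes the default streaming mode, where each line of the reply is one JSON object, and a server on the default local port. `collect_stream` and `stream_generate` are illustrative names, not part of ollama:

```python
import json
import urllib.request
from typing import Iterable, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, float]:
    """Join streamed /api/generate chunks; return (text, tokens per second)."""
    pieces = []
    tokens_per_second = 0.0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between JSON objects
        chunk = json.loads(line)
        pieces.append(chunk.get("response", ""))
        if chunk.get("done"):
            # The final chunk reports generated tokens and time in nanoseconds.
            eval_count = chunk.get("eval_count", 0)
            eval_duration_ns = chunk.get("eval_duration", 0)
            if eval_duration_ns > 0:
                tokens_per_second = eval_count / eval_duration_ns * 1e9
    return "".join(pieces), tokens_per_second


def stream_generate(model: str, prompt: str, timeout: float = 600.0) -> Tuple[str, float]:
    """Stream a generation from a local ollama server (default port assumed)."""
    payload = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    request = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        # Streaming is ollama's default: the reply is one JSON object per line.
        return collect_stream(raw.decode("utf-8") for raw in reply)
```

Running `stream_generate("llama3:8b", prompt)` and then `stream_generate("llama3:70b", prompt)` with the same prompt gives a like-for-like speed comparison on one machine.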

8pla.net · « **Reply #1 on:** July 21, 2024, 02:49:22 pm »

Wow, you are so lucky to enjoy a "Linux system with 24 cores, 64 GB of RAM, and 16 GB of video RAM"!
Those of us on a budget can run `nproc --all` in a Linux terminal to get our number of cores. My laptop CPU has 4 cores. In the terminal, `cat /proc/meminfo` shows my total memory (MemTotal) as 16GB, and the command `glxinfo | egrep -i "Video memory:"` displays "15733MB" for my VRAM.

C++ may make it possible to do more bare-bones generative artificial intelligence on less expensive hardware.
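The core and memory figures can also be read from Python with only the standard library. This sketch assumes Linux, since it reads `/proc/meminfo`. `os.cpu_count()` reports logical CPUs, which is normally the same number `nproc --all` prints. There is no portable standard-library way to query VRAM, so `glxinfo` is still needed for that. On laptops with integrated graphics, its "Video memory" figure may describe memory shared with the system rather than dedicated VRAM. The function names here are made up for this example:

```python
import os
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo text into {field: number}.

    Most values are sizes in KiB (labelled "kB"); a few, like HugePages_Total,
    are plain counts.
    """
    fields: Dict[str, int] = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def kib_to_gib(kib: int) -> float:
    return kib / (1024 * 1024)


def describe_system(meminfo_path: str = "/proc/meminfo") -> str:
    """Summarize logical CPUs and RAM on a Linux machine."""
    cores = os.cpu_count()  # logical CPUs; may be None on unusual platforms
    with open(meminfo_path) as f:
        info = parse_meminfo(f.read())
    total = kib_to_gib(info["MemTotal"])
    # MemAvailable estimates memory usable without swapping.
    available = kib_to_gib(info.get("MemAvailable", info["MemTotal"]))
    return f"{cores} logical CPUs, {total:.1f} GiB RAM, {available:.1f} GiB available"
```

Calling `print(describe_system())` gives a one-line summary. MemAvailable is worth watching as well as MemTotal, since that is roughly what is left for loading a model.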

infurl · « **Reply #2 on:** July 21, 2024, 03:08:42 pm »

Quote from: 8pla.net on July 21, 2024, 02:49:22 pm

Those of us on a budget can run `nproc --all` in a Linux terminal to get our number of cores. My laptop has 4 cores. In the terminal, `cat /proc/meminfo` shows my total memory (MemTotal) as 16GB, and the command `glxinfo | egrep -i "Video memory:"` displays "15733MB" for my VRAM.

There's nothing shabby about your laptop if it has 16GB of VRAM. Is it a gaming laptop, or do you have an external GPU? Your main constraint is the amount of system memory, because you need to be able to load the model data into memory in its entirety. With 16GB you'll be able to run any of the models under 7 billion parameters with no problem, as they are around 5 GB.
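A rough way to check whether a model will fit: a quantized model's size is about its parameter count times the bits stored per weight, divided by 8. The sketch below assumes about 4.5 bits per weight, which approximates a 4-bit quantization including its scale factors. It also assumes a hypothetical 4 GB of headroom for the operating system and working memory. Both defaults are assumptions to adjust, not figures taken from ollama:

```python
def estimate_model_gb(params_billions: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of a quantized model in GB (1e9 bytes).

    Assumption: bits_per_weight=4.5 approximates a 4-bit quantization
    including its scale factors; other quantizations need other values.
    """
    return params_billions * bits_per_weight / 8


def fits_in_ram(params_billions: float, ram_gb: float,
                bits_per_weight: float = 4.5, headroom_gb: float = 4.0) -> bool:
    """True if the weights plus a (hypothetical) headroom allowance fit in RAM."""
    return estimate_model_gb(params_billions, bits_per_weight) + headroom_gb <= ram_gb


if __name__ == "__main__":
    for params, ram in ((8, 16), (27, 16), (27, 32), (70, 64)):
        size = estimate_model_gb(params)
        print(f"{params}B params ~ {size:.1f} GB; fits in {ram} GB: {fits_in_ram(params, ram)}")
```

With these assumptions, an 8-billion-parameter model comes out at 4.5 GB and fits easily in 16 GB. A 27-billion-parameter model (about 15.2 GB) does not fit in 16 GB but does fit in 32 GB.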

8pla.net · « **Reply #3 on:** July 21, 2024, 03:57:54 pm »

Thanks so much for the like you gave to my reply! I really appreciate that!

I guess a business must have donated this ultraportable business laptop to a nonprofit organization, and I was able to get it cheaply. The maximum RAM specification is 16GB, yet my research revealed an undocumented 32GB RAM upgrade. So your LLM experience is very useful to me.

Does it matter if I double the RAM? Thanks for your advice.

infurl · « **Reply #4 on:** July 21, 2024, 04:07:53 pm »

Quote from: 8pla.net on July 21, 2024, 03:57:54 pm

Does it matter if I double the RAM? Thanks for your advice.

Doubling your RAM would allow you to run larger models than you can now. For example, one of the newest and best is gemma2:27b, which is 15GB in size. That won't run on your current system, but it will certainly run on a 32GB system. However, there are smaller versions of all the models that run easily in 16 GB of RAM, and the larger models are significantly slower, so it probably isn't worth it. I think you would need to upgrade your CPU as well as your RAM to run the larger models fast enough to be useful.
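Once some models are installed, ollama's `/api/tags` endpoint lists them with their size in bytes. You can use that to check which ones fall under a memory budget before trying them. This sketch assumes the default local server. The budget you pass, for example a few GB less than your total RAM, is your own choice, and the function names are hypothetical:

```python
import json
import urllib.request
from typing import List, Tuple

# Assumption: ollama is serving on its default local address and port.
TAGS_URL = "http://localhost:11434/api/tags"


def models_within_budget(tags: dict, budget_gb: float) -> List[Tuple[str, float]]:
    """Return (name, size in GB) for installed models no larger than budget_gb."""
    result = []
    for model in tags.get("models", []):
        size_gb = model["size"] / 1e9  # "size" is reported in bytes
        if size_gb <= budget_gb:
            result.append((model["name"], size_gb))
    # Smallest models first.
    return sorted(result, key=lambda item: item[1])


def fetch_tags(url: str = TAGS_URL) -> dict:
    """Fetch the list of locally installed models from the ollama server."""
    with urllib.request.urlopen(url, timeout=10) as reply:
        return json.load(reply)
```

For example: `for name, gb in models_within_budget(fetch_tags(), 12.0): print(f"{name}: {gb:.1f} GB")`.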

spydaz · « **Reply #5 on:** August 24, 2024, 02:55:13 pm »

Yes, I think we need an area just for Hugging Face posts!
https://huggingface.co/LeroyDyer
