ollama and llama3

5 Replies, 130399 Views
infurl (Administrator)
ollama and llama3
« on: July 10, 2024, 02:17:07 am »
Access to generative artificial intelligence just changed radically and for the better. Until recently, our options were to use online services, which were potentially very expensive and almost certainly heavily restricted, or to run open source models locally, which required high-end hardware and produced disappointing, mediocre results at best.

Last year we saw the release of ollama, which made it incredibly easy to run just about any large language model locally, no matter what platform you're on. You still needed a powerful system, but at least you didn't have to learn a lot of obscure methods to use it.

https://ollama.com/
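If anyone wants to try it, the setup really is minimal. A sketch of the Linux route (the installer is ollama's official one-line script; inspect it before piping it to a shell):

    # install ollama on Linux using the official install script
    curl -fsSL https://ollama.com/install.sh | sh

    # download llama3 on first use and start an interactive chat
    ollama run llama3

On macOS and Windows there are regular installers on the same page.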

Recently the open source large language model llama3 was released. It has proven to be as capable as models two hundred times its size, and it is so efficient that you can run it on a Raspberry Pi 5 if you want to, though that might take some patience.

I've been experimenting with it and it seems to be as good as any of the models that I have used online. I am running it on a Linux system with 24 cores, 64 GB of RAM, and 16 GB of video RAM. The smaller 8 billion parameter model responds to my queries almost instantly while the larger 70 billion parameter model can take a minute or two. Mostly the results produced by the smaller model are quite good enough.
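For anyone wanting to reproduce this, the two sizes are separate tags, and ollama also listens on a local HTTP port so you can script it. A sketch (sizes are approximate and the prompt is just an example):

    ollama run llama3:8b     # ~5 GB; responds almost instantly on this machine
    ollama run llama3:70b    # ~40 GB; takes a minute or two per reply here

    # the same models are available through the local REST API on port 11434
    curl http://localhost:11434/api/generate \
      -d '{"model": "llama3:8b", "prompt": "Why is the sky blue?"}'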
« Last Edit: July 11, 2024, 11:55:29 pm by infurl »

8pla.net (Trusty Member)
Re: ollama and llama3
« Reply #1 on: July 21, 2024, 02:49:22 pm »
Wow, you are so lucky to enjoy a "Linux system with 24 cores, 64 GB of RAM, and 16 GB of video RAM"!
Those of us on a budget can run "nproc --all" in a Linux terminal to get our number of cores; my laptop CPU has 4. Then "cat /proc/meminfo" displays my MemTotal as 16GB, and "glxinfo | egrep -i 'video memory'" reports "15733MB" for my VRAM.

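The same checks in one small script, for convenience (the glxinfo line assumes an OpenGL/Mesa driver that reports video memory; NVIDIA users would use nvidia-smi instead):

    #!/bin/sh
    # quick survey of the hardware that matters for running local LLMs
    nproc --all                         # CPU core count
    grep MemTotal /proc/meminfo         # total system RAM
    glxinfo | grep -i "video memory"    # VRAM as reported by the OpenGL driver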
A bare-bones C++ implementation may be able to do generative artificial intelligence on less expensive hardware.
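If you want the C++ route, llama.cpp is the usual starting point. A rough sketch (the GGUF file name is just a placeholder for whatever quantized model you download):

    git clone https://github.com/ggerganov/llama.cpp
    cd llama.cpp
    make
    # recent builds name the binary llama-cli; older ones call it main
    ./llama-cli -m models/llama-3-8b.Q4_K_M.gguf -p "Hello"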

infurl (Administrator)
Re: ollama and llama3
« Reply #2 on: July 21, 2024, 03:08:42 pm »
Quote from: 8pla.net on July 21, 2024, 02:49:22 pm
    Those of us on a budget can run "nproc --all" in a Linux terminal to get our number of cores; my laptop has 4. Then "cat /proc/meminfo" displays my MemTotal as 16GB, and "glxinfo | egrep -i 'video memory'" reports "15733MB" for my VRAM.

There's nothing shabby about your laptop if it has 16GB of VRAM. Is it a gaming laptop, or have you got an external GPU? Your main constraint is the amount of system memory that you have because you need to be able to load the model data into memory in its entirety. You'll be able to run all the models that are less than 7 billion parameters with that, no problem, as they are around 5 GB.
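The arithmetic is rough but easy, assuming the roughly 4-bit quantization that ollama's default tags use: size is about parameters times half a byte, plus some working overhead.

     8B model:  8e9 × ~0.5 bytes ≈ 4 GB, call it 5 GB loaded   (fits in 16 GB)
    70B model: 70e9 × ~0.5 bytes ≈ 35 GB, call it 40 GB loaded (needs much more RAM)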

8pla.net (Trusty Member)
Re: ollama and llama3
« Reply #3 on: July 21, 2024, 03:57:54 pm »
Thanks so much for the like you gave to my reply! I really appreciate that!

I guess a business must have donated this ultra-portable business laptop to a nonprofit organization, and I was able to get it cheap. The maximum RAM specification is 16GB, yet my research revealed an undocumented 32GB RAM upgrade. So your LLM experience is very useful to me. Does it matter if I double the RAM? Thanks for your advice.

infurl (Administrator)
Re: ollama and llama3
« Reply #4 on: July 21, 2024, 04:07:53 pm »
Quote from: 8pla.net on July 21, 2024, 03:57:54 pm
    Does it matter if I double the RAM? Thanks for your advice.

Doubling the amount of RAM that you have would allow you to run larger models than you can now. For example, one of the newest and best is gemma2:27b, which is 15GB in size; that won't run on your current system but will certainly run on a 32GB system. There are smaller versions of all the models that will run easily in 16GB of RAM though, and the larger models are significantly slower, so it probably isn't worth it. I think you would need to upgrade your CPU as well as the RAM to run the larger models fast enough to be useful.
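You can also check what any tag would cost before committing to the upgrade. A sketch (sizes are approximate):

    ollama pull gemma2:27b   # ~15 GB; too big to run in 16 GB of RAM
    ollama pull gemma2:9b    # ~5.5 GB; runs comfortably on your current laptop
    ollama list              # lists every pulled model with its size on disk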

spydaz (Trusty Member)
Re: ollama and llama3
« Reply #5 on: August 24, 2024, 02:55:13 pm »
Quote from: infurl on July 10, 2024, 02:17:07 am


Yes, I think we need an area just for Hugging Face posts!
https://huggingface.co/LeroyDyer




 

