Ai Dreams Forum

Member's Experiments & Projects => AI Programming => Topic started by: unreality on November 18, 2017, 04:23:52 am

Title: GPU paralleling question
Post by: unreality on November 18, 2017, 04:23:52 am
Is there someone here who's coded GPUs? Some of these GPUs have over 4000 cores, yet they're only about 5 times faster at data mining than a CPU in a desktop PC. The GPU clock speed isn't that much slower than the CPU's, and the CPU only has about 8 cores. The GPU should be at least 300 times faster unless it has some major limitations. I would have thought GPUs would be great at data mining.

Maybe data mining isn't a good example. Maybe it needs to use global memory too often, or maybe the cores need to communicate too much, or whatever. So what if each GPU core only used its local memory? Would the GPU then be hundreds of times faster than the CPU? That's what I'd really like to know.
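
To make the question concrete, here's roughly what I mean as a CUDA-style sketch pieced together from tutorials (not something I've actually run or benchmarked): one kernel keeps each thread's work in on-chip memory and only writes global memory once at the end, while the other hits global memory on every iteration.

Code:
// Illustrative only: per-thread work kept on-chip versus work that touches
// global memory every iteration.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void localHeavy(float* out)
{
    __shared__ float scratch[256];                 // on-chip memory, per block
    int t = threadIdx.x;
    scratch[t] = (float)t;
    for (int i = 0; i < 1000; ++i)                 // all the work stays on-chip
        scratch[t] = scratch[t] * 1.0001f + 1.0f;
    out[blockIdx.x * blockDim.x + t] = scratch[t]; // one global write at the end
}

__global__ void globalHeavy(float* out)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < 1000; ++i)                 // every iteration reads and
        out[idx] = out[idx] * 1.0001f + 1.0f;      // writes global memory
}

int main()
{
    const int n = 256 * 1024;
    float* d;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    localHeavy<<<n / 256, 256>>>(d);
    cudaDeviceSynchronize();
    globalHeavy<<<n / 256, 256>>>(d);
    cudaDeviceSynchronize();
    cudaFree(d);
    printf("done\n");
    return 0;
}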
Title: Re: GPU paralleling question
Post by: ranch vermin on November 18, 2017, 06:21:10 am
GPU FAN BOY -> you're reading some poor sources there; my GTX 980 (2048 cores) clears my quad core by 500 times.

MORE LENIENT TO CPUS -> there are ways to make a CPU go better. For example, a CPU can do a box blur as fast as a GPU by snaking a running box along the row, adding the incoming sample and subtracting the outgoing one, and building box keys can also be done on the CPU quite well if you read once and write many. *But* a GPU will just naively chew through it with a double for loop inside each thread, so they are both pretty cool.
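
Something like this rough, untested sketch (names and sizes just for illustration): the CPU slides a running sum along each row, while the GPU kernel is the naive double for loop per pixel.

Code:
#include <vector>
#include <cuda_runtime.h>

// CPU: running-sum ("snaking") box blur along one row. Constant work per
// output sample no matter how wide the box is. A second, vertical pass over
// the result would complete a 2D box blur.
void boxBlurRowCpu(const float* in, float* out, int n, int radius)
{
    int width = 2 * radius + 1;
    float sum = 0.0f;
    for (int i = 0; i < width && i < n; ++i) sum += in[i];    // prime the window
    for (int i = radius; i < n - radius; ++i) {
        out[i] = sum / width;
        if (i + radius + 1 < n)                               // slide: add one in,
            sum += in[i + radius + 1] - in[i - radius];       // take one out
    }
}

// GPU: the naive double for loop per output pixel. More arithmetic, but
// thousands of threads chew through it at once.
__global__ void boxBlurGpu(const float* in, float* out, int w, int h, int radius)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < radius || y < radius || x >= w - radius || y >= h - radius) return;
    float sum = 0.0f;
    for (int ky = -radius; ky <= radius; ++ky)
        for (int kx = -radius; kx <= radius; ++kx)
            sum += in[(y + ky) * w + (x + kx)];
    int width = 2 * radius + 1;
    out[y * w + x] = sum / (width * width);
}

int main()
{
    const int w = 1024, h = 1024, radius = 8;
    std::vector<float> img(w * h, 1.0f), tmp(w * h, 0.0f);

    // CPU path: one horizontal running-sum pass per row.
    for (int y = 0; y < h; ++y)
        boxBlurRowCpu(&img[y * w], &tmp[y * w], w, radius);

    // GPU path: naive kernel over the whole image.
    float *dIn, *dOut;
    cudaMalloc(&dIn, w * h * sizeof(float));
    cudaMalloc(&dOut, w * h * sizeof(float));
    cudaMemcpy(dIn, img.data(), w * h * sizeof(float), cudaMemcpyHostToDevice);
    dim3 block(16, 16), grid((w + 15) / 16, (h + 15) / 16);
    boxBlurGpu<<<grid, block>>>(dIn, dOut, w, h, radius);
    cudaDeviceSynchronize();
    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}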

My honest opinion is, if GPUs had more RAM I'd never use system code again, because it's too taxing on my mind, and you get your operation up and running quicker with fewer building hassles.

That's... er... after you've finished getting through all the horrid documentation and written a billion lines just to set the basic system up. Hmm, I'm contradicting myself.
Title: Re: GPU paralleling question
Post by: Marco on November 18, 2017, 08:51:36 am
Is there someone here who's coded GPUs? Some of these GPUs have over 4000 cores, yet they're only about 5 times faster at data mining than a CPU in a desktop PC. The GPU clock speed isn't that much slower than the CPU's, and the CPU only has about 8 cores. The GPU should be at least 300 times faster unless it has some major limitations. I would have thought GPUs would be great at data mining.

You cannot compare GPUs to CPUs that easily. Their architecture and how they operate are completely different. Also, you cannot compare CPUs just by their clock speed and number of cores. A 12-year-old Pentium 4 at 4 GHz could not in the slightest beat a single core of a Coffee Lake or Ryzen CPU running at 2 GHz.
Title: Re: GPU paralleling question
Post by: unreality on November 18, 2017, 03:04:23 pm
GPU FAN BOY -> you're reading some poor sources there; my GTX 980 (2048 cores) clears my quad core by 500 times.

MORE LENIENT TO CPUS -> there are ways to make a CPU go better. For example, a CPU can do a box blur as fast as a GPU by snaking a running box along the row, adding the incoming sample and subtracting the outgoing one, and building box keys can also be done on the CPU quite well if you read once and write many. *But* a GPU will just naively chew through it with a double for loop inside each thread, so they are both pretty cool.

My honest opinion is, if GPUs had more RAM I'd never use system code again, because it's too taxing on my mind, and you get your operation up and running quicker with fewer building hassles.

That's... er... after you've finished getting through all the horrid documentation and written a billion lines just to set the basic system up. Hmm, I'm contradicting myself.

That's great news. Maybe you're doing it the right way. Below is one source that gives tons of data mining GPU and CPU examples using well-known data mining benchmark apps. The fastest GPU score is 16032, while the fastest CPU score is 3500. The GPU isn't even 5 times faster.

The Surface Pro 3 tablet that I use here to surf the internet has an i5 that scores about 1/7th of its GPU.

There's a youtube video (haven't found the link yet), but the guy shows gpu code along with how long the gpu takes to clear 1<20 (~ a million) float type size. It doesn't get much simpler than that. The loop size was 1<20 (~ a million). When he used just one gpu core, it took a whopping 463 ms! When he used 256 cores it took 2.7ms. What's great, but what's interesting is that a typical desktop pc should take about 5 to 10 ms to do that! Once again we have that 1/5th figure! Why?? I understand gpus are amazing at graphics, but I'm more interested in AI, pattern recognition, etc. BTW, that youtube guy has a lot of gpu teaching videos. So one would expect the guy to know his stuff. Why would it take one gpu core so long to clear a million floats? Sure, when he used 256 cores, it was about 170 times faster, but that's only about 5 times faster than a typical cpu. What am I missing?

Zillions of gpu & cpu data mining benchmarks:
http://monerobenchmarks.info/
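
If anyone wants to try that clearing test, here's roughly what I understand it to look like in CUDA (my own reconstruction from tutorials, not the code from the video): it times one thread versus 256 threads, then a plain one-core CPU memset of the same buffer.

Code:
#include <chrono>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Clears n floats using however many threads the launch provides.
__global__ void clearKernel(float* data, int n)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] = 0.0f;
}

int main()
{
    const int n = 1 << 20;                      // ~a million floats
    float* d;
    cudaMalloc(&d, n * sizeof(float));

    cudaEvent_t t0, t1;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);

    const int threadCounts[2] = {1, 256};       // one "core", then 256
    for (int threads : threadCounts) {
        cudaEventRecord(t0);
        clearKernel<<<1, threads>>>(d, n);
        cudaEventRecord(t1);
        cudaEventSynchronize(t1);
        float ms = 0.0f;
        cudaEventElapsedTime(&ms, t0, t1);
        printf("GPU, %3d threads: %.3f ms\n", threads, ms);
    }

    // CPU comparison: clear the same amount of memory on one core.
    static float h[1 << 20];
    auto c0 = std::chrono::steady_clock::now();
    std::memset(h, 0, sizeof(h));
    auto c1 = std::chrono::steady_clock::now();
    printf("CPU, one core:    %.3f ms\n",
           std::chrono::duration<double, std::milli>(c1 - c0).count());

    cudaFree(d);
    return 0;
}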
Title: Re: GPU paralleling question
Post by: unreality on November 18, 2017, 03:10:04 pm
Is there someone here who's coded GPUs? Some of these GPUs have over 4000 cores, yet they're only about 5 times faster at data mining than a CPU in a desktop PC. The GPU clock speed isn't that much slower than the CPU's, and the CPU only has about 8 cores. The GPU should be at least 300 times faster unless it has some major limitations. I would have thought GPUs would be great at data mining.

You cannot compare GPUs to CPUs that easily. Their architecture and how they operate are completely different. Also, you cannot compare CPUs just by their clock speed and number of cores. A 12-year-old Pentium 4 at 4 GHz could not in the slightest beat a single core of a Coffee Lake or Ryzen CPU running at 2 GHz.
That must be the case. They're awesome at graphics, but I haven't yet seen an example where they're much more than 5 to 10 times faster than a typical desktop CPU at the kind of number crunching one would find in AI and data mining.

My AI will do a lot of basic arithmetic and RAM reads/writes. Is there any way to get a GPU to be at least 100 times faster than, say, a $2000 desktop PC?
Title: Re: GPU paralleling question
Post by: Marco on November 18, 2017, 03:41:39 pm
GPUs are good at computing the same calculation on very large batches of data (e.g. training a neural net). That's where they currently outperform CPUs easily.
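
A minimal sketch of what I mean (purely illustrative): the exact same multiply-add applied to every element of a large batch, which is the shape of work GPUs chew through.

Code:
#include <cstdio>
#include <cuda_runtime.h>

// SAXPY: the same multiply-add applied to every element of a big batch.
__global__ void saxpy(int n, float a, const float* x, float* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];        // identical operation on every element
}

int main()
{
    const int n = 1 << 24;             // ~16 million elements
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));
    cudaMemset(x, 0, n * sizeof(float));
    cudaMemset(y, 0, n * sizeof(float));
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();
    printf("launched saxpy over %d elements\n", n);
    cudaFree(x);
    cudaFree(y);
    return 0;
}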
Title: Re: GPU paralleling question
Post by: unreality on November 18, 2017, 04:30:05 pm
GPUs are good at computing the same calculation on very large batches of data (e.g. training a neural net). That's where they currently outperform CPUs easily.
That makes sense, albeit disappointing for me since I don't do neural nets. GPUs are very efficient at that. Data mining is probably more like data crunching. While looking at a lot of CPU and GPU specs I couldn't help noticing that CPUs are typically 50 watts while GPUs are about 300 watts, although there are a lot of 150 W GPUs and 25 W CPUs. 300 / 50 is 6. That's roughly the benchmark difference between GPUs and CPUs.

Uggg! I guess there's no free lunch? So if I want my AI to be 100 times faster I might have to buy massive numbers of mini motherboards with truckloads of RAM chips. It seems RAM is the bottleneck here, no? GPUs have incredibly high memory bandwidth, but that's because they read/write thousands of bits at once. Unless I'm missing something here, my AI can't take advantage of that. It deals with finer data, 8- to 64-bit data types, e.g. cluster priorities. My cluster priority doesn't need thousand-bit precision. Even 7 bits is enough.

What I need is more along the lines of massively parallel memory. Each core should have its own RAM, such that each core's RAM is not tied down to the other RAM blocks. Are there any circuit designers here? Maybe an FPGA can accomplish this, but how fast? Imagine a large FPGA chip that has 100 simple CPUs, and each CPU has its own RAM built into the FPGA. There could also be a central CPU and RAM that periodically communicates with the other CPUs.
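
As a pure software analogy of that layout (just a sketch with made-up numbers, ordinary host code, not a hardware design): each worker thread owns a private block of memory that nothing else touches, and a central thread only ever polls one counter per worker.

Code:
#include <atomic>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <thread>
#include <vector>

struct Worker {
    std::vector<uint8_t> ram;           // this worker's private "RAM block"
    std::atomic<uint64_t> passes{0};    // the only word the central CPU reads
};

void run(Worker& w)
{
    w.ram.assign(1 << 20, 0);           // 1 MB that nothing else ever touches
    for (int pass = 0; pass < 200; ++pass) {
        for (auto& b : w.ram) b = (uint8_t)(b + 1);
        w.passes.store(pass + 1, std::memory_order_relaxed);
    }
}

int main()
{
    const int nWorkers = 8;             // stand-in for "100 simple CPUs"
    std::vector<Worker> workers(nWorkers);
    std::vector<std::thread> threads;
    for (auto& w : workers)
        threads.emplace_back(run, std::ref(w));

    // The central "CPU" periodically checks progress without ever touching
    // any worker's private RAM.
    for (int check = 0; check < 5; ++check) {
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
        uint64_t total = 0;
        for (auto& w : workers)
            total += w.passes.load(std::memory_order_relaxed);
        printf("check %d: %llu passes done across all workers\n",
               check, (unsigned long long)total);
    }
    for (auto& t : threads)
        t.join();
    return 0;
}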
Title: Re: GPU paralleling question
Post by: ivan.moony on November 18, 2017, 04:35:59 pm
Quantum computer (https://en.wikipedia.org/wiki/Quantum_computing), maybe?
Title: Re: GPU paralleling question
Post by: unreality on November 18, 2017, 04:54:28 pm
With a quantum computer we will definitely get the Singularity, but hopefully it won't require that. The movie Automata comes to mind. It will absolutely, positively, 100% guaranteed happen. It's only a matter of when. At least by my definition of what the Singularity means, which is when AI will be smart enough to improve itself, and that improved version will improve itself, and so on.
Title: Re: GPU paralleling question
Post by: infurl on November 18, 2017, 08:23:57 pm
With a quantum computer we will definitely get the Singularity, but hopefully it won't require that. The movie Automata comes to mind. It will absolutely, positively, 100% guaranteed happen. It's only a matter of when. At least by my definition of what the Singularity means, which is when AI will be smart enough to improve itself, and that improved version will improve itself, and so on.

I wonder how you can claim that. It sounds like a religious belief.

https://en.wikipedia.org/wiki/Quantum_algorithm

Quote
Problems which are undecidable using classical computers remain undecidable using quantum computers. What makes quantum algorithms interesting is that they might be able to solve some problems faster than classical algorithms.
Title: Re: GPU paralleling question
Post by: unreality on November 18, 2017, 08:34:57 pm
With a quantum computer we will definitely get the Singularity, but hopefully it won't require that. The movie Automata comes to mind. It will absolutely, positively, 100% guaranteed happen. It's only a matter of when. At least by my definition of what the Singularity means, which is when AI will be smart enough to improve itself, and that improved version will improve itself, and so on.

I wonder how you can claim that. It sounds like a religious belief.

https://en.wikipedia.org/wiki/Quantum_algorithm

Quote
Problems which are undecidable using classical computers remain undecidable using quantum computers. What makes quantum algorithms interesting is that they might be able to solve some problems faster than classical algorithms.

So you're one of those humans who refuse to believe AI will surpass us? To me that seems like human ego. Why do you call the obvious a religion? Take a look at the growth rate of science. Seems obvious to me.
Title: Re: GPU paralleling question
Post by: infurl on November 18, 2017, 09:50:24 pm
So you're one of those humans who refuse to believe AI will surpass us? To me that seems like human ego. Why do you call the obvious a religion? Take a look at the growth rate of science. Seems obvious to me.

https://www.youtube.com/watch?v=wvVPdyYeaQU
Title: Re: GPU paralleling question
Post by: unreality on November 18, 2017, 09:55:02 pm
So you're one of those humans who refuse to believe AI will surpass us? To me that seems like human ego. Why do you call the obvious a religion? Take a look at the growth rate of science. Seems obvious to me.

https://www.youtube.com/watch?v=wvVPdyYeaQU
smh
Title: Re: GPU paralleling question
Post by: ranch vermin on November 19, 2017, 06:20:41 am
Because GPUs go so fast, you can end up being lazy to the point where it's only 5 times faster. But the performance is actually there if you can code it half decently.
Title: Re: GPU paralleling question
Post by: ranch vermin on November 19, 2017, 02:31:05 pm
Your factors are wrong; are you sure he wasn't using an old one?

You can actually put a filter framework on the CPU, the way GPUs do it, and it will basically be NONCOMPUTING, taking minutes per frame, while the GPU finishes frame after frame in under a second. You can code raytracers to know that.
Title: Re: GPU paralleling question
Post by: unreality on November 19, 2017, 02:46:08 pm
No, I found out what's happening. GPUs are only good if your application doesn't require compute units to run in complete sync, or if you're dealing with KB of data, not MB, unless you have some astonishing unique GPU I'm unaware of. GPUs have register memory, also called private memory, which is fast like CPU memory, but it's tiny. GPUs also have local memory, which isn't as fast and can typically take 10 cycles of latency per read/write, but again this is usually on the order of KB. Global GPU memory is large, but has latency on the order of 400 to 800 clock cycles. Sure there are faster GPUs, but I haven't found any that are magnitudes faster.
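
To put those three tiers in one picture, here's a tiny kernel I sketched from what I've been reading (the latencies in the comments are just the ballpark figures above, not anything I've measured):

Code:
#include <cuda_runtime.h>

// The three memory tiers in one kernel. Launch with 256 threads per block.
__global__ void tiers(const float* gIn, float* gOut, int n)
{
    __shared__ float tile[256];           // local/shared memory: ~10 cycles
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    float r = (i < n) ? gIn[i] : 0.0f;    // global memory: ~400 to 800 cycles
    tile[threadIdx.x] = r;                // stage the value on-chip
    __syncthreads();

    float acc = 0.0f;                     // 'r' and 'acc' sit in registers
    for (int k = 0; k < 16; ++k)          // (private memory): nearly free to reuse
        acc += tile[(threadIdx.x + k) % 256];

    if (i < n)
        gOut[i] = acc + r;                // one more trip out to global memory
}

int main()
{
    const int n = 1 << 20;
    float *in, *out;
    cudaMalloc(&in, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));
    cudaMemset(in, 0, n * sizeof(float));
    tiers<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaDeviceSynchronize();
    cudaFree(in);
    cudaFree(out);
    return 0;
}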

The problem with that guy's GPU code was that he was setting 1 MB of memory, which goes way into global memory. It just depends on what kind of code you're using. If you know of a data mining benchmark that shows 500 times a typical CPU then awesome, but out of the thousands of benchmarks posted by people around the world, so far there aren't any.

That's great that some NN applications can take advantage of GPUs.

My AI needs GB, not KB, and it deals with 8- to 64-bit data types, not 2,048. I'm not sure a GPU will increase the performance of my future ASI by more than 10 times.
Title: Re: GPU paralleling question
Post by: unreality on November 19, 2017, 03:36:08 pm
Also there aren't that many CUs / SMs on a GPU. The GTX 980 is a good graphics card, but it has 16.
Title: Re: GPU paralleling question
Post by: keghn on November 20, 2017, 02:48:53 pm
Anybody know of or have a tutorial on learning to program Nvidia GPUs by example?
Title: Re: GPU paralleling question
Post by: keghn on November 20, 2017, 06:22:20 pm
There is a choice between GPU programming frameworks, like CUDA and OpenCL.

Data Movement in OpenCL (7): 

https://www.youtube.com/watch?v=1MvEGBKxv-Y
Title: Re: GPU paralleling question
Post by: unreality on November 21, 2017, 08:41:44 am
GPU FAN BOY -> you're reading some poor sources there; my GTX 980 (2048 cores) clears my quad core by 500 times.

MORE LENIENT TO CPUS -> there are ways to make a CPU go better. For example, a CPU can do a box blur as fast as a GPU by snaking a running box along the row, adding the incoming sample and subtracting the outgoing one, and building box keys can also be done on the CPU quite well if you read once and write many. *But* a GPU will just naively chew through it with a double for loop inside each thread, so they are both pretty cool.

My honest opinion is, if GPUs had more RAM I'd never use system code again, because it's too taxing on my mind, and you get your operation up and running quicker with fewer building hassles.

That's... er... after you've finished getting through all the horrid documentation and written a billion lines just to set the basic system up. Hmm, I'm contradicting myself.

That's great news. Maybe you're doing it the right way. Below is one source that gives tons of data mining GPU and CPU examples using well-known data mining benchmark apps. The fastest GPU score is 16032, while the fastest CPU score is 3500. The GPU isn't even 5 times faster.

The Surface Pro 3 tablet that I use here to surf the internet has an i5 that scores about 1/7th of its GPU.

There's a YouTube video (haven't found the link yet) where the guy shows GPU code along with how long the GPU takes to clear a buffer of 1<<20 (~ a million) floats. It doesn't get much simpler than that. When he used just one GPU core, it took a whopping 463 ms! When he used 256 cores it took 2.7 ms. That's great, but what's interesting is that a typical desktop PC should take about 5 to 10 ms to do that. Once again we're back at roughly that 5x figure! Why?? I understand GPUs are amazing at graphics, but I'm more interested in AI, pattern recognition, etc. BTW, that YouTube guy has a lot of GPU teaching videos, so one would expect him to know his stuff. Why would it take one GPU core so long to clear a million floats? Sure, when he used 256 cores, it was about 170 times faster, but that's only about 5 times faster than a typical CPU. What am I missing?

Zillions of gpu & cpu data mining benchmarks:
http://monerobenchmarks.info/

I did the same GPU test on my tablet, which has an i5 4300U CPU (2 cores, 4 threads). To make it simple I used one core. It wrote one MB of RAM in 0.4 ms. The 256 GPU cores took 2.7 ms. If I used all 4 threads it might have taken around 0.1 ms. That's 27 times faster on an old Windows tablet. Maybe someone can test their GPU to see how long it takes to write to 1<<20 bytes of RAM. I'm guessing that data mining uses more of the GPU's features, and NN training uses even more. It's also possible the YouTube guy uses a cheap GPU. That's difficult to believe, since his is a popular channel that specializes in GPUs.
Title: Re: GPU paralleling question
Post by: ranch vermin on November 21, 2017, 01:11:56 pm
No, I found out what's happening. GPUs are only good if your application doesn't require compute units to run in complete sync, or if you're dealing with KB of data, not MB, unless you have some astonishing unique GPU I'm unaware of. GPUs have register memory, also called private memory, which is fast like CPU memory, but it's tiny. GPUs also have local memory, which isn't as fast and can typically take 10 cycles of latency per read/write, but again this is usually on the order of KB. Global GPU memory is large, but has latency on the order of 400 to 800 clock cycles. Sure there are faster GPUs, but I haven't found any that are magnitudes faster.

The problem with that guy's GPU code was that he was setting 1 MB of memory, which goes way into global memory. It just depends on what kind of code you're using. If you know of a data mining benchmark that shows 500 times a typical CPU then awesome, but out of the thousands of benchmarks posted by people around the world, so far there aren't any.

That's great that some NN applications can take advantage of GPUs.

My AI needs GB, not KB, and it deals with 8- to 64-bit data types, not 2,048. I'm not sure a GPU will increase the performance of my future ASI by more than 10 times.

Make sure you check the date of the paper you're looking at. I just had a look myself and saw a convolution filter benchmark I disagree with, but it was from the DirectX 9 days. The GPU should totally whip a CPU at it. But I guess if you don't believe me, just wait till you do it for yourself for real.

But it doesn't matter anyway, I could still be wrong. I need to do more testing myself, and my GPU just died. :P So I can't even find out for sure.
Title: Re: GPU paralleling question
Post by: unreality on November 21, 2017, 07:38:35 pm
Let me know when you're able to get a good test on a GPU. CPUs have their advantages, and the same goes for GPUs. I thought GPUs were supposed to be faster with memory, and CPUs better at logic and if statements. Although if you get a good CPU like the i9, which has a lot of L1 cache, you can get high aggregate cache bandwidth, well over 2000 GB/s. That's blazing!

There are endless benchmarks, but the only ones I've found that compare CPUs with GPUs are for data mining. Like I said, thousands of tests from people around the world using known data mining benchmark programs show GPUs are roughly 5 times faster.

It's difficult to say until my AI code is done, many, many years from now, but so far it seems it will run best on CPUs. So far it's very logic-intensive, but it also does a lot of rapid lookup-table accesses all over the place.