How to choose a card for GPU acceleration?
560ti vs 570 and 2500K vs 2600K
I took my time and compared the GTX 560 Ti and GTX 570 on the same machine, on the same clip, with the same Neat Video settings, to see whether the much more expensive graphics card pays off with faster rendering times.
It does not (in most real-life situations).
Here are links to a number of test results (from performance optimizing in Neat Video):
http://dl.dropbox.com/u/9205816/NeatVid ... _560ti.txt
http://dl.dropbox.com/u/9205816/NeatVid ... eo_570.txt
http://dl.dropbox.com/u/9205816/NeatVid ... _570OC.txt
In short:
- The smaller the frame size, the smaller the difference between GPUs; obviously the latency of RAM and other components plays a big role there. The bigger the frame size, the bigger the difference between GPUs.
- Overclocking the CPU and GPU likewise has an obvious effect only at big frame sizes like 4K.
- Differences in performance are visible when you hit "Optimize" in the Neat Video filter, but in real life, when you render a clip, they are more or less gone most of the time. Rendering 200 frames, the 570 finishes 2 seconds before the 560 Ti, or 4 seconds before if the 570 is overclocked to the maximum. That means 8 minutes less rendering time for 30 minutes of video (133 minutes instead of 141). Whether that is worth buying the much more expensive 570, everyone should decide for themselves.
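The savings arithmetic above can be sketched in a few lines (a minimal calculation using the post's own figures; the 25 fps source rate is my assumption):

```python
# All figures come from the post; the 25 fps source frame rate is an assumption.
FPS_SOURCE = 25
CLIP_MINUTES = 30
SAVED_PER_200_FRAMES = 2.0  # seconds the GTX 570 finishes ahead per 200 frames

frames = CLIP_MINUTES * 60 * FPS_SOURCE             # 45,000 frames in the clip
saved_seconds = frames / 200 * SAVED_PER_200_FRAMES
print(f"Time saved: {saved_seconds / 60:.1f} minutes")  # ~7.5 min, close to the ~8 quoted
```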
Comparison of the tests I did, taken from the "Optimize" option in the Neat Video filter.
HD= 1920x1080 [1080p]
SD= 720x576 [576p]
4K= "4K" option
8b = 8bit
32b = 32bit
Radius = always 1
570= [ENGTX570 DCII/2DIS/1280MD5]
570OC = overclocked to 900/4000 MHz, same as the 560 Ti from [GV-N560OC-1GI]
[x] = best fps option calculated by "Optimize"
SD8b = 560ti[50], 570[52.6], 570OC[52.6]
SD32b = 560ti[43.5], 570[45.5], 570OC[47.6]
HD8b = 560ti[11.5], 570[13.2], 570OC[14.5]
HD32b = 560ti[10.6], 570[11.2], 570OC[12.2]
4K8b = 560ti[2.98], 570[3.38], 570OC[3.51]
4K32b = 560ti[2.41], 570[2.70], 570OC[2.75]
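For reference, the relative speedups implied by the figures above can be computed directly (values copied from the list; this is just arithmetic, not a new measurement):

```python
# fps figures copied from the list above; this is arithmetic, not a new test
results = {
    "SD8b":  {"560ti": 50.0, "570": 52.6, "570OC": 52.6},
    "SD32b": {"560ti": 43.5, "570": 45.5, "570OC": 47.6},
    "HD8b":  {"560ti": 11.5, "570": 13.2, "570OC": 14.5},
    "HD32b": {"560ti": 10.6, "570": 11.2, "570OC": 12.2},
    "4K8b":  {"560ti": 2.98, "570": 3.38, "570OC": 3.51},
    "4K32b": {"560ti": 2.41, "570": 2.70, "570OC": 2.75},
}
for mode, fps in results.items():
    gain = fps["570OC"] / fps["560ti"] - 1
    print(f"{mode:6s}: overclocked 570 is {gain:+.0%} vs 560 Ti")
# The SD gap is only ~5-9%, versus ~14-26% at the HD and 4K sizes.
```

This makes the first bullet point visible at a glance: at SD sizes the cards are nearly interchangeable, while the gap opens up at larger frames.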
An improvement over my previous posts can be seen here. I moved from Neat Video 3.1.0 to 3.2.0 and overclocked the RAM from 9-9-9-27 @ 1600 to 9-10-9-24 @ 1866. Everything else is the same, and I had the latest nVidia driver.
My real-life tests on a 1920x1080 clip: rendering with the 560 Ti takes 40-42 s; rendering with the overclocked 570 (which doubles the power consumption compared to the 560 Ti) takes 37-39 s.
In my various projects I need to clean roughly 50% 1440x1080 50i HDV, 30% 1280x720 50p HDV, 19% various SD footage, and 1% full HD. No 4K yet... So in most cases I operate at frame sizes where speed is held back by frame transport to the processor and storage rather than by filter speed. The 570, it seems, was a waste of my money.
I can now either start playing games in my free time (GPUs make a huge difference in games, though!), sell the 570 and buy a 560 Ti again, invest in super-fast overclocked RAM (my Asus P8Z68-V PRO can take up to 2200 MHz RAM), or sell everything but the GPU and go to Sandy Bridge-E with quad-channel RAM.
Has anyone tried a really tuned-up X79 platform yet, to see if there is any improvement in speed? Looking at benchmarks of quad-channel RAM, the bandwidth is much, much bigger than with dual-channel.
**********
Regarding the 2500K vs 2600K:
The 2600K wins every time at the same clock, even when Neat Video chooses only 4 cores of the i7 as the best option. The 2500K must be overclocked much higher to keep up with the 2600K, so the savings are a false economy (always buy an i7 for video!). I don't have exact figures for the 2500K, but what renders in 37-42 seconds on the 2600K (@4500 MHz) takes 120+ seconds on the 2500K @4300 MHz. That is the only figure I remember, as the 2500K was so slow I didn't want to waste my time with that CPU. The 2500K was used with the 560 Ti on the same MoBo, RAM, and SSD.
**********
REGARDS,
MIHAEL
Thank you very much for the extensive testing and for posting your results. It looks like they did a very good job with the GTX 560 Ti. It was developed somewhat later than the GTX 570, which perhaps allowed the developers to better optimize its parameters and offer performance so close to the more expensive GTX 570.
Regarding the four-channel memory and new generation of CPUs, we have a sample measurement:
The fast four-channel memory does indeed help.
System: P9X79 PRO, 3930K @ 4000 MHz, DDR3-2133
Frame: 1920x1080 progressive, 8 bits per channel, Radius: 1 frame
Running the test data set on up to 12 CPU cores
CPU only (1 core): 1.8 frames/sec
CPU only (2 cores): 3.65 frames/sec
CPU only (3 cores): 5.32 frames/sec
CPU only (4 cores): 6.94 frames/sec
CPU only (5 cores): 8.06 frames/sec
CPU only (6 cores): 9.17 frames/sec
CPU only (7 cores): 9.62 frames/sec
CPU only (8 cores): 10 frames/sec
CPU only (9 cores): 10.1 frames/sec
CPU only (10 cores): 10.2 frames/sec
CPU only (11 cores): 10.3 frames/sec
CPU only (12 cores): 10.2 frames/sec
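A quick parallel-efficiency check on the figures above shows how hard the scaling flattens (a sketch using a subset of the quoted numbers; the bottleneck interpretation matches the memory-bandwidth discussion elsewhere in this thread):

```python
# Frames/sec at selected core counts, copied from the listing above
fps = {1: 1.8, 2: 3.65, 4: 6.94, 6: 9.17, 8: 10.0, 12: 10.2}

for n, f in fps.items():
    efficiency = f / (n * fps[1])  # actual fps vs. perfect linear scaling
    print(f"{n:2d} cores: {f:5.2f} fps, parallel efficiency {efficiency:.0%}")

# Efficiency falls toward ~47% at 12 cores: the extra cores are starved,
# consistent with a memory-bandwidth bottleneck rather than a CPU limit.
```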
Vlad
Hi, it's me again :)
Last week I put a Gigabyte 660 Ti GPU in. It renders at exactly the same speed as my current GTX 570. But that was expected.
My new update regards RAM. I previously suspected that RAM speed is VERY important, and today I can confirm it again. I replaced my Kingston HyperX Blu 1600 (OC'd to 1866) with the 2133 Genesis version. The upgrade cost 23 EUR (sold the old sticks, bought new ones). It brought a 9.9% (let's round it to 10%) speed improvement! And that's without overclocking the new RAM :)
Optimizing the best NV settings previously always settled on 4 cores + GPU; now it's 7 cores + GPU. Just changing the RAM sticks gained this much:
CPU: from 8.55 -> 9.26 fps
GPU: from 9 -> 12.7 fps (!)*
CPU+GPU: from 14.1 -> 16.9 fps
* The GTX 570 is now paying me back for my patience in not tossing it out. :)
10% exceeded all my expectations.
Bye, MIHAEL
Well, here are my results... I was expecting better, to be honest. I can't believe a single GTX 570 is faster than my 2 x 6950s... something is wrong here...
Using :
Win 7 64bit
AMD FX-8150 Stock
2 x HD6950 GPU's
16Gig 1866Mhz RAM
Frame: 1920x1080 progressive, 8 bits per channel, Radius: 1 frame
Running the test data set on up to 8 CPU cores and on up to 2 GPUs
CPU only (1 core): 0.978 frames/sec
CPU only (2 cores): 1.96 frames/sec
CPU only (3 cores): 2.86 frames/sec
CPU only (4 cores): 3.64 frames/sec
CPU only (5 cores): 3.98 frames/sec
CPU only (6 cores): 4.22 frames/sec
CPU only (7 cores): 4.5 frames/sec
CPU only (8 cores): 4.74 frames/sec
GPU only (AMD Radeon HD 6900 Series #1): 5.41 frames/sec
GPU only (AMD Radeon HD 6900 Series #2): 5.41 frames/sec
GPU only (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 9.01 frames/sec
CPU (1 core) and GPU (AMD Radeon HD 6900 Series #1): 4.63 frames/sec
CPU (2 cores) and GPU (AMD Radeon HD 6900 Series #1): 5.46 frames/sec
CPU (3 cores) and GPU (AMD Radeon HD 6900 Series #1): 6.21 frames/sec
CPU (4 cores) and GPU (AMD Radeon HD 6900 Series #1): 6.62 frames/sec
CPU (5 cores) and GPU (AMD Radeon HD 6900 Series #1): 7.35 frames/sec
CPU (6 cores) and GPU (AMD Radeon HD 6900 Series #1): 7.41 frames/sec
CPU (7 cores) and GPU (AMD Radeon HD 6900 Series #1): 7.63 frames/sec
CPU (8 cores) and GPU (AMD Radeon HD 6900 Series #1): 7.81 frames/sec
CPU (1 core) and GPU (AMD Radeon HD 6900 Series #2): 4.44 frames/sec
CPU (2 cores) and GPU (AMD Radeon HD 6900 Series #2): 5.43 frames/sec
CPU (3 cores) and GPU (AMD Radeon HD 6900 Series #2): 6.25 frames/sec
CPU (4 cores) and GPU (AMD Radeon HD 6900 Series #2): 6.85 frames/sec
CPU (5 cores) and GPU (AMD Radeon HD 6900 Series #2): 7.3 frames/sec
CPU (6 cores) and GPU (AMD Radeon HD 6900 Series #2): 7.35 frames/sec
CPU (7 cores) and GPU (AMD Radeon HD 6900 Series #2): 7.63 frames/sec
CPU (8 cores) and GPU (AMD Radeon HD 6900 Series #2): 7.87 frames/sec
CPU (2 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 9.01 frames/sec
CPU (3 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 8.06 frames/sec
CPU (4 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 9.17 frames/sec
CPU (5 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 9.09 frames/sec
CPU (6 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 9.09 frames/sec
CPU (7 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 9.01 frames/sec
CPU (8 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 8.93 frames/sec
Best combination: CPU (4 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2)
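The "Best combination" line is simply the maximum over all tested configurations; a minimal parser sketch for this output format (the exact line layout is an assumption based on the listing above):

```python
import re

# Sample lines in the format printed by the Optimize tool (copied from above)
benchmark = """\
CPU only (4 cores): 3.64 frames/sec
GPU only (AMD Radeon HD 6900 Series #1): 5.41 frames/sec
CPU (4 cores) and GPU (AMD Radeon HD 6900 Series #1, AMD Radeon HD 6900 Series #2): 9.17 frames/sec
"""

def best_combination(text):
    """Return (configuration, fps string) with the highest frames/sec."""
    matches = re.findall(r"^(.+): ([\d.]+) frames/sec$", text, re.MULTILINE)
    return max(matches, key=lambda m: float(m[1]))

combo, fps = best_combination(benchmark)
print(f"Best combination: {combo} ({fps} frames/sec)")
```

Handy for comparing several saved Optimize runs side by side instead of scanning the listings by eye.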
Usually the CPU adds some speed indeed. In this case, I guess, either the CPU itself or the system RAM is so busy providing data to the GPUs that CPU-based processing (which also requires memory bandwidth) is slowed down to the point where the CPU no longer contributes much to the overall processing speed. The CPU does contribute when only one GPU is working, which seems to be in line with my guess.
Vlad
i7-3930K + GTX570 - not worth the money (right now)
Hi,
NVTeam wrote: Usually CPU adds some speed indeed. In this case, I guess, either CPU itself or the system RAM is so busy with providing data to the GPUs that CPU-based processing (which also requires memory bandwidth) is slowed down to the point when CPU no longer contributes much to the overall processing speed. CPU does contribute when only one GPU is working, which seems to be in line with my guess.
Vlad
I again forked out some big money expecting to gain some performance, but I lost big this time.
I have an Intel DX79SR board + i7-3930K + 16 GB HyperX 2133 (the same sticks from my previous Z68 system) + a bunch of Intel and Crucial SSDs.
RAM SPEED is everything with Neat Video, I can confirm.
On Z68 I got up to 26 GB/s RAM write speed, versus 22 GB/s at most on this "server" platform (measured with the same AIDA64, which is only a dual-threaded RAM benchmark).
So I actually LOST performance. The CPU does render a bit faster (10.5 fps, as stated above), but my GTX 570 now does ONLY 7.5 fps, which is much less than on Z68. On this platform CPU+GPU gains no benefit unless the RAM is overclocked. Performance in Neat Video is the same whether I give this board 4 sticks of RAM or just 2.
CONCLUSION: Neat Video is optimized for DUAL channel ONLY.
While SiSoftware Sandra 2012 shows over 50 GB/s on this system with 4 sticks of RAM, Neat Video does not benefit from it.
So right now it is best to have Ivy Bridge (or even Sandy Bridge with Z68), really fast RAM, and an OC'd CPU, even considering the money: it's really cheap now to buy a used Z68 platform with a 2600K. :)
But I hope Neat Video v4 will be 4-channel optimized too.
best regards,
MIHAEL
reply to vvulture -
Hi vvulture.
1) Do you have an HDD or SSD?
I have a 2x SSD system, where the OS and temp files are on the first 6 Gbps SSD and the video files are on the second 6 Gbps SSD.
2) Did you confirm the RAM speed with some proper software? Does it really work at that frequency? When an overclock doesn't work (settings too high), the system boots from the "old" (stock) settings, but the BIOS sometimes still shows it as overclocked. If you experienced NO difference, something could be wrong. Overclocking the CPU in my system ALWAYS shows better performance, because the whole system works faster. Overclocking the GPU showed an improvement only when RAM was not the limiting factor: with 1600 RAM no difference, with 1866 some, with 2133 more.
3) Overclocking the RAM on the GPU boosts fps more than overclocking the GPU engine itself. At least with my nVidia.
4) I don't know how the modern AMD architecture works, because I left that camp when I got into video. I still have AMD office computers, where they shine. But for video, AMD is not the right choice, as your 8-core results show. Your score is only a fraction better than my old Q9550 managed (OC'd a little).
5) On my new machine (6-core Intel X79) my GPU now makes only 7.6 fps, which is half of what it did on my previous Z68 platform. Now Neat Video doesn't use the GPU at all if I overclock the CPU; only at stock frequencies does it use the GPU. But overall I'm stuck at 10-11 fps either way. I'm crying too :(
Please run the AIDA64 RAM benchmark tests and post the results. Then run the SiSoftware Sandra 2012 benchmark and post the results (GB/s). In AIDA64, the RAM frequency and timings are shown next to the results, so you can check your current settings. My RAM is CL11, BTW.
6) Does your mobo even support 2133 RAM?
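Since the advice above leans on AIDA64/Sandra numbers, here is a crude stand-in for a single-threaded memory-copy bandwidth check using only the Python standard library (a rough sketch; like AIDA64's dual-threaded test, it will not show the full multi-channel bandwidth, and Python overhead makes it a lower bound):

```python
import time

def copy_bandwidth_gb_s(size_mb=256, repeats=5):
    """Rough single-threaded RAM copy bandwidth (GB/s), stdlib only."""
    src = bytearray(size_mb * 1024 * 1024)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        dst = bytes(src)  # one full read + one full write pass over the buffer
        best = min(best, time.perf_counter() - t0)
        del dst
    return (2 * size_mb / 1024) / best  # GB moved (read + write) / seconds

print(f"~{copy_bandwidth_gb_s():.1f} GB/s single-threaded copy")
```

Dedicated tools remain the right way to verify overclocked timings; this only gives a quick sanity check that a RAM change had any effect at all.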
a way to predict
I did some math calculations to try to predict GPU performance in this topic:
http://www.neatvideo.com/nvforum/viewtopic.php?t=839
I was trying to predict the benefit of the new GTX 650 Ti and found some interesting results.
3570k + GTX650Ti benchmark
Ivy Bridge 3570K OC'd to 4.2 GHz (Turbo off) + Gigabyte GTX 650 Ti 2 GB @ 1033 MHz
Works great, no issues.
Real World test rendering footage (Radius 2, 1920x1080p):
Cpu only 4 cores = 3.51 fps
Cpu 4 cores + GTX650Ti 2GB = 4.54 fps
1440 frames of footage (1 minute of 24p footage), Cineform 422 I-frames, 1920x1080p file
CPU only, 4 cores, render time = 6 min 50 sec
CPU 4 cores + GTX650Ti 2GB 1033MHz render time = 5 min 17 sec
(1.03 fps increase) ~1.27x speed increase compared to CPU only
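The real-world figures above can be cross-checked from the frame count and render times (pure arithmetic on the quoted numbers; it reproduces the 1.03 fps gain):

```python
# Figures copied from the post above
frames = 1440
cpu_only_s = 6 * 60 + 50   # 6 min 50 sec = 410 s
cpu_gpu_s = 5 * 60 + 17    # 5 min 17 sec = 317 s

fps_cpu = frames / cpu_only_s
fps_gpu = frames / cpu_gpu_s
print(f"{fps_cpu:.2f} fps -> {fps_gpu:.2f} fps "
      f"({fps_gpu - fps_cpu:.2f} fps gain, {fps_gpu / fps_cpu:.2f}x speed-up)")
```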
The real-world performance is slower than the Neat Video benchmark suggests, but it is pretty good.
I chose Radius 2 for the tests because it gives the best results when denoising my GH2 footage.
Benchmarks from the Neat Video Optimize tool (it shows more fps than a real render):
One curious observation: the benchmark figures for CPU only (2 cores) and CPU (2 cores) + GPU are very close to the real render speeds with CPU only (4 cores) and CPU (4 cores) + GPU.
Frame: 1920x1080 progressive, 8 bits per channel, Radius: 2 frames
Running the test data set on up to 4 CPU cores and on up to 1 GPU
CPU only (1 core): 1.75 frames/sec
CPU only (2 cores): 3.55 frames/sec
CPU only (3 cores): 4.98 frames/sec
CPU only (4 cores): 6.02 frames/sec
GPU only (GeForce GTX 650 Ti): 3.86 frames/sec
CPU (1 core) and GPU (GeForce GTX 650 Ti): 3.66 frames/sec
CPU (2 cores) and GPU (GeForce GTX 650 Ti): 4.52 frames/sec
CPU (3 cores) and GPU (GeForce GTX 650 Ti): 6.06 frames/sec
CPU (4 cores) and GPU (GeForce GTX 650 Ti): 7.25 frames/sec
Best combination: CPU (4 cores) and GPU (GeForce GTX 650 Ti)
Frame: 1920x1080 progressive, 16 bits per channel, Radius: 2 frames
Running the test data set on up to 4 CPU cores and on up to 1 GPU
CPU only (1 core): 1.43 frames/sec
CPU only (2 cores): 3.03 frames/sec
CPU only (3 cores): 4.22 frames/sec
CPU only (4 cores): 4.93 frames/sec
GPU only (GeForce GTX 650 Ti): 3.73 frames/sec
CPU (1 core) and GPU (GeForce GTX 650 Ti): 2.71 frames/sec
CPU (2 cores) and GPU (GeForce GTX 650 Ti): 4.37 frames/sec
CPU (3 cores) and GPU (GeForce GTX 650 Ti): 5.59 frames/sec
CPU (4 cores) and GPU (GeForce GTX 650 Ti): 6.33 frames/sec
Best combination: CPU (4 cores) and GPU (GeForce GTX 650 Ti)
Frame: 1920x1080 progressive, 32 bits per channel, Radius: 2 frames
Running the test data set on up to 4 CPU cores and on up to 1 GPU
CPU only (1 core): 1.73 frames/sec
CPU only (2 cores): 3.45 frames/sec
CPU only (3 cores): 4.81 frames/sec
CPU only (4 cores): 5.59 frames/sec
GPU only (GeForce GTX 650 Ti): 3.76 frames/sec
CPU (1 core) and GPU (GeForce GTX 650 Ti): 3.28 frames/sec
CPU (2 cores) and GPU (GeForce GTX 650 Ti): 4.65 frames/sec
CPU (3 cores) and GPU (GeForce GTX 650 Ti): 6.29 frames/sec
CPU (4 cores) and GPU (GeForce GTX 650 Ti): 6.9 frames/sec
Best combination: CPU (4 cores) and GPU (GeForce GTX 650 Ti)
FCPX on a 2011 MacBook Pro 17"
On my late-2011 MacBook Pro 17" running Neat Video 3.3 in FCPX 10.8 and OS 10.8.3, the Radeon HD 6770M takes so long to spin up that it actually looks slower to use than CPU only. The progress pie takes a while to show its first wedges and then speeds up, so maybe it would work faster on a stream rather than in the benchmark? I had hoped GPU acceleration would be a bigger payoff than it appears to be, but here it is actually a cost. I found 50% GPU memory (half of the 1 GB available) to be the fastest setting. Changing the radius and bit depth did not alter this equation.
Please let me know if there's anything I'm doing wrong or if that's just the way it is.
Frame: 1920x1080 progressive, 32 bits per channel, Radius: 1 frame
Running the test data set on up to 8 CPU cores and on up to 1 GPU
CPU only (1 core): 1.59 frames/sec
CPU only (2 cores): 3.11 frames/sec
CPU only (3 cores): 4.17 frames/sec
CPU only (4 cores): 4.61 frames/sec
CPU only (5 cores): 4.61 frames/sec
CPU only (6 cores): 4.5 frames/sec
CPU only (7 cores): 4.31 frames/sec
CPU only (8 cores): 4.15 frames/sec
GPU only (ATI Radeon HD 6770M): 1.73 frames/sec
CPU (1 core) and GPU (ATI Radeon HD 6770M): 1.52 frames/sec
CPU (2 cores) and GPU (ATI Radeon HD 6770M): 2.79 frames/sec
CPU (3 cores) and GPU (ATI Radeon HD 6770M): 3.12 frames/sec
CPU (4 cores) and GPU (ATI Radeon HD 6770M): 3.03 frames/sec
CPU (5 cores) and GPU (ATI Radeon HD 6770M): 3 frames/sec
CPU (6 cores) and GPU (ATI Radeon HD 6770M): 2.99 frames/sec
CPU (7 cores) and GPU (ATI Radeon HD 6770M): 2.9 frames/sec
CPU (8 cores) and GPU (ATI Radeon HD 6770M): 2.82 frames/sec
Best combination: CPU only (4 cores)
With most mobile GPUs, it is currently quite typical that the CPU alone is faster. Setting the GPU memory to 50-70% may help, but not much, simply because the GPU itself is not fast enough to beat the CPU.
We continue to work on further optimizations, so the balance of power, CPU vs GPU, may change in the future.
Vlad