v3 Performance?
I'm interested in the performance gains of v3. Did anyone benchmark it? I read the promo, but it doesn't make clear how much gain I should expect cleaning 1080p video (max temporal resolution, non adaptive).
I can benchmark the demo using a lower-res video, but does processing time scale linearly with resolution? So double the resolution = double the processing time?
Thanks in advance.
Definite Gains
I'm a long-time (about 1 year or so) NEAT video user and fan. I've been very impressed with the v2 quality and results. The big complaint was always the slow speed. v2 never took advantage of CUDA. We spoke up. The NEAT team answered.
I just now (I wish I had known about the release sooner) downloaded the v2 to v3 upgrade. After updating the settings I ran a quick test on my machine, equipped with PPro 5.0.3 and a GTX 470. The processor is a stock i7 930 with 12GB of RAM. I'm using 4-5 discs if anyone cares.
Where the GPU never kicked in on export with v2 I'm now seeing it shoot up to 50% usage with v3. This is great. As a result, my export times seem to have dropped by 50% or more. I will do more in-depth testing later, but the CUDA feature is a definite winner. Hands down, there is a huge improvement on export time with NEAT applied.
I would say if you like the export quality and hate the export times, then the upgrade is worth it.
I have not done side-by-side quality comparisons.
Good job guys,
Erik
Edit: my source footage is regular ol' HDV.
Here are some absolute figures from Neat Video's Optimize (frame: 1920x1080p, 8 bits per channel, radius: 1 frame) running on slightly faster cards:
- GPU only (GeForce GTX 470): 6.41 frames/sec
- GPU only (GeForce GTX 580): 10.1 frames/sec
- GPU only (GeForce GTX 590 #1, GeForce GTX 590 #2): 13.5 frames/sec
For comparison, running on i7 2600, 3.4GHz, fast RAM:
- CPU only (4 cores): 6.41 frames/sec
Hope this helps,
Vlad
Last edited by NVTeam on Tue Oct 11, 2011 5:31 pm, edited 1 time in total.
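To put the frames/sec figures above in terms of wall-clock time, here is a minimal Python sketch. The throughput numbers are the ones quoted in the post; the helper names and the 25 fps clip rate are assumptions made for illustration, and the estimate covers filtering only (no decode or encode overhead).

```python
# Throughput figures quoted above (1920x1080p, 8 bits per channel, radius 1).
FILTER_FPS = {
    "GTX 470": 6.41,
    "GTX 580": 10.1,
    "GTX 590 (both GPUs)": 13.5,
    "i7 2600 CPU only": 6.41,
}


def filter_seconds(clip_seconds, clip_fps, filter_fps):
    """Seconds needed to filter a clip, ignoring decode/encode overhead."""
    total_frames = clip_seconds * clip_fps
    return total_frames / filter_fps


def relative_speed(name, baseline="i7 2600 CPU only"):
    """Throughput of one configuration relative to a baseline configuration."""
    return FILTER_FPS[name] / FILTER_FPS[baseline]


# Assumption: a 30-second clip at 25 fps (hypothetical example values).
for name, fps in FILTER_FPS.items():
    seconds = filter_seconds(30, 25, fps)
    print(f"{name}: {seconds:.0f} s to filter, {relative_speed(name):.2f}x the CPU-only rate")
```

Actual export times will be longer, since the host application also has to decode and encode the footage.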
More Accurate Results
jpsdr: indeed that is a very good CPU. Do you have any timing numbers and the GPU model?
For all to review,
I ran some more detailed tests today. Well, as detailed as I could get. Here are some findings (same machine as mentioned before).
v2: I can't find any specifics but if I remember correctly the export times with NEAT applied all the way through would yield 12x the sequence length. So with that estimation, a :30 would take 6:00 to export. I should have run a test prior to the v3 upgrade to get more accurate figures. Maybe someone else can point out my old performance post.
v3 CPU/GPU: :30 clip with NEAT applied all the way through took 3:04.
v3 CPU only: :30 clip with NEAT applied all the way through took 4:52.
The way I read this is that v3 is a great improvement over v2 for my machine. Export times are reduced by 50% using improved CPU utilization plus new CUDA features. I may try some CPU overclocking and maybe even OC the GPU (maybe).
Your mileage may vary...
Best of luck to all
Erik
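The percentages in the post above can be checked with a few lines of Python. This is only a sketch of the arithmetic: the export times are the ones quoted (the v2 figure is the from-memory estimate of 12x the sequence length), and the function names are made up for illustration.

```python
def to_seconds(stamp):
    """Convert a 'm:ss' or ':ss' timestamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes or 0) * 60 + int(seconds)


CLIP = ":30"
EXPORTS = {
    "v2 (estimated from memory)": "6:00",
    "v3 CPU only": "4:52",
    "v3 CPU/GPU": "3:04",
}


def realtime_factor(stamp, clip=CLIP):
    """Export time expressed as a multiple of the clip length."""
    return to_seconds(stamp) / to_seconds(clip)


def reduction_vs(baseline, stamp):
    """Fractional reduction in export time relative to a baseline export."""
    return 1 - to_seconds(stamp) / to_seconds(baseline)


for name, stamp in EXPORTS.items():
    print(
        f"{name}: {realtime_factor(stamp):.1f}x clip length, "
        f"{reduction_vs(EXPORTS['v2 (estimated from memory)'], stamp):.0%} less than v2"
    )
```

On these numbers, v3 with CPU/GPU cuts the export time by about 49% against the v2 estimate, and v3 on CPU only by about 19%.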
I've built a PC dedicated to video processing, so I'm not at all interested in the 3D capabilities of the video card.
My standard video card is an ASUS ENGT520 SILENT/DI/1GD3(LP) 1 GB.
Its bandwidth is poor as far as I know, but I don't mind; what interests me is that this card has (and for now only the 520 has) the new VP5 engine, which gives excellent results when decoding H.264 video (I'm using neuron2's dgdecodenv to work with Blu-ray material). So, the result with this card is really poor.
I've tried adding a second card: an MSI N450GTS-MD2GD3.
My motherboard takes up to 3 cards; the first card is always x16, and the other two can be configured as x8/x8 or x16/x1. Of course, I chose x16/x1, which lets me run both cards at x16.
The best result was again with CPU only.
All configurations were tested (card 1 only, card 2 only, card 1 & card 2).
If I remember correctly, under Windows XP, on my i7 980, with the VD plugin:
- Best results were with around 7 or 8 CPU cores, not 12.
- 576p, radius 2, around 25-30fps (don't remember exactly).
- 1080p, radius 2, around 4-5fps (don't remember exactly).
Vlad, can you provide, for reference, a benchmark with a radius of 2 for the VD plugin under Windows XP?
OK, I've redone the tests:
Windows XP64 SP2, with VDub64 and the 64-bit plugin, using the benchmark tool provided in NV. Core i7 980, no overclocking, P6T Deluxe/OC Palm motherboard, DDR3 Kingston XMP FSB1800 memory.
radius 2 : 720x576p : best result 8 cores (of 12) with 22.2fps
radius 1 : 720x576p : best result 8 cores (of 12) with 25fps
radius 2 : 1920x1080p : best result 8 cores (of 12) with 4.48fps
radius 1 : 1920x1080p : best result 8 cores (of 12) with 5.1fps
... What surprises me is that with a 4-core i7 2600, Vlad gets better results with CPU only...
Vlad, could you run the same tests with exactly the same configuration as mine (except for the CPU)? => XP64 SP2 + VDub64 + 64-bit plugin?
Unless it's a known fact that an i7 2600 at 3.4GHz is really better than an i7 980 at 3.33GHz, there is really something odd.
Could the problem be:
- Caused by the OS?
- Caused by the plugin version (VDub)?
- ... what else ... "bad" handling of hyperthreading?
As far as I know, from the benchmarks I see on Doom9, I have excellent x264 speed, so for now I would rule out the hardware.
Sorry, cannot check in XP-64. In Win7-64, on i7 2600 3.4GHz using the current build of the v3 plug-in from the website:
1920x1080p:
radius 1: 7 cores, 5.85 frames/sec
radius 2: 7 cores, 5.26 frames/sec
720x576p:
radius 1: 5 cores, 28.6 frames/sec
radius 2: 6 cores, 25.6 frames/sec
I double-checked the Max Turbo Frequency setup; it turns out that earlier, the i7 2600 automatically switched to 3.8GHz under load, which explains the somewhat higher figure earlier (6.41 fps earlier vs 5.85 fps now). I have now fixed its speed at 3.4GHz to do these measurements for you.
Hope this helps,
Vlad
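These two frame sizes also give a rough answer to the opening question about scaling with resolution. The sketch below compares the pixel-count ratio with the measured frames/sec ratio; the figures are the ones in the post above, the function names are illustrative, and two data points per radius (measured with different core counts) can only suggest a trend, not prove one.

```python
# Measurements quoted above: (width, height, radius) -> frames/sec on the i7 2600.
RESULTS = {
    (1920, 1080, 1): 5.85,
    (1920, 1080, 2): 5.26,
    (720, 576, 1): 28.6,
    (720, 576, 2): 25.6,
}


def megapixels_per_second(width, height, fps):
    """Pixel throughput in millions of pixels per second."""
    return width * height * fps / 1e6


def scaling_check(radius):
    """Return (pixel-count ratio, fps ratio) between 1080p and 576p."""
    pixel_ratio = (1920 * 1080) / (720 * 576)
    fps_ratio = RESULTS[(720, 576, radius)] / RESULTS[(1920, 1080, radius)]
    return pixel_ratio, fps_ratio


for radius in (1, 2):
    pixel_ratio, fps_ratio = scaling_check(radius)
    print(f"radius {radius}: {pixel_ratio:.2f}x the pixels, {fps_ratio:.2f}x slower")

for (width, height, radius), fps in RESULTS.items():
    rate = megapixels_per_second(width, height, fps)
    print(f"{width}x{height} radius {radius}: {rate:.1f} MP/s")
```

With 5 times as many pixels, the 1080p frame runs a little under 5 times slower, so on these numbers processing time scales roughly linearly with the number of pixels per frame. Note that doubling both width and height quadruples the pixel count, so it would be expected to take about four times as long, not twice.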
Re: More Accurate Results
Pyramid Pyro wrote:
with that estimation, a :30 would take 6:00 to export.
v3 CPU only: :30 clip with NEAT applied all the way through took 4:52.
So there's a 19% improvement for ATI users. Thanks for benchmarking.
Really hoping for OpenCL support. I have an i7 920 (12GB) with an HD6950 (2GB). I'm sure GPU acceleration would help quite a bit.
Speed depends on frame size (and filter settings) and for certain sizes realtime may be achievable with multiple cards only.
Anyway, I recommend taking a look at this topic for additional information about the performance of different cards.
Vlad