FCPX GTX 980 slower than GTX 660 TI

tommy_99
Posts: 5
Joined: Tue Apr 05, 2016 1:48 pm

Post by tommy_99 » Fri May 27, 2016 5:16 am

Hi,

I'm running an eGPU in a ViDock enclosure over a single PCIe lane (1x) on my MacBook Pro.

Since Neat Video does not support my internal Intel Iris Pro 5200, I got this enclosure together with a GTX 660 Ti to speed up rendering in FCPX.
It ran stably for about a week.

I hoped to get better results with a GTX 980 card, so I upgraded, but now I get much worse results.

Log:

Frame Size: 1280x720 progressive
Bitdepth: 32 bits per channel
Quality Mode: Normal
Mix with Original: Disabled
Temporal Filter: Enabled
Radius: 5 frames
Dust and Scratches: Aggressive
Slow Shutter: Disabled
Spatial Filter: Enabled
Frequencies: High, Mid, Low
Artifact Removal: Enabled
Edge Smoothing: Disabled
Sharpening: Disabled
Neat Video 4.1.4 Pro plug-in for Final Cut

Detecting the best combination of performance settings:
running the test data set on up to 8 CPU cores and on up to 1 GPU
GeForce GTX 980: 3629 MB currently available (4095 MB total), using up to 100%

CPU only (1 core): 1.45 frames/sec
CPU only (2 cores): 3.34 frames/sec
CPU only (3 cores): 4.67 frames/sec
CPU only (4 cores): 5.75 frames/sec
CPU only (5 cores): 5.81 frames/sec
CPU only (6 cores): 5.99 frames/sec
CPU only (7 cores): 6.02 frames/sec
CPU only (8 cores): 6.06 frames/sec
GPU only (GeForce GTX 980): 1.46 frames/sec
CPU (1 core) and GPU (GeForce GTX 980): 1.35 frames/sec
CPU (2 cores) and GPU (GeForce GTX 980): 2.82 frames/sec
CPU (3 cores) and GPU (GeForce GTX 980): 3.85 frames/sec
CPU (4 cores) and GPU (GeForce GTX 980): 4.08 frames/sec
CPU (5 cores) and GPU (GeForce GTX 980): 4.12 frames/sec
CPU (6 cores) and GPU (GeForce GTX 980): 4.1 frames/sec
CPU (7 cores) and GPU (GeForce GTX 980): 4.12 frames/sec
CPU (8 cores) and GPU (GeForce GTX 980): 4.13 frames/sec

Best combination: CPU only (8 cores)
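
For anyone who wants to compare runs like this one (for example 660 Ti vs. 980), here is a minimal Python sketch that parses the benchmark lines above and picks the fastest combination. The names `parse_results` and `best_combination` are made up for illustration; this is not part of Neat Video, it only reads the text of the log.

```python
import re

# Benchmark lines copied verbatim from the log above.
LOG = """\
CPU only (1 core): 1.45 frames/sec
CPU only (2 cores): 3.34 frames/sec
CPU only (3 cores): 4.67 frames/sec
CPU only (4 cores): 5.75 frames/sec
CPU only (5 cores): 5.81 frames/sec
CPU only (6 cores): 5.99 frames/sec
CPU only (7 cores): 6.02 frames/sec
CPU only (8 cores): 6.06 frames/sec
GPU only (GeForce GTX 980): 1.46 frames/sec
CPU (1 core) and GPU (GeForce GTX 980): 1.35 frames/sec
CPU (2 cores) and GPU (GeForce GTX 980): 2.82 frames/sec
CPU (3 cores) and GPU (GeForce GTX 980): 3.85 frames/sec
CPU (4 cores) and GPU (GeForce GTX 980): 4.08 frames/sec
CPU (5 cores) and GPU (GeForce GTX 980): 4.12 frames/sec
CPU (6 cores) and GPU (GeForce GTX 980): 4.1 frames/sec
CPU (7 cores) and GPU (GeForce GTX 980): 4.12 frames/sec
CPU (8 cores) and GPU (GeForce GTX 980): 4.13 frames/sec
"""

# "<configuration>: <number> frames/sec"
LINE_RE = re.compile(r"^(?P<config>.+): (?P<fps>\d+(?:\.\d+)?) frames/sec$")


def parse_results(log):
    """Return {configuration: frames per second} for every benchmark line."""
    results = {}
    for line in log.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results[match.group("config")] = float(match.group("fps"))
    return results


def best_combination(results):
    """Return the (configuration, fps) pair with the highest frame rate."""
    return max(results.items(), key=lambda item: item[1])


results = parse_results(LOG)
best_config, best_fps = best_combination(results)
print(f"Best: {best_config} at {best_fps} frames/sec")

# How much adding the GPU costs at 8 cores, as a fraction of CPU-only speed.
cpu8 = results["CPU only (8 cores)"]
mixed8 = results["CPU (8 cores) and GPU (GeForce GTX 980)"]
print(f"CPU+GPU runs at {mixed8 / cpu8:.0%} of CPU-only speed")
```

The picked configuration agrees with the "Best combination" line in the log.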

With the GTX 660 Ti, the same Nvidia Web Drivers and the same CUDA drivers, I got about 6.5 frames/sec with 5 cores + eGPU.

Web Driver version: 10.11.10 (346.03.10f02)
CUDA version: 7.5.29

Did I forget to set something up, or are the 6xx cards faster in Neat Video than the 9xx?

Best regards
Tom

NVTeam
Posts: 2261
Joined: Thu Sep 01, 2005 4:12 pm

Post by NVTeam » Fri May 27, 2016 10:34 am

Perhaps the new GPU works at reduced clock rate to save energy?

Please use diagnostic tools (perhaps something like GPU-Z, CUDA-Z) to verify the actual clock rates it runs at.

Vlad
Neat Video team
noise reduction for video and photos

tommy_99
Posts: 5
Joined: Tue Apr 05, 2016 1:48 pm

Post by tommy_99 » Fri May 27, 2016 1:25 pm

Looks like it works as it should:

CUDA-Z Report
=============
Version: 0.10.251 64 bit http://cuda-z.sf.net/
OS Version: Mac OS X 10.11.5 15F34
Driver Version: 10.10.10 310.42.25f01
Driver Dll Version: 7.50
Runtime Dll Version: 6.50

Core Information
----------------
Name: GeForce GTX 980
Compute Capability: 5.2
Clock Rate: 1215.5 MHz
PCI Location: 0:10:0
Multiprocessors: 16 (2048 Cores)
Threads Per Multiproc.: 2048
Warp Size: 32
Regs Per Block: 65536
Threads Per Block: 1024
Threads Dimensions: 1024 x 1024 x 64
Grid Dimensions: 2147483647 x 65535 x 65535
Watchdog Enabled: Yes
Integrated GPU: No
Concurrent Kernels: Yes
Compute Mode: Default
Stream Priorities: Yes

Memory Information
------------------
Total Global: 4095.81 MiB
Bus Width: 256 bits
Clock Rate: 3505 MHz
Error Correction: No
L2 Cache Size: 48 KiB
Shared Per Block: 48 KiB
Pitch: 2048 MiB
Total Constant: 64 KiB
Texture Alignment: 512 B
Texture 1D Size: 65536
Texture 2D Size: 65536 x 65536
Texture 3D Size: 4096 x 4096 x 4096
GPU Overlap: Yes
Map Host Memory: Yes
Unified Addressing: Yes
Async Engine: Yes, Bidirectional

Performance Information
-----------------------
Memory Copy
Host Pinned to Device: 357.941 MiB/s
Host Pageable to Device: 356.242 MiB/s
Device to Host Pinned: 399.623 MiB/s
Device to Host Pageable: 397.286 MiB/s
Device to Device: 72.8112 GiB/s
GPU Core Performance
Single-precision Float: 4792.72 Gflop/s
Double-precision Float: 155.622 Gflop/s
64-bit Integer: 326.568 Giop/s
32-bit Integer: 1424.41 Giop/s
24-bit Integer: 1059.84 Giop/s

Generated: Fri May 27 15:19:46 2016

To me, everything looks normal except the Memory Copy figures (1x PCIe).
I got the same Memory Copy results with the 660 Ti too, so there is no difference here.
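
The Memory Copy figures in the CUDA-Z report above can be turned into a rough per-frame transfer estimate. This is only a back-of-the-envelope sketch in Python: it assumes 4 channels (RGBA) per pixel, and the number of frames copied per output frame is a guess, since how Neat Video actually moves data between host and GPU is not stated anywhere in this thread. The function names are hypothetical.

```python
MIB = 1024 * 1024

# Measured by CUDA-Z in the report above (pinned memory figures).
HOST_TO_DEVICE_MIB_S = 357.941
DEVICE_TO_HOST_MIB_S = 399.623


def frame_size_mib(width, height, channels, bytes_per_channel):
    """Size of one uncompressed frame in MiB."""
    return width * height * channels * bytes_per_channel / MIB


def transfer_bound_fps(frame_mib, frames_up=1, frames_down=1):
    """Upper bound on frames/sec if copying were the only cost.

    frames_up / frames_down: frames copied to / from the GPU per output
    frame. These are assumptions, not documented plugin behaviour.
    """
    seconds = (frames_up * frame_mib / HOST_TO_DEVICE_MIB_S
               + frames_down * frame_mib / DEVICE_TO_HOST_MIB_S)
    return 1.0 / seconds


# 1280x720 at 32 bits (4 bytes) per channel, as in the log.
# ASSUMPTION: 4 channels (RGBA); the log does not state the channel count.
frame = frame_size_mib(1280, 720, channels=4, bytes_per_channel=4)
print(f"Frame size: {frame:.4f} MiB")

# Case 1: each frame is uploaded once and downloaded once.
once = transfer_bound_fps(frame)
print(f"One copy each way: at most {once:.1f} frames/sec")

# Case 2 (hypothetical worst case): the whole temporal window of
# radius 5 (2 * 5 + 1 = 11 frames) is re-uploaded for every output frame.
window = transfer_bound_fps(frame, frames_up=2 * 5 + 1)
print(f"Full window re-uploaded: at most {window:.1f} frames/sec")
```

The point of the two cases is that the ceiling imposed by a 1x link depends heavily on how many frames cross the bus per output frame, so the same link can be harmless or dominant depending on the transfer pattern.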

NVTeam
Posts: 2261
Joined: Thu Sep 01, 2005 4:12 pm

Post by NVTeam » Fri May 27, 2016 4:45 pm

These seem to be the causes of the reduced speed:
Host Pinned to Device: 357.941 MiB/s
Host Pageable to Device: 356.242 MiB/s
Device to Host Pinned: 399.623 MiB/s
ViDock enclosure over 1x PCIE lane

Vlad
Neat Video team
noise reduction for video and photos

tommy_99
Posts: 5
Joined: Tue Apr 05, 2016 1:48 pm

Post by tommy_99 » Sat May 28, 2016 5:26 am

Too bad that, with the same components, the Maxwell GPU is slower than the Kepler GPU, even though both cards have the same eGPU limitation in Memory Copy / bandwidth (1x PCIe lane).

It seems that either the Nvidia driver is better optimized for Kepler and uses less bandwidth (although officially no compression is used in OS X), or Neat Video's CUDA code for Kepler is better in terms of bandwidth consumption.
