I got new computer with 8 cores

general questions about Neat Video
Lugarimo
Posts: 114
Joined: Mon Feb 09, 2009 2:51 pm


Post by Lugarimo »

I ran the performance optimization thing and it's faster when it uses 6 cores than with 8. Why?
NVTeam
Posts: 2745
Joined: Thu Sep 01, 2005 4:12 pm

Post by NVTeam »

Having more cars on the road does not always lead to higher throughput. Instead, it often leads to traffic jams.

The "road" of your computer is good for six "cars". Eight will do too, but they will not have enough space to go without waiting.

Vlad
Lugarimo
Posts: 114
Joined: Mon Feb 09, 2009 2:51 pm

Post by Lugarimo »

But why jams? Maybe you cache 80 frames and have each core denoise 10 of them (13 with 6 cores), so where do the jams come from?
NVTeam
Posts: 2745
Joined: Thu Sep 01, 2005 4:12 pm

Post by NVTeam »

Getting many frames is exactly what causes jams, because memory bandwidth is a bottleneck. With faster memory, 8 cores could be faster than 6.

Vlad
Lugarimo
Posts: 114
Joined: Mon Feb 09, 2009 2:51 pm

Post by Lugarimo »

But this happens with very small 160p videos too. Are you sure it's a memory problem?
NVTeam
Posts: 2745
Joined: Thu Sep 01, 2005 4:12 pm

Post by NVTeam »

I think so. If you have any alternative ideas please let me know.

Vlad
jpsdr
Posts: 221
Joined: Mon Aug 11, 2008 7:33 am

Post by jpsdr »

I have a better way to explain how and why parallel/multi-core processing can increase speed at first but, in some cases, stops having any effect after a certain point.

You want to remove the tires of a car.
Having 2 people instead of one will increase speed. Even better if each has his own set of tools, instead of only one set they pass to each other (this can represent a bandwidth bottleneck). You can increase speed further by having 4 people with their own sets of tools, one on each wheel. But... adding a 5th or more will not help at all, and can even end in a mess (too many people in the kitchen when cooking does more harm than good).
So, when dealing with cars, the best is 4. The day you deal with 8-wheel trucks, you'll be able to use up to 8 people.

I think algorithms/programs somehow follow the same kind of rules.
It's linked to several things: how the algorithm works, how multi-threading is implemented, etc.
NVTeam
Posts: 2745
Joined: Thu Sep 01, 2005 4:12 pm

Post by NVTeam »

Multi-threading is fine, there is no problem there. The problem is that the CPU can process data faster than the memory can serve it. The CPU doesn't receive the data fast enough because of the limited bandwidth of the memory system. If you install faster memory modules, you will see the optimum number of cores increase.
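
The bandwidth argument can be sketched with a toy model in Python. This is only an illustration: `PER_CORE_FPS`, `MEM_BANDWIDTH` and `TRAFFIC_PER_FRAME` are hypothetical numbers picked to make the ceiling land at 6 cores, not measurements of Neat Video or of any particular machine.

```python
# Toy bandwidth-ceiling model. All numbers are hypothetical, chosen only
# to illustrate the argument; they are not measurements of Neat Video.
PER_CORE_FPS = 10                  # frames/sec one core could compute if never starved
MEM_BANDWIDTH = 6_000_000_000      # bytes/sec the memory system can deliver
TRAFFIC_PER_FRAME = 100_000_000    # bytes read and written per output frame


def estimated_fps(cores: int) -> float:
    """Frame rate is capped by whichever is slower: compute or memory."""
    compute_limit = cores * PER_CORE_FPS
    memory_limit = MEM_BANDWIDTH / TRAFFIC_PER_FRAME
    return min(compute_limit, memory_limit)


for cores in range(1, 9):
    print(f"{cores} cores: {estimated_fps(cores):.0f} fps")
```

In this model the rate stops growing once the memory limit is reached, and raising `MEM_BANDWIDTH` moves that ceiling to a higher core count. The model only plateaus; an actual slowdown with more cores would need extra costs (for example contention for shared cache) that it leaves out.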

Vlad
Lugarimo
Posts: 114
Joined: Mon Feb 09, 2009 2:51 pm

Post by Lugarimo »

jpsdr, that doesn't explain it to me, because a car has only 4 tires but a video has 100,000 frames or more, so there's lots of opportunity for more cores.

NVTeam, I have 4GB of memory, but since I have XP it's only 3. With 240x160 video it would be 115KB per frame and 2.7MB per second. With 6 cores it processes this at 55 fps, and at 52 with 8. So you're saying my memory can only handle 6 MB per sec? That's pretty low. Benchmarks said it does 6 GB per sec.
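
The arithmetic in this post can be reproduced with a few lines of Python. Two inputs are assumptions, since the post does not state them: 3 bytes per pixel (24-bit RGB) and a 24 fps source. They are used here because they roughly reproduce the 115KB and 2.7MB figures.

```python
# Assumptions (not stated in the thread): 24-bit RGB and a 24 fps source.
WIDTH, HEIGHT = 240, 160
BYTES_PER_PIXEL = 3
SOURCE_FPS = 24

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL    # size of one uncompressed frame
playback_rate = frame_bytes * SOURCE_FPS          # bytes/sec at the source frame rate
rate_at_55fps = frame_bytes * 55                  # bytes/sec at the reported 6-core speed

BENCHMARK_BANDWIDTH = 6e9                         # the "6 GB per sec" quoted in the post
fraction_used = rate_at_55fps / BENCHMARK_BANDWIDTH

print(f"frame size:       {frame_bytes / 1000:.1f} KB")
print(f"at source fps:    {playback_rate / 1e6:.2f} MB/s")
print(f"at 55 fps:        {rate_at_55fps / 1e6:.2f} MB/s")
print(f"share of 6 GB/s:  {fraction_used:.4%}")
```

This estimate counts a single read of each input frame and nothing else, so it is a lower bound on memory traffic rather than a measure of what the filter actually moves.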
NVTeam
Posts: 2745
Joined: Thu Sep 01, 2005 4:12 pm

Post by NVTeam »

Your calculations take into account only the size of one input frame, read once from memory, per output frame produced (which, by the way, needs to be saved too). Much more data is read and written to process that one input frame, and all that data requires space and bandwidth.

Vlad
Lugarimo
Posts: 114
Joined: Mon Feb 09, 2009 2:51 pm

Post by Lugarimo »

OK, but it's true that the smaller the frame size, the less memory needs to be used, right? If so, why did I get this when testing a 64x64 frame:

CPU only (1 core): 500 frames/sec
CPU only (2 cores): 76.9 frames/sec
CPU only (3 cores): 62.5 frames/sec
CPU only (4 cores): 62.5 frames/sec
CPU only (5 cores): 66.7 frames/sec
CPU only (6 cores): 62.5 frames/sec
CPU only (7 cores): 62.5 frames/sec
CPU only (8 cores): 58.8 frames/sec
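
One way to look at the figures above is to convert them to time per frame. The Python sketch below uses only the numbers reported in this post; the interpretation afterwards is a guess, not something stated in the thread.

```python
# Frame rates as reported in the post above (64x64 test frame).
reported_fps = {1: 500.0, 2: 76.9, 3: 62.5, 4: 62.5,
                5: 66.7, 6: 62.5, 7: 62.5, 8: 58.8}

# Time per frame in milliseconds, and rate relative to the 1-core run.
ms_per_frame = {n: 1000.0 / fps for n, fps in reported_fps.items()}
relative = {n: fps / reported_fps[1] for n, fps in reported_fps.items()}

for n in reported_fps:
    print(f"{n} cores: {ms_per_frame[n]:6.2f} ms/frame, "
          f"{relative[n]:.2f}x the 1-core rate")
```

Every value comes out within a hundredth of a whole number of milliseconds (2, 13, 16, 16, 15, 16, 16, 17). That would be consistent with timings taken at roughly 1 ms granularity, in which case the 2 ms reading behind the 500 frames/sec figure would carry a large relative error.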
NVTeam
Posts: 2745
Joined: Thu Sep 01, 2005 4:12 pm

Post by NVTeam »

With smaller frames the measurements become less accurate, organizational overhead takes up a larger share of the time, etc.

Do you have a better theory than the bandwidth limitation one?

Vlad
jpsdr
Posts: 221
Joined: Mon Aug 11, 2008 7:33 am

Post by jpsdr »

You clearly don't see my point. First, it was a generic example to explain why more cores may not always mean more speed.
And why do you think NV processes one frame per core?
I don't know how NV is programmed, but most often, when you multi-thread, it's the algorithm you split (several cores working on small parts of a picture), instead of giving each core its own frame. At least, that's what is often done in image processing algorithms. One of the benefits of doing this is to reduce cache misses.
Compared to my example: 1 frame = 1 car.
Most often, you have several people working on the same car. They may need "fewer" resources (one big set of tools will be enough for everyone), and depending on what they are doing, there is a limit to the efficiency of parallelism.
Now, you can have only one person working on each car, with several cars being worked on at the same time. You can see that you'll need more resources: more space to store the cars (more memory), and more toolboxes.
This is why I think (but it's only my personal point of view) that trying to do 1 core = 1 frame is the worst thing to do.

But the very small frame size shows an interesting thing: the difference between 1 and 2+ cores, from my point of view, clearly shows a difference in cache use. Maybe with 1 core everything stays in the level 2 cache (maybe even level 1), and with 2+ cores it drops down one cache level. It's the only explanation I can think of (but that doesn't mean it's the accurate one).

All of this to say: more cores will not always mean more speed, even in a multi-threaded program. There are a lot of reasons that can explain that, which is what I've tried to show with my examples.
NVTeam
Posts: 2745
Joined: Thu Sep 01, 2005 4:12 pm

Post by NVTeam »

> And why do you think NV processes one frame per core?
I don't, it doesn't.

> very small frame size show an interesting thing
That assumes the speed measurements are accurate, and they are not. For very small frames the measurements become much less accurate, so we cannot draw any conclusions from those figures.
To explain why 6 cores may be faster than 8 you really need to measure with normal frames, at least SD, better HD. The larger the frame, the more accurate are those speed figures.

Taking your analogy: "one big set of tools will be enough for everyone", but it may not be enough for all of them at once, should they need to do the same operation on different wheels simultaneously. Another aspect is that those people need some room to work too. If there are too many of them around one car, they will hinder each other. And each of them may not even have a free wheel to work on.

> More cores will not always mean more speed, even in a multi-threaded program.
Agreed. There are many possible reasons for that. One of them, significant for large frames, is the limited memory bandwidth. Modern multi-core CPUs can process data faster than the memory system can deliver that data to the CPU.
Cache size is important too. If fewer cores do the work, each can use a larger portion of the shared cache memory, which speeds up memory operations for those active cores. But fewer cores also means lower overall computing speed. These two tendencies compete with each other, and as a result the highest speed is achieved somewhere in the middle. Generally, faster memory can change the optimum. A larger CPU cache can change it too. Which is why it is important to use fast RAM and a high-end CPU with a large cache.
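
The competition between the two tendencies can be sketched with a toy model in Python. Every constant and the cost formula itself are made up for illustration; this is not how Neat Video or any real CPU is known to behave, only one simple way to get an optimum "somewhere in the middle".

```python
# Toy model of the trade-off described above. Every constant is made up
# for illustration; none is a measurement of Neat Video or any real CPU.
SHARED_CACHE_MB = 12.0     # cache shared by all active cores
WORKING_SET_MB = 3.0       # data each core would like to keep in cache
COMPUTE_TIME = 1.0         # time units of pure computation per work item
MISS_PENALTY = 0.5         # extra time per item at a 100% miss rate, per competing core


def throughput(cores: int) -> float:
    """Work items per time unit for a given number of active cores."""
    cache_share = SHARED_CACHE_MB / cores
    # Fraction of a core's working set that does not fit in its cache share.
    miss_fraction = max(0.0, 1.0 - cache_share / WORKING_SET_MB)
    # Misses go to RAM, whose bandwidth is split among all active cores.
    time_per_item = COMPUTE_TIME + miss_fraction * MISS_PENALTY * cores
    return cores / time_per_item


best = max(range(1, 9), key=throughput)
for cores in range(1, 9):
    print(f"{cores} cores: {throughput(cores):.2f}")
print("best core count in this toy model:", best)
```

With these made-up constants the peak falls at 4 cores. Raising `SHARED_CACHE_MB` or lowering `MISS_PENALTY` (a stand-in for faster memory) moves the peak toward more cores.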

Vlad
jpsdr
Posts: 221
Joined: Mon Aug 11, 2008 7:33 am

Post by jpsdr »

NVTeam wrote:
> And why do you think NV processes one frame per core?
> I don't, it doesn't.

I was responding to Lugarimo on this specific point...
I have no doubt you know how NV works... :wink:

I must say, for now, the best multi-threaded program I've ever seen is x264. When I upgraded from an i7@960 to an i7@980, my speed increased by even a little more than the theoretical ratio of (6 x 3.33)/(4 x 3.2).
The only thing I can think of is the increase in level 3 cache (8M -> 12M).
I was really impressed by this one! :shock:
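
For reference, the theoretical ratio mentioned above works out as follows in Python. It uses only the core counts and clock speeds from the post and assumes, as the post does, that throughput scales with cores times clock speed.

```python
# Core counts and clock speeds as given in the post (i7 960 vs i7 980).
old_cores, old_ghz = 4, 3.2
new_cores, new_ghz = 6, 3.33

# Naive scaling estimate: throughput proportional to cores x clock.
theoretical_ratio = (new_cores * new_ghz) / (old_cores * old_ghz)
print(f"theoretical speed-up: {theoretical_ratio:.2f}x")
```

That is a ratio of about 1.56, so a measured gain above it would point to something beyond cores and clock speed, such as the larger cache suggested in the post.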