How does the performance gain from the CPU cache work?

I recently learned that making good use of the CPU cache can yield a huge performance gain. One example I saw was a program whose runtime dropped from 10 seconds to 200 milliseconds just by applying this concept.

How does this performance gain work?

Author: Maniero, 2019-10-06

1 answer

For the programmer this matters little, even more so when the language does not give much control over memory. Even languages that do give that control cannot use the cache directly; managing it is the prerogative of the processor. What you can do is lay out and access objects in a way that makes them more likely to stay in the cache, which is hard to do and does not pay off in most applications.

For data to be used quickly it needs to be in a register, but not everything can be there, because there are only a few registers. So there is a memory nearby that is also fast to access, although fetching data from it has a cost. Not everything can live in one place, because distance affects how long the data takes to respond: the path to travel is longer. If everything were at that greater distance, everything would be slower, so the processor divides memory into areas according to physical distance.

Modern processors usually have a few such levels. Next to the nearest memory there is a slightly larger one, which is a little further away and therefore slower. Then there is another level, a little further still, with a higher capacity and a little slower again. There can be yet another level, but it does not usually pay off; manufacturers have tried it and given up.

Then there is the RAM, which is no longer inside the processor, and there can be other intermediate forms. The closer a memory is to where the processing happens, and the simpler its mechanism, the faster it is. RAM is itself a kind of cache, but the question focuses on the processor.
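
To make the cost of each level concrete, here is a rough sketch of the average cost of one memory access across the hierarchy. The level names (L1, L2, L3 are the usual labels), the latencies and the hit rates are all hypothetical round numbers chosen for illustration, not measurements of any real processor.

```python
# All figures below are assumptions for illustration; real values
# vary by processor and by workload.
# Each entry: (name, latency in cycles when the data is found here,
#              fraction of the accesses reaching this level that hit)
LEVELS = [
    ("L1", 4, 0.90),
    ("L2", 12, 0.80),
    ("L3", 40, 0.75),
]
RAM_LATENCY = 200  # cycles, also hypothetical


def average_access_cycles(levels, ram_latency):
    """Expected cycles per access under this simplified model."""
    remaining = 1.0  # fraction of accesses not yet satisfied
    total = 0.0
    for _name, latency, hit_rate in levels:
        total += remaining * hit_rate * latency
        remaining *= 1.0 - hit_rate
    # Whatever missed every cache level goes to RAM.
    total += remaining * ram_latency
    return total


if __name__ == "__main__":
    with_caches = average_access_cycles(LEVELS, RAM_LATENCY)
    without_caches = average_access_cycles([], RAM_LATENCY)
    print(f"with caches:    {with_caches:.2f} cycles per access")
    print(f"without caches: {without_caches:.2f} cycles per access")
```

Under these assumed numbers most accesses are served by the nearest level, so the average stays close to the latency of the fastest memory even though RAM is far slower.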

In the processor all of this is transparent: it keeps closer to the registers what is used most and what is most likely to be used at that moment. Instead of accessing a slower memory it can access the fastest one; that is the cache, and that is where the extra speed comes from.
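
The idea of keeping recently used data close can be illustrated with a toy model. The sketch below simulates a tiny cache with a least-recently-used (LRU) replacement policy; this is an assumption made for simplicity, since real processors implement their own replacement policies in hardware and work on cache lines rather than single addresses. The function name `hit_rate` and the access patterns are made up for the example.

```python
from collections import OrderedDict


def hit_rate(addresses, capacity):
    """Fraction of accesses found in a toy LRU cache of the given capacity."""
    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            cache[addr] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used
    return hits / len(addresses)


# A loop that keeps reusing 4 addresses fits in an 8-entry cache.
reused = [i % 4 for i in range(1000)]
# A loop cycling over 16 addresses evicts each entry before it is needed again.
cycling = [i % 16 for i in range(1000)]

if __name__ == "__main__":
    print("reused working set: ", hit_rate(reused, 8))
    print("cycling working set:", hit_rate(cycling, 8))
```

In this model the first pattern misses only on its first four accesses, while the second never hits at all, even though both perform the same number of accesses.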

There are several techniques that help with this, and they are not always intuitive, so there is no simple and reliable recipe. In general the ideal is to have better locality of reference (a topic no one has answered well yet; if that does not happen soon, I will post something).
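
As a small illustration of locality of reference, the sketch below walks the same contiguous block of numbers in two orders: one touches neighbouring elements in sequence, the other jumps a whole row ahead on every step. The matrix size and function names are hypothetical. In CPython the interpreter overhead hides much of the effect, so the measured difference may be small or absent; the same two loops written in a language such as C typically show a far larger gap.

```python
import time
from array import array


def sum_row_major(a, n):
    """Visit elements in the order they are stored (good locality)."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += a[i * n + j]
    return total


def sum_column_major(a, n):
    """Visit elements n positions apart on every step (poor locality)."""
    total = 0.0
    for j in range(n):
        for i in range(n):
            total += a[i * n + j]
    return total


def timed(func, a, n):
    """Return (result, elapsed seconds) for one call."""
    start = time.perf_counter()
    result = func(a, n)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 512  # hypothetical size; the data is one contiguous n * n block
    data = array("d", range(n * n))
    row_sum, row_time = timed(sum_row_major, data, n)
    col_sum, col_time = timed(sum_column_major, data, n)
    print(f"row-major:    {row_time:.3f} s")
    print(f"column-major: {col_time:.3f} s")
    print("same result:", row_sum == col_sum)
```

Both functions do exactly the same arithmetic and return the same sum; only the order in which memory is touched changes, which is the part the cache is sensitive to.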

The term CPU is not a good fit here, because the CPU itself has no cache; only the processor has one.

This may be useful: What makes cache invalidation a difficult solution?

Author: Maniero, 2019-10-07 11:52:20