A C++ equivalent of Dictionary in C#

There is a surprisingly fast dictionary in C#; I would like to know whether there is an equally performant one in C++. I tried unordered_map, hash_map, and map, but their performance is several times lower than that of C#'s Dictionary... P.S.: the pair has the form <unsigned int, unsigned int>.

Author: Denis, 2013-02-04

4 answers

In fact, comparing languages is a thankless task: there will always be tests in which one language beats another, and there will always be people who consider that test irrelevant and such code something that would never occur in reality.

However, I would not say that the OP's results are very unexpected: in .NET, memory allocation is indeed usually faster than in native languages without a custom allocator, and small tests like this load the allocator much more heavily than, say, the function-call machinery.

The reason for this difference in allocator performance is that C++ objects cannot be moved in memory, which means the usual memory allocation algorithm (which, as you know, maintains a list of free blocks and searches it for a suitable one on each allocation) is quite slow and, worse, requires a global memory lock (which makes things even worse in multithreaded scenarios). In addition, C++ code tends to free objects as soon as possible, which adds load on deallocation, and that also requires the global lock.

In .NET everything is different. Objects are always allocated at the top of the heap, so an allocation is no slower than an InterlockedIncrement. .NET does not need to maintain a list of free blocks because the garbage collector compacts the heap: objects are moved, filling up the "holes".
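
To illustrate the point, here is a minimal bump-allocator sketch in C++ (my own illustration, not the actual .NET implementation; the class and its names are invented for the example). Allocation degenerates into a single atomic increment; there is no per-object free, because reclaiming memory would require moving live objects, which C++ cannot do:

#include <atomic>
#include <cstddef>
#include <cstdlib>

// Minimal bump-allocator sketch (alignment handling omitted for brevity).
// Allocation is one atomic add: essentially what a compacting GC's
// nursery does on every "new".
class BumpArena
{
    char* base_;
    std::atomic<std::size_t> offset_;
    std::size_t capacity_;
public:
    explicit BumpArena(std::size_t capacity)
        : base_(static_cast<char*>(std::malloc(capacity))),
          offset_(0),
          capacity_(capacity) {}
    ~BumpArena() { std::free(base_); }

    void* allocate(std::size_t size)
    {
        std::size_t pos = offset_.fetch_add(size, std::memory_order_relaxed);
        return (pos + size <= capacity_) ? base_ + pos : nullptr;
    }
};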


Besides, the fact that C++ code may well be slower than equivalent C# code has long ceased to be news. Here, for example, is a remarkable series of articles about a simple application by masters of native programming, and a summary by Jeff Atwood:

To beat the performance of the C# version, Raymond had to write his own I/O routines, rewrite the string class, use a custom allocator, and write a custom code-page display procedure.

This is confirmed by the benchmark given below: native containers "out of the box" lose significantly to the .NET one, while (some) hand-written containers win.


Now for the fun part: measurements.

C#:

using System;
using System.Collections.Generic;
using System.Diagnostics;

namespace Sharp
{
    class Program
    {
        static void Main(string[] args)
        {
            var dict = new Dictionary<int, int>();
            int seed = 1;

            var timer = new Stopwatch();
            timer.Start();

            for (int i = 0; i < 10000000; i++)
            {
                seed = 1664525 * seed + 1013904223; // linear congruential generator: cheap pseudo-random keys
                dict.Add(seed, i);
            }

            timer.Stop();
            Console.WriteLine(
                "elapsed time = {0} ms, dictionary entries count = {1}",
                timer.ElapsedMilliseconds,
                dict.Count);
        }
    }
}

C++:

#include "stdafx.h"
#include <ctime>
#include <map>
#include <iostream>

using namespace std;

int main(int argc, char* argv[])
{
    map<int, int> dict;
    int seed = 1;

    auto begin = clock();

    for (int i = 0; i < 10000000; i++)
    {
        seed = 1664525 * seed + 1013904223; // same LCG as in the C# version
        dict.insert(make_pair(seed, i));
    }

    auto end = clock();

    double elapsedMs = double(end - begin) * 1000.0 / CLOCKS_PER_SEC;
    cout << "elapsed time = " << elapsedMs
         << " ms, dictionary entries count = " << dict.size()
         << endl;

    return 0;
}

Measurement results (release mode, 5 consecutive runs without a debugger):

C#

elapsed time = 1138 ms, dictionary entries count = 10000000
elapsed time = 1127 ms, dictionary entries count = 10000000
elapsed time = 1133 ms, dictionary entries count = 10000000
elapsed time = 1134 ms, dictionary entries count = 10000000
elapsed time = 1129 ms, dictionary entries count = 10000000

C++

elapsed time = 8377 ms, dictionary entries count = 10000000
elapsed time = 8408 ms, dictionary entries count = 10000000
elapsed time = 8377 ms, dictionary entries count = 10000000
elapsed time = 8377 ms, dictionary entries count = 10000000
elapsed time = 8361 ms, dictionary entries count = 10000000

Average time: C# = 1132 ms, C++ = 8379 ms.

I'm not claiming my tests are perfect, and they are only relevant for my machine. If someone suggests a better measurement methodology, I'll gladly apply it. Still, under my conditions the performance of System.Collections.Generic.Dictionary significantly exceeds that of std::map when adding elements.


Note that Dictionary uses a hash table, while std::map in my implementation uses a red-black tree as the underlying data structure. Hash tables are usually faster by themselves, so allocation speed is not the only reason Dictionary does better.


Replacing make_pair(seed, i) with pair<int, int>(seed, i) in the C++ code, on the advice of @igumnov, did not make a noticeable difference: 8361/8392/8361/8408/8361/8345.
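
(In C++11 one can also construct the pair in place with emplace; I did not measure this variant, but it avoids the temporary pair object altogether:)

dict.emplace(seed, i); // constructs the pair directly inside the tree node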


Replacing std::map with std::unordered_map in the C++ code, on the advice of @Kotik, resulted in a significant speedup: 2230/2230/2230/2230/2246 (average 2233). However, C++ is still almost twice as slow.


Replaced std::unordered_map with uthash, on the advice of @igumnov. The result is slightly worse than with std::unordered_map: 2963/2932/2948/2948/2932. Code:

void testUhash()
{
    struct myint
    {
        int key;
        int value;
        UT_hash_handle hh;
    };

    struct myint* dict = NULL;
    int seed = 1;

    auto begin = clock();

    for (int i = 0; i < 10000000; i++)
    {
        seed = 1664525 * seed + 1013904223;
        struct myint* ps = (struct myint*)malloc(sizeof(struct myint));
        ps->key = seed;
        ps->value = i;
        HASH_ADD_INT(dict, key, ps);
    }

    auto end = clock();

    double elapsedMs = double(end - begin) * 1000.0 / CLOCKS_PER_SEC;
    cout << "elapsed time = " << elapsedMs
         << " ms, dictionary entries count = " << HASH_COUNT(dict)
         << endl;
}
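
Note that this loop performs one malloc per insert, which, by the logic above, is exactly where native code loses time. A variant worth trying (a sketch, not measured here) is to preallocate all the nodes in one contiguous block; uthash only needs node addresses to stay stable, which a single fixed array guarantees:

struct myint* pool = (struct myint*)malloc(10000000 * sizeof(struct myint));

for (int i = 0; i < 10000000; i++)
{
    seed = 1664525 * seed + 1013904223;
    struct myint* ps = &pool[i]; // no per-insert allocation
    ps->key = seed;
    ps->value = i;
    HASH_ADD_INT(dict, key, ps);
}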

Added capacity = 10000000 in C++ and, for a fair comparison, in C# as well. The changes:

C++:

unordered_map<int, int> dict(10000000);

C#:

var dict = new Dictionary<int, int>(10000000);

Indeed, it has become faster:

  • C++: 1826/1856/1857/1841/1825, average 1841
  • C#: 790/786/801/790/791, average 792

Still, C# is more than twice as fast.


On the advice of @KoVadim, I removed the seed calculation (keeping the preallocated capacity); the working loop is now:

C++:

for (int i = 0; i < 10000000; i++)
{
    //seed = 1664525 * seed + 1013904223;
    dict.insert(pair<int, int>(i, i));
}

C#:

for (int i = 0; i < 10000000; i++)
{
    //seed = 1664525 * seed + 1013904223;
    dict.Add(i, i);
}

Results:

  • C++: 1498/1514/1498/1498/1498, average 1501
  • C#: 129/129/135/133/132, average 132

On the advice of @igumnov added khash to the benchmark. Code:

KHASH_MAP_INIT_INT(32, int)
void testKhash()
{
    int seed = 1;

    khiter_t iter;
    khash_t(32)* dict = kh_init(32);
    int dummy;

    auto begin = clock();

    for (int i = 0; i < 10000000; i++)
    {
        seed = 1664525 * seed + 1013904223;
        iter = kh_put(32, dict, seed, &dummy);
        kh_value(dict, iter) = i;
    }

    auto end = clock();

    double elapsedMs = double(end - begin) * 1000.0 / CLOCKS_PER_SEC;
    cout << "elapsed time = " << elapsedMs
         << " ms, dictionary entries count = " << kh_size(dict)
         << endl;
}

Result: 577/577/608/577/577, average 583: a massive win for native code. (As far as I can tell, khash keeps its keys and values in flat arrays with open addressing, so there is no per-node allocation at all.) Let me remind you that the best result for the standard .NET container was 792 ms.

Who will offer a custom container for .NET?


I tried a custom implementation for .NET: FastDictionary (from the MapReduce.NET project). It turned out slightly slower than the built-in Dictionary: 853/865/842/841/842, average 849.


I tested pure allocation speed to check @Dith's hypothesis: an empty class constructor is run 10 million times. Code:

C#:

static class Allocation
{
    class Foo
    {
    }

    static public void Test()
    {
        const int size = 10000000;
        var timer = new Stopwatch();

        timer.Start();
        for (int i = 0; i < size; i++)
        {
            new Foo();
        }
        timer.Stop();

        Console.WriteLine("elapsed time = {0} ms", timer.ElapsedMilliseconds);
    }
}

C++:

class Foo {}; // empty class, mirroring the C# test

void testAlloc()
{
    const int size = 10000000;

    LARGE_INTEGER li;
    if (!QueryPerformanceFrequency(&li))
        exit(1);

    double freq = double(li.QuadPart)/1000.0;

    QueryPerformanceCounter(&li);
    auto begin = li.QuadPart;

    for (int i = 0; i < size; i++)
        new Foo(); // note: these objects are never freed (see the discussion in the last answer)

    QueryPerformanceCounter(&li);
    auto end = li.QuadPart;

    double elapsedMs = double(end - begin) / freq;
    cout << "elapsed time = " << elapsedMs
         << " ms" << endl;
}

Results:

  • C#: 58/54/28/55/55 (average 50)
  • C++: 407.719/400.693/401.674/401.926/399.976 (average 402.4)
 55
Author: VladD, 2013-02-06 13:39:04

Continuation of a very interesting discussion:

As @VladD said, the .NET Dictionary is based on a hash map, so:

The most unsuccessful implementation of GetHashCode in the .NET Framework is the default implementation used for structures. The point is that for a structure this function does the following: it uses reflection to iterate over the fields and tries to obtain a hash code from them. Having found the first field from which a hash code can be obtained, the function finishes its work and returns that value. In most cases it returns the hash code of the first field it encounters. However, if the structure consists of reference types and all of them are null, the function returns the ordinal number of the object, by analogy with classes.

....

This approach has two drawbacks. First, such a hash code can, and often does, turn out to be incorrect, since it is difficult to identify an entire structure by a single field. Second, because of the use of reflection, this method is far from ideal in performance. Therefore, if you need hash values for structures, it is better to implement the corresponding functions yourself.

Source

@igumnov: regarding memory measurements, there is a good article, "Processing large amounts of data in memory in C#".

 5
Author: Merlin, 2013-02-05 14:53:29

First, remember that in map the first parameter is the key type and the second is the value type. On every insert you add the key seed = 1664525 * seed + 1013904223, where seed = 1 is a constant. Accordingly, the same key value is passed each time, which is the worst case for insertion.

Secondly, and most importantly: map is efficient at finding a value by key and very inefficient at adding and removing elements, and your test does nothing but add. The main cost in such a test is not allocation but insertion into the tree (a red-black tree, as you correctly noted). If you want to compare Dictionary in .NET with map from the STL, you need to write a test that first fills in the values and then measures the access time to random values in a loop (see the sketch after the vector example below). If the application mostly needs to look up values, use map, but then write a different test. If you need to add values often and in bulk, use vector; if you need to add and remove, use list. Then the test (for additions) looks different:

vector<int> vec;
for (int i = 0; i < 1000000; i++)
{
    vec.push_back(i);
}

Your code with 1000000 elements runs in 8386 ms; the vector example takes 251 ms, 33 times faster.
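
Returning to the lookup test suggested above, a rough sketch (mine; I have not measured these numbers) that fills the container first and then times only the lookups could look like this:

#include <cstdlib>
#include <ctime>
#include <iostream>
#include <map>

using namespace std;

int main()
{
    map<int, int> dict;
    for (int i = 0; i < 1000000; i++)
        dict.insert(make_pair(i, i)); // fill outside the timed section

    long long sum = 0;
    auto begin = clock();
    for (int i = 0; i < 1000000; i++)
    {
        int key = rand() % 1000000;    // random existing key
        sum += dict.find(key)->second; // only lookups are timed
    }
    auto end = clock();

    cout << "lookup time = " << double(end - begin) * 1000.0 / CLOCKS_PER_SEC
         << " ms (checksum " << sum << ")" << endl;
    return 0;
}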

 2
Author: Денис Пантюшев, 2016-05-17 10:53:15

I will write here, since I've run out of comment space.

I tested pure allocation speed to check @Dith's hypothesis: an empty class constructor is run 10 million times.

This test shows a lack of understanding of the compiler and the runtime environment. In C++ the code leaks; in C# it does not, because the GC will clean everything up. This is the first sign that the tests are not equivalent. At the very least, add a destructor call (a delete). (Though tests show this won't speed things up much.)

BUT! In the C# code the amount of allocated memory will not actually grow: the object is destroyed and the new one is placed in the same memory. Since the object is empty, initialization is fast, and the memory is already allocated. Formally it amounts to assigning a couple of pointers plus a memset().

Moreover, the JIT optimizer can see that the object is created but never used, and that the constructor is empty, and throw the allocation away entirely, effectively running an empty loop. Ideally it could then throw the loop away as well. But that needs to be examined separately.

I'm not that strong in C#, but I think that if you print the address of each object being created, it will always be the same (or there will be a dozen different addresses cycling in a loop).

That is, in effect, the C# code behaves like the following C++ code (a very simplified, back-of-the-envelope sketch):

Foo * f; // a global variable

for (int i = 0; i < size; i++) {
    // this plays the role of the constructor
    Foo * t;
    if (f == NULL) { // no cached object yet, create one
        t = alloc_memory();
        init(t);
    } else {
        t = f;
        init(t); // the rest of the constructor: memory is already allocated, only the fields need fixing up
        f = NULL; // we have taken the object
    }
    // and this plays the role of the destructor
    f = t; // put the object back
    t = NULL;
}

The result is, in effect, an object pool of size one.

But we can do better, and that will make the test more faithful:

char * t = new char[1000]; // grab a small buffer up front (placement new requires <new>)
for (int i = 0; i < 10000000; i++) {
    new (t) Foo(); // see the odd-looking new operator? :)
}
delete[] t;

(This special form of the new operator is called placement new.) The most interesting thing is that now there is no leak, and the code runs 10 times faster than the original. I consider this code the most plausible analog of the C# code. On the other hand, we can write something even cleaner:

for (int i = 0; i < 10000000; i++)
    Foo f; // yes, an object is created here, and the constructor and destructor are called

I wouldn't be surprised if the C# optimizer also turned the original code into stack allocation (this is possible; I know for certain that Java 7 can do it). And allocating memory on the stack is very cheap in terms of speed.

Someone will say that in C# I just sat down and wrote it quickly, while in C++ I had to add something extra. But in C++ such "hacks" are done easily and without coercion, while in C# you can quickly run into obstacles (yes, I know about unsafe code). And I don't consider the code above a hack. Anyway, I probably wouldn't use new to create millions of objects in a loop; if I had to, I would preallocate the required number of objects, and that would be faster.

 1
Author: KoVadim, 2013-02-06 14:34:35