What can cause the EF performance drop in this scenario?

I've been doing a data import for a project in ASP.NET MVC 5 with EF 6.1 and SQL Server 2008, where the import source was a text file.

The file's record lines were not very large, but each contained a fair amount of information to process. For example, a registration line held an employee's data, a position in the company, neighborhood, city, and so on, each of which becomes a table in my model.

So, line by line, I checked the existence of each neighborhood, city, position and all the other data, either to obtain the ID of records that already existed or to insert those that did not yet exist.

Later I chose to cache this data: I simply loaded it into generic lists in memory and checked there before going to the database.

The other entities did not produce a large number of rows, but the employee records did: more than 27 thousand.

Anyway, my question is about a performance problem I had when doing the import with EF. After the first thousand or so insertions, performance drops sharply, to one employee every 1 or 2 seconds.

Then I realized that if I canceled the import and started again, the first thousand records were imported quickly, but after that thousand things slowed down again.

I solved it by disposing of my context every thousand records and creating a new one. From then on, the import performed well from start to finish.

I'd like to hear from anyone who has been through something similar: why does this happen? Could it be some Entity Framework configuration? Mine uses the defaults.

Here is the import code snippet for anyone who wants to analyze it:

private List<int> Matriculas;
private List<Domain.CentroCusto> CacheCentrosCusto;
private List<Domain.Local> CacheLocais;
private List<Domain.Profissao> CacheProfissoes;
private List<Domain.Cidade> CacheCidades;
private List<Domain.Bairro> CacheBairros;
//Funcionarios
public ActionResult Funcionarios(string filePath)
{
    var fileInfo = new FileInfo(filePath);
    if (!fileInfo.Exists)
        throw new Exception("Arquivo não encontrado");

    Matriculas = contexto.Funcionarios
        .Select(x => x.Matricula)
        .ToList();

    CacheCentrosCusto = contexto.CentrosCusto.ToList();
    CacheLocais = contexto.Locais.ToList();
    CacheProfissoes = contexto.Profissoes.ToList();
    CacheCidades = contexto.Cidades.ToList();
    CacheBairros = contexto.Bairros.ToList();

    var file = new System.IO.StreamReader(filePath);
    try
    {
        var count = 0;
        string line = "";
        while ((line = file.ReadLine()) != null)
        {
            string[] campos = line.Split(';');

            int matricula;
            if (int.TryParse(campos[0].Trim(), out matricula))
            {
                if (Matriculas.Contains(matricula))
                    continue;
            }

            var funcionario = new Domain.Funcionario();

            funcionario.Matricula = matricula;
            // obtendo outros dados ...

            funcionario.CentroCustoId = GetCentroCustoId(campos[34].Trim());
            funcionario.LocalId = GetLocalId(campos[35].Trim());
            funcionario.ProfissaoId = GetProfissaoId(campos[36].Trim());

            //Cidade e Bairro
            funcionario.BairroId = null;

            var bai_desc = campos[11].Trim();
            if (!string.IsNullOrEmpty(bai_desc))
            {
                var cid_uf = "";
                var cid_desc = campos[12].Trim();

                // trabalho a string ...

                var cidade = new Domain.Cidade();
                cidade.Nome = cid_desc;
                cidade.Uf = cid_uf;

                var bairro = new Domain.Bairro();
                bairro.Nome = bai_desc;

                funcionario.BairroId = GetBairroId(bairro, cidade);
            }

            try
            {
                contexto.Funcionarios.Add(funcionario);
                contexto.SaveChanges();

                Matriculas.Add(matricula);

                count++;
                if (count == 1000)
                {
                    count = 0;
                    contexto.Dispose();
                    contexto = new DataAccess.Context();
                }
            }
            catch (DbEntityValidationException e) { ... }
            catch (Exception e) { ... }
        }
    }
    finally
    {
        file.Close();
    }

    return RedirectToAction("Index");
}

My context is created in the controller's constructor and disposed in its Dispose method.
Notice that there is a counter, and every thousand records I dispose of the context and create it again.

An example of the other methods used in the import.
Note: all the other lookup methods in the import code have the same structure as this one.

private int? GetProfissaoId(string descricao)
{
    int? profissao_id = null;
    if (!string.IsNullOrEmpty(descricao))
    {
        var profissaoTemp = CacheProfissoes
            .SingleOrDefault(x => x.Descricao.ToLower().Trim() == descricao.ToLower().Trim());

        if (profissaoTemp == null)
        {
            var profissao = new Domain.Profissao { Descricao = descricao };
            try
            {
                contexto.Profissoes.Add(profissao);
                contexto.SaveChanges();

                CacheProfissoes.Add(profissao);

                profissao_id = profissao.Id;
            }
            catch (DbEntityValidationException e) { ... }
            catch (Exception e) { ... }
        }
        else
        {
            profissao_id = profissaoTemp.Id;
        }

    }
    return profissao_id;
}
Author: Leonel Sanches da Silva, 2014-06-20

1 answer

This performance problem is well known. By adding objects one by one, you force Entity Framework to check the state of every object attached to the context on each operation, which is bad for performance.

There are a few ways to solve it. I'll present them, and you can choose the best one for your case:

1. Turn off AutoDetectChangesEnabled

Simply do the following:

context.Configuration.AutoDetectChangesEnabled = false;

This stops Entity Framework from running change detection across all entities attached to the context on every operation.
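A minimal sketch of how this could be applied to the import, assuming the DataAccess.Context type from the question and a funcionarios collection already built from the file; the setting is restored in a finally block so the context behaves normally afterwards:

using (var contexto = new DataAccess.Context())
{
    // Turn off automatic change detection only for the bulk work.
    contexto.Configuration.AutoDetectChangesEnabled = false;
    try
    {
        foreach (var funcionario in funcionarios)
            contexto.Funcionarios.Add(funcionario);

        // SaveChanges still detects changes once before writing,
        // so nothing is lost by disabling the per-operation checks.
        contexto.SaveChanges();
    }
    finally
    {
        contexto.Configuration.AutoDetectChangesEnabled = true;
    }
}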

2. Decrease batch to 100 records

Remember that you are inserting not only employees but also cities and neighborhoods, which can bring the unit of work to as many as 3,000 records. Make the batch smaller and call SaveChanges() before Dispose(), just to ensure nothing is lost:

if (count == 100)
{
    count = 0;
    contexto.SaveChanges();
    contexto.Dispose();
    contexto = new DataAccess.Context();
}

3. Use separate caches and AddRange() for each batch

Create separate lists for your caches, add to them normally, and at the end of each batch insert all the elements at once:

context.Funcionarios.AddRange(CacheFuncionarios);
context.Cidades.AddRange(CacheCidades);
...
context.SaveChanges();
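Options 2 and 3 can be combined into one loop. A sketch, assuming a hypothetical LerFuncionariosDoArquivo helper that yields one Domain.Funcionario per file line:

var novosFuncionarios = new List<Domain.Funcionario>();

foreach (var funcionario in LerFuncionariosDoArquivo(filePath))
{
    novosFuncionarios.Add(funcionario);

    if (novosFuncionarios.Count == 100)
    {
        // Flush the whole batch in a single AddRange + SaveChanges,
        // then recreate the context so its change tracker stays small.
        contexto.Funcionarios.AddRange(novosFuncionarios);
        contexto.SaveChanges();
        novosFuncionarios.Clear();

        contexto.Dispose();
        contexto = new DataAccess.Context();
    }
}

// Flush whatever is left in the last, partial batch.
if (novosFuncionarios.Count > 0)
{
    contexto.Funcionarios.AddRange(novosFuncionarios);
    contexto.SaveChanges();
}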

You can even use all the alternatives at the same time, but I recommend implementing them one at a time and measuring the performance; each one can change the behavior of the code considerably.

See more at http://weblog.west-wind.com/posts/2013/Dec/22/Entity-Framework-and-slow-bulk-INSERTs


Update

This answer needs an update, because new alternatives have appeared for solving the bulk insert problem with Entity Framework.

One of them is EntityFramework.BulkInsert (which has a separate package for EF 5 and another for EF 4). There are also versions for other databases, such as Microsoft SQL Server Compact Edition.
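Usage is a one-liner. A sketch, assuming the EntityFramework.BulkInsert NuGet package is installed (it adds a BulkInsert extension method to DbContext; the exact namespace may vary between package versions):

using EntityFramework.BulkInsert.Extensions;

using (var contexto = new DataAccess.Context())
{
    // Writes the whole collection with SqlBulkCopy under the hood,
    // bypassing the change tracker entirely.
    contexto.BulkInsert(funcionarios);
}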

The first part of this answer is still useful for other kinds of performance improvement, but it is no longer the best option for bulk inserts.

Author: Leonel Sanches da Silva, 2017-03-17 16:10:56