EF Core Pt. 3 - Performance

In a series of articles I will talk about Entity Framework Core. In this article I will talk about the EF Core 7 and performance. All articles will be using .net 7 and EF Core 7

Performance and EF has always been up for debate. Some are saying that the performance is soo bad and all database interaction should be done by SQL or stored procedures directly. Some are a little more nuanced and say that performance critical database interactions should be handled outside EF. As always there is a fine line between when to go with what approach, but let us go through what you can do to make the most of EF performance wise.

The EF Core team as done a lot of work on the later versions of EF Core to get better performance out of the framework and also introduced some new features. In EF Core 8 they will go further and allow you to have SQL statements directly returned as your desired objects.

EF Core is an ORM (object-relational mapper) that will let the developer to in C# write type safe code that will be translated into database specific commands, and transform the result into objects. EF Core will also add an extra layer on top of each object to keep track of changes to the object, so it can send these changes back to the database. So each database interaction you as a developer do will always go through this pipeline (or similar)

Transform all incoming (from you) objects into database entities
Generate the database commands according to your request
Send the request to the database
Transform the result back to objects
Add on tracking properties on each object

This means to measure performance all is not the actual SQL query running in the database, there are a lot of other things happening when using EF Core.

But when talking about performance in .Net there one thing to remember that all is not raw performance, time from start to finish, it also includes how much memory it uses. If the memory consumption is high .Net need to garbage collect it and that will halt the application, and it results in bad performance during high load.

To give you all some numbers I will use the great package BenchmarkDotNet and the database from my earlier articles with some seeded data (3000 customers, 16700 orders/invoices and 91600 order/invoice rows). All data has been seeded with the help of the package Bogus that will create realistic data. What BenchmarkDotNet does is to warmup everything and then run a lot of calls for each test and calculate the mean time etc. In my performance test I will randomize the in-parameters (with the same seed) to get a mix of amount of data returned, but always the same for each time I run the performance test.

Tracking

As described above all objects returned will be tracked, so changes can be returned when you are doing a SaveChanges call on the database context.

When you are querying data only and don't require the tracking you can turn it of by using the AsNoTracking command.

// Tracking on (GetAll_AsTracking)
return await dbContext.Customers
.Include(x => x.Invoices)
.ToListAsync();

// Tracking off (GetAll_AsNoTracking)
return await dbContext.Customers
.Include(x => x.Invoices)
.AsNoTracking()
.ToListAsync();

Doing a performance test with the above example it looks like this

As you can see the Mean time much lower and the memory consumption a little lower.

You are able to set the default behavior during the configuration, so no tracking is used and not needed on all queries you make.

Select

When you are querying data the full object will be returned, but in some case you are not interested in getting all properties. Say for example you want a list of customers, you might only want the Id and the customer´s name. In this case a lot of unnecessary data is transferred from the database to your application.

To handle this we are instead of using includes, selecting the properties that we want. When accessing data through navigators on an object EF will automatically be adding on joins, if needed.
Note: Adding a Select statement will automatically turn off tracking, but having it is good practice if the query later on is modified so the select statement is not there.

The approach of using Select means some more code and you need to do the mapping manually, but it might be worth it.

// Use of Includes (GetOne)
var customerEntity = await dbContext.Customers
.AsNoTracking()
.Include(x => x.Address)
.Include(x => x.Orders).ThenInclude(x => x.Rows)
.Where(x => x.DeletedAt == null)
.FirstOrDefaultAsync(x => x.Id == id)
return customerEntity.MapToDetail();

// Use of select (GetOne_WithSelect)
return await dbContext.Customers
.AsNoTracking()
.Where(x => x.DeletedAt == null)
.Select(customer => new CustomerDetailsResponse
{
Id = customer.Id,
Name = customer.Name,
StreetAddress1 = customer.Address.StreetAddress1,
StreetAddress2 = customer.Address.StreetAddress2,
City = customer.Address.City,
PostalCode = customer.Address.PostalCode,
Country = customer.Address.Country,
Phone = customer.Phone,
Fax = customer.Fax,
Description = customer.Description,
HomePage = customer.HomePage,
Orders = customer.Orders.Select(x => new OrderResponse
{
Id = x.Id,
OrderDate = x.OrderDate,
TotalPrice = x.Rows.Sum(r => r.TotalPrice)
}).ToList()
})
.FirstOrDefaultAsync(x => x.Id == id)

And the result is like this

As you can see the time with select is a bit shorter and the memory footprint is smaller.

Worth mention is that when you are doing select as above, you should not add the includes. Doing so will include the joins in full and result in same behavior as the first example (GetOne).

SplitQuery

If you also want to have some information from a foreign table, you will add the Include statement, that will be a SQL join statement, but all properties from that table will be sent back from the database. Adding too many includes will end up with cartesian explosion, which means that data from the base table needs to be duplicated for each include and sub-include and will cost a lot of performance on the database and network.

To handle this you can add the AsSplitQuery() to your query. Doing so EF Core will split all Includes as separate database calls. This can lead to better performance but need to be verified case by case.

SingeOrDefault vs FirstOrDefault

One common mistake is the overuse of SingleOrDefault instead of FirstOrDefault.

In general we filter data on Id or some value that we constructed to be unique in the database, with unique constraint. That means that we know for sure that the value can only be allowed once in the database table.

What EF is doing on SingleOrDefault is to try to take 2 items back, from the database, to verify that only one exists. On a table with a lot of data this might affect the performance because the database can't exit early when the first one is found.

Here is example on how the SQL statements are constructed by EF Core.

// FirstOrDefault
SELECT TOP(1) [c].[Id], [c].[AddressId], ...
FROM [Customers] AS [c]
WHERE ([c].[DeletedAt] IS NULL) AND [c].[Id] = @__id_0

// SingleOrDefault
SELECT TOP(2) [c].[Id], [c].[AddressId], ...
FROM [Customers] AS [c]
WHERE ([c].[DeletedAt] IS NULL) AND [c].[Id] = @__id_0
Include filtering
Say for example that you want a customer with all invoices that passed due date. In many cases you will do a query like this

var customerEntity = await dbContext.Customers
.AsNoTracking()
.Include(x => x.Invoices).ThenInclude(x => x.Rows)
.FirstOrDefaultAsync(x => x.Id == id);

customerEntity.Invoices = customerEntity.Invoices
.Where(x => x.DueDate < DateTime.Now).ToList();

This means that all invoices will be returned back from the database, objects are constructed for no need.
Instead you might want to use include filtering

var customerEntity = await dbContext.Customers
.AsNoTracking()
.Include(x => x.Invoices.Where(o => o.DueDate < DateTime.Now))
.ThenInclude(x => x.Rows)
.FirstOrDefaultAsync(x => x.Id == id);

This will let the database do the filtering and there is no need to return that data from the database. Less data returned, less objects constructed means less memory usage and better performance in speed.

Batch update and delete

In EF Core 7 they included support for batch handling on update and delete.
This means that you don't need to get all data from the database, iterate and do modifications and finally pass all data back to the database.
This action is quite powerful and will be more or less be an update or delete statement sent to the database. One thing to mention is that these calls are not included in the state change management so no need to call SaveChanges are needed but also it will not be included in the transaction that encapsulate all changes that is executed with SaveChanges.

To make a batch update the syntax is like this example

var result = await dbContext.Customers
.Where(x => x.DeletedAt == null && x.Id == id)
.ExecuteUpdateAsync(s => s.SetProperty(x => x.Name, request.Name)
.SetProperty(x => x.Phone, request.Phone)
.SetProperty(x => x.Fax, request.Fax)
.SetProperty(x => x.Description, request.Description)
);

And the delete like this (will purge a soft deleted customer)

var result = await dbContext.Customers
.Where(x => x.Id == id && x.DeletedAt != null)
.ExecuteDeleteAsync();

Compiled Query

One of the features that are rarely used is Compiled Query. A compiled query means that the EF framework will construct the query on compile time and then use it, instead of compiling it the first time it's been used. In some cases this will give performance gain in form of speed, in the cost of memory footprint. The compiled query will take up memory and will not be freed up as normal queries can be.

Making a query compiled requires a little more code.

To convert this example:

public async Task<CustomerDetailsResponse> Handle(int id)
{
var customerEntity = await dbContext.Customers
.AsNoTracking()
.Include(x => x.Address)
.Include(x => x.Orders).ThenInclude(x => x.Rows)
.Where(x => x.DeletedAt == null)
.FirstOrDefaultAsync(x => x.Id == id)
?? throw new ArgumentException($"Customer [{id}] - Not found");

return customerEntity.MapToDetail();
}

To make the query to a compiled version you need to create a static function that will take the database context, in-parameters and return the result. This example is an async version, but the static function doesn't return that. Don't worry the EF Core framework will handle that and convert it to an async version.

public async Task<CustomerDetailsResponse> Handle(int id)
=> (await GetDetailsById(dbContext, id)).MapToDetail()
?? throw new ArgumentException($"Customer [{id}] - Not found");

private static readonly Func<DemoDbContext, int, Task<CustomerEntity?>>
GetDetailsById = Microsoft.EntityFrameworkCore.EF.CompileAsyncQuery(
(DemoDbContext context, int id) => context.Customers
.AsNoTracking()
.Include(x => x.Address)
.Include(x => x.Orders).ThenInclude(x => x.Rows)
.Where(x => x.DeletedAt == null)
.FirstOrDefault(x => x.Id == id)
);

When returning a list of data it will look a little bit different and to simplify things I use the nuget package System.Linq.Async

public async Task<List<CustomerEntity>> Handle(int? take)
{
await Task.CompletedTask;
var all = GetAllCompiled(dbContext, take ?? 10);
return await all.ToListAsync();

}

private static readonly Func<DemoDbContext, int, IAsyncEnumerable<CustomerEntity>>
GetAllCompiled = Microsoft.EntityFrameworkCore.EF.CompileAsyncQuery(
(DemoDbContext context, int take) => context.Customers
.AsNoTracking()
.Include(x => x.Invoices)
.OrderBy(x => x.Id)
.Take(take));

The compiled query will return an IAsyncEnumerable instead of a task. The data need then be converted to a List, and here the .ToListAsync method is used from the System.Linq.Async package.

This method need to be tested case by case because the benefit might not always be enough to make the code more complex. As mentioned earlier the main application´s memory footprint will be larger and that might not be what you want.

Precompiled data model

During the startup of the application and EF, the data model is generated and that might take time depending on how big the model is.

There is a new feature (since EF Core 6) that let you precompile the model and use that one. This will shorten the startup time of your application.

To do that you run the following cli command (when using dotnet EF tools)

dotnet ef dbcontext optimize --output-dir DemoDbModels --namespace DemoDbModels

This will generate the model and store it in you project in a folder named "DemoDbModels".

To use the model add in the options setup in the dependency injection setup, information to use this model instead of building it.

services.AddDbContext<DemoDbContext>(options =>
{
options.UseSqlServer(DummyConfig.ConnectionString, optionsBuilder =>
{
});
options.UseModel(DemoDbModels.DemoDbContextModel.Instance);
});

Not all functionality is supported, for example JsonProperties and QueryFilters added on the entity.

Note: When doing a database change the precompiled model needs to be recreated.

Cancellation Token

When working with the async you should work with the cancellation token. This to handle a way to stop executing if the action is cancelled. In an API controller you are able to add the cancellation token as an incoming parameter, and then pass it on down to the EF Core call.

[Route("")]
public async Task<IActionResult> GetData(CancellationToken cancellationToken)
{
var data = await repo.GetData(cancellationToken);
return Ok(data);
}
In the repository add the cancellation token to the async request

public async Task<List<CustomerEntity>> GetData(CancellationToken cancellationToken)
{
return await dbContext.Customers
.AsNoTracking()
.ToListAsync(cancellationToken);
}

If the request is cancelled, the cancellation token will be marked as cancelled and EF Core will tell the database to stop the execution of the query.
This will help the performance if you are running performance intense queries (like report generation) and the user cancel the request, or if you have "search as you type"-queries, filters in tables or sorting in tables.

Note: Cancellation tokens should not always be passed on to EF Core, like on some add, delete or update - there we need to guarantee that the data is stored or updated and we use the result act on the data during the action. Having the cancellation token on SaveChangesAsync will still guarantee that it is a transaction and it is all or nothing.

Pooled DbContext

The final thing that could give you better performance is using pooled DbContexts. This is not the same thing as pooled database connection in SQL Server, instead this is a pool managed by EF Core. The win here is that each time you create a new DbContext, EF Core need to do some behind the scenes, like setting all up, calling the OnCreating method etc. and this takes time.
To be able to work with pooled DbContext you need to ensure that the DbContext is stateless, EF Core ensures that all tracked entities are cleared out.
The change to use this feature is simple, in your dependency injection setup you change

services.AddDbContext<DemoDbContext>(options =>
{
options.UseSqlServer(DummyConfig.ConnectionString, optionsBuilder =>
{ ... });

});

To this

services.AddDbContextPool<DemoDbContext>(options =>
{
options.UseSqlServer(DummyConfig.ConnectionString, optionsBuilder =>
{ ... });

});

Or through the Factory class

PooledDbContextFactory<DemoDbContext> factory = new PooledDbContextFactory<DemoDbContext>(optionsBuilder.Options);
var dbContext = factory.CreateDbContext();

Summary

Some final example of BenchmarkDotNet

As you can see there are small differences in some cases but using compiled queries with pooled context is a big gain in performance (in this case) and memory usage.

When using EF Core there are several ways to enhance the performance but more ways to destroy it. Sadly there is no silver bullet to get the most performance out, instead you need to verify and try different solutions. If none of them are working you might need to end up with writing custom views, stored procedures or functions in SQL. I have also seen solutions that use another ORM named Dapper for mapping data back to objects for read queries, but doing so might instead give higher maintenance costs and more complex code.

Learn how to use BenchmarkDotNet and write some tests to benchmark, and have a database with a realistic amount of data and soon you will master EF Core performance.

References

To help getting the most out of EF some articles that may be interesting

EF Core 7 preview 6 performance optimizations

EF Core Performance Introduction

The Blog Series About EF Core

EF Core Pt. 1 - Basics

EF Core Pt. 2 - Configuration

EF Core Pt. 4 - EF Core - New in 7 and upcoming 8

Mikael Johannsesson

Mikael Johannesson is a seasoned senior consultant with over 20 years of experience in the IT industry. He currently works at Precio Fishbone, where he is one of the software architects for our digital workplace solution Omnia and also provides expert consultation services to clients across various industries.

Send email