Entity Framework Advanced Tips - part 2

Download Code

Introduction

In this post, I am going to show you how to load related entities in two different ways, and the pros and cons of each one.

When you ask Entity Framework to load an entity like a publisher, it will not return that publisher’s books because EF has no way to know how much information you really need. There could be one million books by that publisher, why would EF load one million rows from database?! This makes sense because you are not likely to be interested in all the books; however, you might be interested in some of them.

Consider the following example that loads 3 publishers from the pubs database:

var pubs = context.publishers.Take(3).ToList();  // Take only 3 publishers

                foreach (var p in pubs)
                {
                    Console.WriteLine("{0} has published the following titles:", p.pub_name );
                    foreach (var t in p.titles)
                    {
                        Console.WriteLine("\t" + t.Title);
                    }
                    
                }

It also queries the titles of each publisher. Note that titles were not loaded with the original query, but instead they were loaded seamlessly by iterating through each publisher. This method is known as lazy loading or deffered loading. Because related entities (ie titles) have been loaded after the original query, extra roundtrips to the database were required to complete the query request. In this example, 3 extra queries were made to the database to return the titles for all the 3 publishers. Imagine you had 100 publishers, it would require 100 roundtrips to get the titles for all these publishers. In that case, lazy loading is not convenient.

Although Entity Framework 4 uses lazy loading by default, it may not be efficient depending on your scenario. For example, if you need all the related data (eg titles) as in the previous case, it makes sense to load them all in only one roundtrip. But if you need only few of these titles, lazy loading can be appropriate. You should always recall that lazy loading requires extra roundtrips to the database as you iterate through your data collection.

Another way to load all titles with the original query is to use the Include method. This method is called eager-loading. Here is an example:

var pubs = context.publishers.Include("titles").Take(3).ToList();  // Take only 3 publishers

                foreach (var p in pubs)
                {
                    Console.WriteLine("{0} has published the following titles:", p.pub_name );
                    foreach (var t in p.titles)
                    {
                        Console.WriteLine("\t" + t.Title);
                    }
                    
                }

The previous code will return all titles in the same query, thus no additional roundtrips are required. In case you wonder about the SQL statement generated by EF:

SELECT 
[Project2].[C1] AS [C1], 
[Project2].[pub_id] AS [pub_id], 
[Project2].[pub_name] AS [pub_name], 
[Project2].[city] AS [city], 
[Project2].[state] AS [state], 
[Project2].[country] AS [country], 
[Project2].[C2] AS [C2], 
[Project2].[title_id] AS [title_id], 
[Project2].[title] AS [title], 
[Project2].[type] AS [type], 
[Project2].[pub_id1] AS [pub_id1], 
[Project2].[price] AS [price], 
[Project2].[advance] AS [advance], 
[Project2].[royalty] AS [royalty], 
[Project2].[ytd_sales] AS [ytd_sales], 
[Project2].[notes] AS [notes], 
[Project2].[pubdate] AS [pubdate]
FROM ( SELECT 
    [Limit1].[pub_id] AS [pub_id], 
    [Limit1].[pub_name] AS [pub_name], 
    [Limit1].[city] AS [city], 
    [Limit1].[state] AS [state], 
    [Limit1].[country] AS [country], 
    [Limit1].[C1] AS [C1], 
    [Extent2].[title_id] AS [title_id], 
    [Extent2].[title] AS [title], 
    [Extent2].[type] AS [type], 
    [Extent2].[pub_id] AS [pub_id1], 
    [Extent2].[price] AS [price], 
    [Extent2].[advance] AS [advance], 
    [Extent2].[royalty] AS [royalty], 
    [Extent2].[ytd_sales] AS [ytd_sales], 
    [Extent2].[notes] AS [notes], 
    [Extent2].[pubdate] AS [pubdate], 
    CASE WHEN ([Extent2].[title_id] IS NULL) THEN CAST(NULL AS int) ELSE 1 END AS [C2]
    FROM   (SELECT TOP (3) 
        [Extent1].[pub_id] AS [pub_id], 
        [Extent1].[pub_name] AS [pub_name], 
        [Extent1].[city] AS [city], 
        [Extent1].[state] AS [state], 
        [Extent1].[country] AS [country], 
        1 AS [C1]
        FROM [dbo].[publishers] AS [Extent1] ) AS [Limit1]
    LEFT OUTER JOIN [dbo].[titles] AS [Extent2] ON [Limit1].[pub_id] = [Extent2].[pub_id]
)  AS [Project2]
ORDER BY [Project2].[pub_id] ASC, [Project2].[C2] ASC

Note that the string parameter of the Include method takes a query path that controls which related entities to return as part of the initial query. You can specify query path like this to return titles and their sales:

var pubs = context.publishers.Include("titles.sales").Take(3).ToList();

You can also eager load employees additionally:

var pubs = context.publishers.Include("titles.sales").Include("employees").Take(3).ToList();

However, you cannot filter the related data this way. For example, in the following code:

var pubs = context.publishers.Include("titles").Take(3).ToList();

You cannot query titles that start with letter ‘T’. This is some limitation of the lazy-loading approach.

Eager loading or Lazy loading?

Now comes the big question. Which method should you use?

It depends on what you really need. Lazy loading postpones data retrieval until data is needed, but requires additional roundtrips to the database. Eager loading fetches all data together in only one roundtrip, but the query command can be complex depending on the model. Eager loading is ideal if all data is really needed.