Comparison of two LINQ apps: unexpected result

I have developed two ASP.NET applications using LINQ. One connects to MS SQL Server; the other queries an in-memory data structure. Both applications work with a table of three int fields containing 500,000 records (the in-memory structure is identical to the SQL Server table). The controls used are standard: GridView and ObjectDataSource. In each app, I measure the average time it takes to process each page change while clicking through search results.

  • The LINQ + MS SQL Server application takes 0.1 s per page change.
  • The LINQ + in-memory structure application takes 0.8 s per page change.

This is a shocking result. Why is the application that processes data in memory eight times slower than the application that reads from a hard drive? Can anyone tell me why this is happening?



2 answers


The primary factor is likely to be algorithmic efficiency. LINQ to Objects deals with IEnumerable<T> inputs and outputs, which are processed sequentially, whereas a database can use indexes that provide significant speedups.
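To make the "sequential" part concrete, here is a minimal LINQ to Objects sketch (the counter and values are invented for illustration): every element before the match gets visited, because there is no index to consult.

using System;
using System.Linq;

int visited = 0;

// LINQ to Objects pipelines are plain IEnumerable<T> sequences:
// each operator pulls elements one at a time from the previous one.
var query = Enumerable.Range(1, 500000)
                      .Select(i => { visited++; return i; })
                      .Where(i => i == 499999);

int match = query.First();

// Every element up to the match was touched -- there is no index
// to jump straight to the answer.
Console.WriteLine($"found {match} after visiting {visited} items");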





I can think of at least three reasons:

  • indexes
  • caching
  • special optimizations (e.g. TOP N SORT)

Indexes

There are many types of queries that will execute very quickly against a correctly indexed database, but very slowly if you iterate over a list in memory. For example, a lookup by identifier (primary key) is almost instantaneous in the database because the rows are stored in a B-tree of very low height. Finding the same item in an in-memory list requires scanning the entire list.
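As a rough sketch of the difference (Record is a hypothetical row type, not from the original apps), a Dictionary<int, Record> can play the role of an index for an in-memory list:

using System;
using System.Collections.Generic;
using System.Linq;

var records = Enumerable.Range(0, 500000)
                        .Select(i => new Record { Id = i })
                        .ToList();

// Without an index: O(n) -- the whole list may be scanned.
Record byScan = records.First(r => r.Id == 123456);

// With an "index": build a Dictionary keyed by Id once, after which
// each lookup is O(1) on average -- loosely analogous to the O(log n)
// B-tree lookup a database index provides.
Dictionary<int, Record> byId = records.ToDictionary(r => r.Id);
Record byIndex = byId[123456];

Console.WriteLine(byScan == byIndex);

class Record { public int Id { get; set; } }  // hypothetical row type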

Caching

This assumes that the database always hits the disk, which is not true. The database tries to keep as much data in memory as possible, so when you ask it for data it often has the answer ready. In particular, it keeps commonly used indexes in memory and only hits the disk when needed. The way data is laid out on disk and in memory has also been carefully optimized to reduce file accesses and page misses.
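The buffer-pool idea is easy to sketch in code. Below is a toy read-through cache (PageCache and its members are invented names for illustration, not a real database API): data is served from memory when possible, and the "disk" is touched only on a miss.

using System;
using System.Collections.Generic;

var cache = new PageCache();
Console.WriteLine(cache.Read(7));   // miss: goes to "disk"
Console.WriteLine(cache.Read(7));   // hit: served from memory

// A toy read-through cache. A real buffer pool adds eviction,
// prefetching, write-back, and so on.
class PageCache
{
    private readonly Dictionary<int, string> pages = new Dictionary<int, string>();

    public string Read(int pageId)
    {
        if (pages.TryGetValue(pageId, out string page))
            return page;                 // served from memory, no I/O

        page = ReadFromDisk(pageId);     // slow path, only on a miss
        pages[pageId] = page;
        return page;
    }

    // Stand-in for real file I/O.
    private string ReadFromDisk(int pageId) => $"contents of page {pageId}";
}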

Optimization

Even without indexes, the database still knows many tricks that can speed things up. For example, if you run the following LINQ query against SQL Server:



list.OrderBy(x => x.Value).Take(1)

it will be almost instantaneous if there is an index on the column, but even without an index it will use a special optimization called TOP N SORT that works in linear time. Check the execution plan for your query to see whether this optimization is being used. Note that this optimization is not implemented in LINQ to Objects. We can see this by running this code:

using System;
using System.Collections.Generic;
using System.Linq;

Random random = new Random();
List<Foo> list = new List<Foo>();
for (int i = 0; i < 10000000; ++i)
{
    list.Add(new Foo { Id = random.Next() });
}

// OrderBy must sort all ten million items before First can return.
DateTime now = DateTime.UtcNow;
Foo smallest = list.OrderBy(foo => foo.Id).First();
Console.WriteLine(DateTime.UtcNow - now);

class Foo { public int Id { get; set; } }

This code takes about 30 seconds to execute, and the execution time grows worse than linearly as more items are added. Replacing the query with the following makes it take less than one second:

int smallestId = list.Min(foo => foo.Id);

This is because LINQ to Objects' OrderBy uses an O(n log n) sorting algorithm, whereas Min runs in O(n). When executed against SQL Server, however, both of these queries generate the same SQL, and both run in linear time, O(n).

So a search query such as OrderBy(x => x.Something).Skip(50).Take(10) runs faster in the database because far more effort has gone into making it fast. Query speed is, after all, one of the main selling points of a database.
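If the in-memory application needs to compete, it can imitate the database: pay the sort cost once, as if building an index, and let each page change slice the pre-sorted data. A minimal sketch of the difference (Item is a hypothetical row type):

using System;
using System.Collections.Generic;
using System.Linq;

var items = Enumerable.Range(0, 500000)
                      .Select(i => new Item { Something = i })
                      .ToList();

// Naive paging: re-sorts all 500,000 items on every page change,
// O(n log n) per click -- roughly what the in-memory app is doing.
var slowPage = items.OrderBy(x => x.Something).Skip(50).Take(10).ToList();

// Index-like paging: pay the sort once up front, then every page
// change is just a cheap slice of the pre-sorted list.
List<Item> sorted = items.OrderBy(x => x.Something).ToList();
var fastPage = sorted.Skip(50).Take(10).ToList();

Console.WriteLine($"{slowPage.Count} vs {fastPage.Count} rows");

class Item { public int Something { get; set; } }  // hypothetical row type

In a real app the pre-sorted list would be built once at startup or cached between requests, just as the database keeps its index around permanently.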







