Is it better to call ToList() or ToArray() in LINQ queries?

0 votes
asked Jul 9, 2009 by frank-krueger

I often run into the case where I want to evaluate a query right where I declare it. This is usually because I need to iterate over it multiple times and it is expensive to compute. For example:

string raw = "...";
var lines = (from l in raw.Split('\n')
             let ll = l.Trim()
             where !string.IsNullOrEmpty(ll)
             select ll).ToList();

This works fine. But if I am not going to modify the result, then I might as well call ToArray() instead of ToList().

I wonder however whether ToArray() is implemented by first calling ToList() and is therefore less memory efficient than just calling ToList().

Am I crazy? Should I just call ToArray() - safe and secure in the knowledge that the memory won't be allocated twice?

14 Answers

0 votes
answered Jul 9, 2009 by mquander

The performance difference will be insignificant, since List<T> is implemented as a dynamically sized array. Calling either ToArray() (which uses an internal Buffer<T> class to grow the array) or ToList() (which calls the List<T>(IEnumerable<T>) constructor) will end up being a matter of putting them into an array and growing the array until it fits them all.

If you desire concrete confirmation of this fact, check out the implementation of the methods in question in Reflector -- you'll see they boil down to almost identical code.
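
To make the "grow the array until it fits" idea concrete, here is a minimal sketch of that strategy. It is not the framework's code: `GrowingBufferSketch` and `ToArrayByDoubling` are made-up names, and the starting size of 4 is an assumption.

```csharp
using System;
using System.Collections.Generic;

public static class GrowingBufferSketch
{
    // Hypothetical helper, not the framework's implementation: collects a
    // sequence of unknown length into an array by doubling a scratch buffer.
    public static T[] ToArrayByDoubling<T>(IEnumerable<T> source)
    {
        T[] buffer = new T[4]; // assumed starting size
        int count = 0;
        foreach (T item in source)
        {
            if (count == buffer.Length)
            {
                // Buffer is full: allocate one twice as large and copy over.
                Array.Resize(ref buffer, buffer.Length * 2);
            }
            buffer[count++] = item;
        }
        // An array cannot carry spare capacity, so trim to the exact size.
        // A list-like container could skip this step and keep the slack.
        if (count != buffer.Length)
        {
            Array.Resize(ref buffer, count);
        }
        return buffer;
    }
}
```

The loop is the part both methods have in common; the final trim is the only step an array result needs that a list result does not.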

0 votes
answered Jul 9, 2009 by guffa

The memory will always be allocated twice - or something close to that. As you can not resize an array, both methods will use some sort of mechanism to gather the data in a growing collection. (Well, the List is a growing collection in itself.)

The List uses an array as internal storage, and doubles the capacity when needed. This means that on average 2/3 of the items have been reallocated at least once, half of those reallocated at least twice, half of those at least three times, and so on. That means that each item has on average been reallocated 1.3 times, which is not very much overhead.

Remember also that if you are collecting strings, the collection itself only contains the references to the strings; the strings themselves aren't reallocated.
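
The estimate above can be checked for a specific size with a small counter. This is only a sketch: `DoublingCostSketch` and `CountCopies` are made-up names, and the starting capacity is a parameter because the answer does not state one.

```csharp
using System;

public static class DoublingCostSketch
{
    // Returns how many element copies happen while appending itemCount items
    // to a buffer that starts at initialCapacity and doubles whenever it is full.
    public static long CountCopies(int itemCount, int initialCapacity)
    {
        if (initialCapacity <= 0)
        {
            throw new ArgumentOutOfRangeException("initialCapacity");
        }

        long copies = 0;
        int capacity = initialCapacity;
        for (int count = 0; count < itemCount; count++)
        {
            if (count == capacity)
            {
                copies += count; // every existing item moves to the new array
                capacity *= 2;
            }
        }
        return copies;
    }
}
```

For example, `CountCopies(1000, 4)` gives 1020 copies (about one per item), while `CountCopies(1025, 4)` gives 2044 (about two per item), so the average depends on how far past the last doubling the final count lands.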

0 votes
answered Jul 1, 2010 by vitaliy-ulantikov

ToList() is usually preferred if you use it on IEnumerable<T> (from an ORM, for instance). If the length of the sequence is not known at the beginning, ToArray() creates a dynamic-length collection like List and then converts it to an array, which takes extra time.

0 votes
answered Jul 12, 2010 by scott-rippey

Edit: The last part of this answer is not valid. However, the rest is still useful information, so I'll leave it.

I know this is an old post, but after having the same question and doing some research, I have found something interesting that might be worth sharing.

First, I agree with @mquander and his answer. He is correct in saying that performance-wise, the two are identical.

However, I have been using Reflector to take a look at the methods in the System.Linq.Enumerable extensions namespace, and I have noticed a very common optimization.
Whenever possible, the IEnumerable<T> source is cast to IList<T> or ICollection<T> to optimize the method. For example, look at ElementAt(int).

Interestingly, Microsoft chose to only optimize for IList<T>, but not IList. It looks like Microsoft prefers to use the IList<T> interface.

System.Array only implements IList, so it will not benefit from any of these extension optimizations.
Therefore, I submit that the best practice is to use the .ToList() method.
If you use any of the extension methods, or pass the list to another method, there is a chance that it might be optimized for an IList<T>.
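
The edit note at the top of this answer can be checked directly. The sketch below (the names `ArrayInterfaceCheck` and `IsGenericList` are made up for illustration) tests whether an object can be used as IList<T> at run time, which is the same kind of check the optimized extension methods rely on.

```csharp
using System.Collections.Generic;

public static class ArrayInterfaceCheck
{
    // True when the runtime object behind the sequence can be used as IList<T>,
    // i.e. when a cast-based fast path would apply to it.
    public static bool IsGenericList<T>(IEnumerable<T> source)
    {
        return source is IList<T>;
    }
}
```

Passing `new[] { 1, 2, 3 }` returns true: single-dimensional arrays implement the generic IList<T> at run time, even though the System.Array class declaration only lists the non-generic interfaces. So an array result is not shut out of these optimizations.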

0 votes
answered Jul 11, 2011 by emp

I agree with @mquander that the performance difference should be insignificant. However, I wanted to benchmark it to be sure, so I did - and it is insignificant.

Testing with List<T> source:
ToArray time: 1934 ms (0.01934 ms/call), memory used: 4021 bytes/array
ToList  time: 1902 ms (0.01902 ms/call), memory used: 4045 bytes/List

Testing with array source:
ToArray time: 1957 ms (0.01957 ms/call), memory used: 4021 bytes/array
ToList  time: 2022 ms (0.02022 ms/call), memory used: 4045 bytes/List

Each source array/List had 1000 elements. So you can see that both time and memory differences are negligible.

My conclusion: you might as well use ToList(), since a List<T> provides more functionality than an array, unless a few bytes of memory really matter to you.
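
For anyone who wants to repeat a measurement like this, a rough timing helper could look like the sketch below. It is not the harness that produced the numbers above; `MaterializeTiming` and `Time` are made-up names, and a plain Stopwatch loop like this is only good for coarse comparisons.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class MaterializeTiming
{
    // Runs `materialize` over `source` the given number of times and returns
    // the total elapsed time. One untimed call is made first as a warm-up.
    public static TimeSpan Time<T>(
        IEnumerable<T> source,
        Func<IEnumerable<T>, object> materialize,
        int iterations)
    {
        object last = materialize(source); // warm-up, excluded from timing

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            last = materialize(source);
        }
        watch.Stop();

        GC.KeepAlive(last); // keep the result observable so the work is not elided
        return watch.Elapsed;
    }
}
```

Calling `MaterializeTiming.Time(data, s => s.ToArray(), 100000)` and then the same with `s => s.ToList()` (with `using System.Linq;` in scope) gives two durations to compare. Run each several times, since GC pauses and JIT warm-up can easily outweigh a difference this small.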

0 votes
answered Jul 14, 2011 by frep-d-oronge

This is an old question - but for the benefit of users who stumble upon it, there is also an alternative: 'Memoizing' the Enumerable. This caches the results and stops multiple enumeration of a Linq statement, which is what ToArray() and ToList() are often used for, even though the collection attributes of the list or array are never used.

Memoize is available in the RX/System.Interactive lib, and is explained here: More LINQ with System.Interactive

(From Bart De Smet's blog, which is a highly recommended read if you are working with Linq to Objects a lot)
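
To show what memoizing means here, below is a hand-rolled sketch of the idea. It is not the System.Interactive implementation: `MemoizedSequence<T>` is a made-up type, it is not thread-safe, and it assumes the source is only ever read through this wrapper.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical hand-rolled type, for illustration only.
public sealed class MemoizedSequence<T> : IEnumerable<T>, IDisposable
{
    private readonly List<T> cache = new List<T>();
    private IEnumerator<T> source; // set to null once the source is exhausted

    public MemoizedSequence(IEnumerable<T> sequence)
    {
        if (sequence == null) throw new ArgumentNullException("sequence");
        source = sequence.GetEnumerator();
    }

    public IEnumerator<T> GetEnumerator()
    {
        for (int i = 0; ; i++)
        {
            if (i == cache.Count)
            {
                // Nobody has pulled this item yet: advance the source once
                // and remember the value for later enumerations.
                if (source == null) yield break;
                if (!source.MoveNext())
                {
                    source.Dispose();
                    source = null;
                    yield break;
                }
                cache.Add(source.Current);
            }
            yield return cache[i];
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    public void Dispose()
    {
        if (source != null)
        {
            source.Dispose();
            source = null;
        }
    }
}
```

Unlike ToList() or ToArray(), nothing is evaluated until the first enumeration, and only as far as the consumer actually reads; a second enumeration replays the cached items instead of re-running the query.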

0 votes
answered Jan 28, 2012 by weston

One option is to add your own extension method that returns a readonly ICollection<T>. This can be better than using ToList or ToArray when you do not want to use either the indexing properties of an array/list, or add/remove from a list.

public static class EnumerableExtension
{
    /// <summary>
    /// Causes immediate evaluation of the linq but only if required.
    /// As it returns a readonly ICollection, is better than using ToList or ToArray
    /// when you do not want to use the indexing properties of an IList, or add to the collection.
    /// </summary>
    /// <typeparam name="T"></typeparam>
    /// <param name="enumerable"></param>
    /// <returns>Readonly collection</returns>
    public static ICollection<T> Evaluate<T>(this IEnumerable<T> enumerable)
    {
        //if it's already a readonly collection, use it
        var collection = enumerable as ICollection<T>;
        if ((collection != null) && collection.IsReadOnly)
        {
            return collection;
        }
        //or make a new collection
        return enumerable.ToList().AsReadOnly();
    }
}

Unit tests:

[TestClass]
public sealed class EvaluateLinqTests
{
    [TestMethod]
    public void EvalTest()
    {
        var list = new List<int> {1, 2, 3};
        var linqResult = list.Select(i => i);
        var linqResultEvaluated = list.Select(i => i).Evaluate();
        list.Clear();
        Assert.AreEqual(0, linqResult.Count());
        //even though we have cleared the underlying list, the evaluated list does not change
        Assert.AreEqual(3, linqResultEvaluated.Count());
    }

    [TestMethod]
    public void DoesNotSaveCreatingListWhenHasListTest()
    {
        var list = new List<int> {1, 2, 3};
        var linqResultEvaluated = list.Evaluate();
        //list is not readonly, so we expect a new list
        Assert.AreNotSame(list, linqResultEvaluated);
    }

    [TestMethod]
    public void SavesCreatingListWhenHasReadonlyListTest()
    {
        var list = new List<int> {1, 2, 3}.AsReadOnly();
        var linqResultEvaluated = list.Evaluate();
        //list is readonly, so we don't expect a new list
        Assert.AreSame(list, linqResultEvaluated);
    }

    [TestMethod]
    public void SavesCreatingListWhenHasArrayTest()
    {
        var list = new[] {1, 2, 3};
        var linqResultEvaluated = list.Evaluate();
        //arrays are readonly (wrt ICollection<T> interface), so we don't expect a new object
        Assert.AreSame(list, linqResultEvaluated);
    }

    [TestMethod]
    [ExpectedException(typeof (NotSupportedException))]
    public void CantAddToResultTest()
    {
        var list = new List<int> {1, 2, 3};
        var linqResultEvaluated = list.Evaluate();
        Assert.AreNotSame(list, linqResultEvaluated);
        linqResultEvaluated.Add(4);
    }

    [TestMethod]
    [ExpectedException(typeof (NotSupportedException))]
    public void CantRemoveFromResultTest()
    {
        var list = new List<int> {1, 2, 3};
        var linqResultEvaluated = list.Evaluate();
        Assert.AreNotSame(list, linqResultEvaluated);
        linqResultEvaluated.Remove(1);
    }
}

0 votes
answered Jul 7, 2012 by nawfal

You should base your decision to go for ToList or ToArray on what the ideal design choice is. If you want a collection that can only be iterated and accessed by index, choose ToArray. If you want the additional capabilities of adding to and removing from the collection later on without much hassle, then do a ToList (not that you can't add to an array, but it's usually not the right tool for it).

If performance matters, you should also consider what would be faster to operate on. Realistically, you won't call ToList or ToArray a million times, but you might work on the obtained collection a million times. In that respect [] is better, since List<> is [] with some overhead. See this thread for some efficiency comparison: Which one is more efficient : List<int> or int[]

In my own tests a while ago, I found ToArray faster, though I'm not sure how skewed the tests were. The performance difference is so insignificant that it is noticeable only if you are running these queries in a loop millions of times.

0 votes
answered Jan 14, 2013 by gary

For anyone interested in using this result in another LINQ to SQL query such as

from q in context.MyTable
where myListOrArray.Contains(q.someID)
select q;

then the SQL that is generated is the same whether you used a List or an Array for myListOrArray. Now I know some may ask why you would even enumerate before this statement, but there is a difference between the SQL generated from an IQueryable and from a List or Array.

0 votes
answered Jul 1, 2013 by jaredpar

Unless you simply need an array to meet other constraints you should use ToList. In the majority of scenarios ToArray will allocate more memory than ToList.

Both use arrays for storage, but ToList has a more flexible constraint. It needs the array to be at least as large as the number of elements in the collection. If the array is larger, that is not a problem. However ToArray needs the array to be sized exactly to the number of elements.

To meet this constraint ToArray often does one more allocation than ToList. Once it has an array that is big enough it allocates an array which is exactly the correct size and copies the elements back into that array. The only time it can avoid this is when the grow algorithm for the array just happens to coincide with the number of elements needing to be stored (definitely in the minority).
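
The difference between "at least as large" and "exactly sized" can be observed from the outside. The sketch below uses made-up names (`SlackSketch`, `Numbers`, `UnusedSlots`); the exact capacity a list ends up with is an implementation detail and may differ between framework versions.

```csharp
using System.Collections.Generic;

public static class SlackSketch
{
    // Yields `count` integers through an iterator, so a consumer cannot learn
    // the length up front (the result is not an ICollection<T>).
    public static IEnumerable<int> Numbers(int count)
    {
        for (int i = 0; i < count; i++)
        {
            yield return i;
        }
    }

    // Number of allocated-but-unused slots in the list's backing array.
    public static int UnusedSlots<T>(List<T> list)
    {
        return list.Capacity - list.Count;
    }
}
```

With `using System.Linq;` in scope, `SlackSketch.UnusedSlots(SlackSketch.Numbers(1000).ToList())` shows how much slack the list was allowed to keep, whereas `SlackSketch.Numbers(1000).ToArray().Length` is always exactly 1000, which is what can cost the extra allocation and copy.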

EDIT

A couple of people have asked me about the consequence of having the extra unused memory in the List<T> value.

This is a valid concern. If the created collection is long lived, is never modified after being created and has a high chance of landing in the Gen2 heap then you may be better off taking the extra allocation of ToArray up front.

In general though I find this to be the rarer case. It's much more common to see a lot of ToArray calls which are immediately passed to other short lived uses of memory in which case ToList is demonstrably better.

The key here is to profile, profile and then profile some more.


...