Posts Tagged ‘collection’

List of list and collection classes in dotnet

March 14th, 2012

Which List/Collection to use and where?

What do I choose and what do I have to choose from?

I have found no comprehensive, and good, list so here is a stab at it. Much of it thanks to Gary Short and his video.

Update: There is more, and deeper, information by James Micael Hare.

Update: With Dotnet3 came ReadOnlyObservableCollection. I have not looked into this one.

Update: With Dotnet4.5 three new read only lists are found; IReadOnlyList, IReadOnlyDictionary and IReadOnlyCollection. There is also the Immutable namespace that contains the whole set of basic list variants like Dictionary, List, Queue, Set, Stack, Array, Dictionary, HashSet, SortedDictionary and SortedSet.

Update: At the time of writing we are at Dotnet4.7 and even more lists might have been added. If you know of a such, please comment and I will update this list.
Also: Many of the list types I have annotated “Dotnet 4.5”. This does not mean the type ends with Dotnet4.5 but that I have not investigated the type’s existence later than 4.5.

Update: Now we are at Dotnet 4.8 and have been told it is the last one since Dotnet Core and Dotnet Standard has happened; not to mention UWP and Xamarin. Then there is Mono (non Xamarin variant). I have stopped updating versions as they got stale so fast.

Update: It is now Dotnet 7 and I have lost count of all the list-ish types. There is also the parallelised version of list-ish. Nick Chapsas has 13 fast paced minutes about performance on List<>. Tip: the size of the list affects which is fastest and parallel has effect and Span too, albeit with side effect.

List<T>

http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx
Dotnet 2-4.5

( This chapter has the most information and explanations that won’t be repeated in for the lesser used lists. )

This is the one we “normally” use. If you use it as return you should use IList<T> instead as interfaces gives you more decoupling.

Good: Simple to use. No brainer, especially for short lists. Usable with lambda.
Bad: Adding can be expensive. Removing even more so. (I haven’t checked Insert but it should be expensive too.)
There are some caveats with List<T> though.

method: Add(…)

One is that it allocates space in chunks that increas as the list grows bigger. By your first myList.Add(…) dotnet allocates 4 items. When you add a fifth element it doubles the allocation so 4 new are allocated. Now 8 items are allocated, 5 used and 3 wasted. When you add your 9th, 8 new are allocated.

Like this.
4, 8, 16, 32, 64, … you get the idea of doubling.

Now note that the doubling not only is a waste in space but also ticks. Every time the list is extended a call to Array.Copy(…) is made and memory copying is expensive. I haven’t any good figures but check the Gary Short’s cast linked in this article and wind to 15:30.

When one allocates directly in the constructor like this:

var myList = new List<int>(){1,2,3,4,5};

nothing really changes. Behind the scenes 5 myList.Add(…) are called and you still have 8 items allocated.

A way to mitigate this is to tell the compiler how big list you want:

var myList = new List<int>(5);

Now you get 5 items, allocated once. On the other hand it also changes the allocation scheme to multiples of 5

5, 10, 15, 20, …

which might be exactly what you wanted or not.

The way to do this at constructor time is:

var myList = new List<int>(5){1,2,3,4,5};

If you know you are adding a lot of items avoid looping and instead use .AddRange like this:

myList.AddRange( new int[]{6,7,8,9} );

methods: Remove(…) and RemoveAt(…)

Remove does a linear search for your object. Performance is O(n). RemoveAt just goes to the index and removes the item which gives us O(1). Unless you have veeery long lists both methods are considered fast. But. Remember that Remove(…) does a comparision which comes in three flavours IEquatable<T>, Equals and bitwise. Wind to 18:00 in the linked cast.

method: Sort

Sorting is done with Quicksort. Unless you have special cases like an almost ordered list or – even worse – an ordered list, this is a fast sorting algorithm. Performance is O(n log n) in best case and O( n^2) in worst case.

If you have an ordered list, which we often have, you could get some benefit of randomising the list first. Or the obvious: check if the list is ordered before sorting it. Or use another list or collection; which this article is about…

LinkedList<T>

http://msdn.microsoft.com/en-us/library/he2s3bh7.aspx
Dotnet 2-4.5

This is a double linked list.

Good: Fast for Add, Remove and Insert.
Bad: Searching is slower.

Since Add, Remove and Insert only manipulates the item at hand and the items just before and after the performance is O(1).

Searching and going to a certain index means traversing the list and hence performance is O(n).

Dictionary<TKey, TValue>

http://msdn.microsoft.com/en-us/library/xfhwa508.aspx
Dotnet 2-4.7

Good: Fast for searching. Implementing your own key type can be a problem.
Bad: Only one key.

The dictionary is a key-value store. This is considered fast since the comparision between two items depend on a key and not on the whole item.
Here the caveat is on the comparision on the key. The comparision is dependant on the hash key which, if expensive, makes this list less performant.

A simple key like a string (not too long) or a number is fast when the key is evenly distributed. What this means is that lots of keys around 100 and a few around 10000 makes worse performance than if the keys are spread 100, 200, 300, 400 etc. Not a tool good example of me and that is why web searching is your friend.

There is a helper method for quickly creating a dictionary from a GroupBy. See here for example.

Caveat: if you have an object as the key comparision is done through IEquatable.Equals or by overriding Equals and when you do (which you should do once in a while just for the fun of it) you have to override GetHashCode too. And following this you must now grokk both hashing of an object and the hashing theories. Personal recommendation: don’t write the code in a few minutes just because you can; invest a couple of hours in it. It will pay off in the long run.
Also note that hashes are only theoretically unique. This means you can get two hits for the same hash.
Then note that if you write a class where Equals is not overriden dotnet will compare the reference; but if you have an anonymous class dotnet will do a shallow(?) comparison.

Lookup<TKey, TValue>

http://msdn.microsoft.com/en-us/library/bb460184.aspx
Dotnet 3.5-4.6

Good: has several items for each key. Fast searching through the key.
Bad: has several items for each key.

This is a Dictionary<TKey,TValue> variant with the addition that it can take several items for the same key.
Note: For a reason unkown to man the interface is totally different though; it uses an extension method ToLookup which takes  lambda as parameters. I guess there is some good rationale behind this since it the guys behind this are smart.

One creates a Lookup<TKey,TValue> by calling ToLookup on an IEnumerable<T> and the syntax is quite nice.

Note that this list is immutable so nothing can be added to or deleted from the Lookup when it is created.

SortedList<TKey, TValue>

http://msdn.microsoft.com/en-us/library/ms132319.aspx
Dotnet 2-4.5
There is a generic SortedList too.

Good: Sorted. Fast retrieval [O( log n )].

Uses less memory than SortedDictionary<T>.
If you add lots of data at once, use SortedList<T> instead of SortedSet<T> or SortedDictionary<T>.

SortedSet<T>

http://msdn.microsoft.com/en-us/library/dd412070.aspx
Dotnet 4-4.5

Good: Sorted. Fast retrieval [O( log n )].
See more at http://msdn.microsoft.com/en-us/vstudio/ee906600

SortedDictionary<TKey, TValue>

http://msdn.microsoft.com/en-us/library/f7fta44c.aspx
Dotnet 2-4.5
There is a generic version too.

Good: Sorted. Fast retrieval [O( log n )]. Faster insert and remove than SortedList<T>.

ConcurrentBag<T>

http://msdn.microsoft.com/en-us/library/dd381779.aspx
Dotnet4-4.5

Used for concurrency. They don’t solve the problem of concurrency conflicts but mitigates them. Every fetch of add should be tried. The add method is called TryAdd and return true/false depending on if you succeeded or not; the same with TryTake. Yes, this means an if(…) for every such call and corresponding handling. That is one of the reasons concurrency is hard. And fun.

Check out int.TryParse for an easier-to-test method with the same functionality.

ConcurrentDictionary<T>

http://msdn.microsoft.com/en-us/library/dd381779.aspx
Dotnet 4-4.5

Used for concurrency. They don’t solve the problem of concurrency conflicts but mitigates them. Every fetch of add should be tried. The add method is called TryAdd and return true/false depending on if you succeeded or not. Yes, this means an if(…) for every such call and corresponding handling. That is one of the reasons concurrency is hard. And fun.

ConcurrentQueue<T>

http://msdn.microsoft.com/en-us/library/dd267265.aspx
Dotnet 4-4.5

See ConcurrentDictionary.

ConcurrentStack<T>

http://msdn.microsoft.com/en-us/library/dd267331.aspx
Dotnet 4-4.5

See ConcurrentDictionary.

OrderablePartioner<T>

http://msdn.microsoft.com/en-us/library/dd394988.aspx
Dotnet 4-4.5

See ConcurrentDictionary.

Represents a particular manner of splitting an orderable data source into multiple partitions. Not only does it work particular but is also is named totally different. I predict this list will almost never be used since noone finds it.

BlockingCollection<T>

http://msdn.microsoft.com/en-us/library/dd267312.aspx
Dotnet 4-4.5

See ConcurrentDictionary.

Now this collection is slightly different regarding add/remove since it can block and give you true safety. But also deadlocks. If you don’t know what a deadlock is, and don’t fieel like studying this, just go and ask a SQL guy for transactions and deadlocks and he’ll have about the same problem and answers. If you understand what s/he says you’ll understand BlockingCollection.Add too.

One can set a timeout. Ask the SQL guy if you wonder what that sentence meant.

ObservableCollection<T>

http://msdn.microsoft.com/en-us/library/ms668604.aspx
Dotnet 3-4.5

Used typically for MVVM in WPF or Silverlight.

Good: can tell others about its state changes.

If you have a collection and want to watch over it to see if it changes – then this collection is for you. Start a WPF project if you want to play around and test it.

Due to a reason known to only a few this collection is defined in the WindowsBase assembly.

Uses List<T> internally so anything that applies to List<T> applies to ObservableCollection too.

ListDictionary

http://msdn.microsoft.com/en-us/library/system.collections.specialized.listdictionary.aspx
Dotnet 1.1-4.5

A Dictionary implementing IDictionary as a key-value pair linked list. Check Dictionary for what that means.

This list is suitable for lists with fewer than 10 items and even though you probably can retrieve the items in the same order every time, don’t depend on it since Microsoft has specified that the order might change.

It is generic which means type casting is up to you.

OrderedDictionary

http://msdn.microsoft.com/en-us/library/system.collections.specialized.ordereddictionary.aspx
Dotnet 2-4.5

Uses key-value pairs and have them ordered. Duh.

It is generic which means type casting is up to you.

HybridDictionary

http://msdn.microsoft.com/en-us/library/system.collections.specialized.hybriddictionary.aspx
Dotnet 1.1-4.5

Changes to a ListDictionary if the items are few and a Hashtable if they are many. “Few” and “many” has a breakpoint around 10.

One can change if comparision is case sensitive or not.

The key cannot be null.

StringCollection

http://msdn.microsoft.com/en-us/library/system.collections.specialized.stringcollection.aspx
Dotnet 1.1-4.5

Optimised for string lists.

One can store null, allows duplicates, case sensitive comparision; just the same as List<string> but supposedly faster. If someone had some performance figures or reverse engineered code we might know.

StringDictionary

http://msdn.microsoft.com/en-us/library/system.collections.specialized.stringdictionary.aspx
Dotnet 1.1-4.5

Implemented as hash table. Case insensitive comparision. I haven’t checked but I suppose it does ToLower or ToUpper on the keys before storing them. At least that is what I should have done.

StringDictionary is generic and I suppose it exists solely because generics didn’t exist in Dotnet 1.1 and instead of everyone writing their own StringDictionary with their own casts Microsoft built it for everyone (and for themselves of course).

SortedList

http://msdn.microsoft.com/en-us/library/system.collections.sortedlist.aspx
Dotnet 1.1-4.5

Generic list. Do your own type casting.

SortedDictionary

http://msdn.microsoft.com/en-us/library/f7fta44c(v=vs.80).aspx
Dotnet 2-4.5

Generic version. Do your own type casting. And debug your own bugs.

Dictionary

http://msdn.microsoft.com/en-us/library/xfhwa508(v=vs.80).aspx
Dotnet 2-4.5

Is this really correct? Didn’t we have Dictionary in dotnet 1 and 1.1? I vividly remember writing my own version (see StringDictionary) for every type I had.
Which brings me to that it isn’t very difficult inheriting from Dictionary and overriding the right methods. There is (at list was back in last century) cookbooks for this on the web. Today I just use List<T> and am happy with it.

HashSet<T>

http://msdn.microsoft.com/en-us/library/bb353005
Dotnet 3.5-4.5

I have not experimented with this one.

IList<T>

IList is the interface behind List<T>. Use it when you can.
Especially when returning a List, return IList instead to get more decoupling.  Especially especially when writing a framework.

http://stackoverflow.com/questions/400135/c-sharp-listt-or-ilistt

ArrayList

https://docs.microsoft.com/en-us/dotnet/api/system.collections.arraylist

HashTable

https://docs.microsoft.com/en-us/dotnet/api/system.collections.hashtable

IReadOnlyList<T>

https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlylist-1

IReadOnlyDictionary<TKey, TValue>

https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlydictionary-2

IReadOnlyCollection<T>

https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.ireadonlycollection-1

ReadOnlyObservableCollection<T>

https://docs.microsoft.com/en-us/dotnet/api/system.collections.objectmodel.readonlyobservablecollection-1

Queue

https://docs.microsoft.com/en-us/dotnet/api/system.collections.queue

Queue<T>

https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.queue-1

ISet

https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.iset-1

Stack

https://docs.microsoft.com/en-us/dotnet/api/system.collections.stack

Stack<T>

https://docs.microsoft.com/en-us/dotnet/api/system.collections.generic.stack-1

Array

This class does not reside in any collection namespace but is still considered a collection since it implements IList. But at the same time it is fixed in size.

If I understand the dox correctly it is the base class for anything Collection but we, as external developers, should not use it; it is solely for the langauge and the compiler people.

It contains a IsFixedSize property which always is True.

https://docs.microsoft.com/en-us/dotnet/api/system.array

Performance tips

These tips come from Gary Short unless noted.

Boxing/unboxing takes time so if you have value types (int, char etc) (not string that even though it behaves like a value type isn’t one) use generic collections instead.
My comment: casting can fail runtime but un/boxing through, say, List<T> never does so 99 of 100 times I prefer the performance penalty. I also have no figures on the costs. Someone?

All lists, except LinkedList has O(1) for adding an item unless it results in growing when we get O(n). LinkedList never grows in chunks and always have O(1). Se more about growing under List<T>.

List<T> sorts slower when the list is almost sorted and slowest when it is already sorted. This is not schtoopidity but a drawback of the quicksort algorithm.

SortedList<T> uses less memory than SortedDictionary<T> but the latter has faster add/remove.

Also

The cast of Gary Short at Öredev is found here.
More info on collections is found here: