Introduction
The .NET Framework 1.x contains a variety of built-in collection types that can be found in the System.Collections
namespace. These built-in collections include the commonly used ArrayList and Hashtable collections,
along with the CollectionBase class which serves as a great class to extend to build strongly-typed collections.
There are also a variety of other collection classes in this namespace, such as: the Stack and Queue,
which are essentially ArrayLists underneath the covers, but place restrictions on the order with which elements
can be accessed from the collection; there's also a BitArray class for efficiently storing a collection of
Boolean values. (For a thorough discussion of data structures, including a look at the most common collection classes in
the .NET Framework, be sure to read An
Extensive Examination of Data Structures; there's also a
version for .NET 2.0.)
In addition to the classes found directly in the System.Collections namespace, there are a few more classes that
can be found in the System.Collections.Specialized namespace. These classes, as the namespace name implies,
are specialized versions of the classes in the System.Collections namespace. For example, there are a couple
of strongly-typed collections found here - the StringCollection and StringDictionary classes
provide strongly-typed versions of the ArrayList and Hashtable classes, respectively. There's also
the ListDictionary and HybridDictionary classes that can be more efficient classes than the
vanilla Hashtable class.
In this article we'll look at these specialized collection classes and see what features they offer over the base collection
classes. Read on to learn more!
The Strongly-Typed StringCollection and StringDictionary
The ArrayList and Hashtable classes are two of the most used collection classes by developers.
An ArrayList maintains a variable collection of elements that can be indexed ordinally, just like with an array.
(That is, you can do: myArrayList[integerValue] to get a particular value at a particular ordinal
index.) Underneath the covers the ArrayList is implemented using an array of type Object.
The nice thing about an ArrayList is that you don't need to worry about sizing the ArrayList.
With an array, on the other hand, you need to explicitly size the array and, if your storage needs outpace the array's size,
you need to resize the array. With an ArrayList this resizing happens automatically in the Add(item)
method. If you are adding a new element to the ArrayList and there is insufficient space, the ArrayList
automatically increases its underlying array's size, hiding the complexity from you.
Since an ArrayList is implemented using an array of type Object, and since all types in the
.NET Framework are derived, directly or indirectly, from type Object, an ArrayList can hold elements
of any type. Strings, integers, decimals, DataSets, custom class instances... it doesn't matter. Anything can be chucked
into an ArrayList. This has its advantages and disadvantages. It's nice because you aren't limited to storing a
specific type of object in the ArrayList. However, this flexibility comes at a cost in both performance and
code maintainability. By storing everything as an Object, when moving items in and out of the ArrayList
the runtime must determine what type of data really is going in and out. For value types - integers, decimals, Booleans, and so on -
this involves an additional overhead known as boxing. (A discussion of boxing and unboxing is beyond the scope of
this article; refer to An
Extensive Examination of Data Structures for a much more in-depth discussion on this issue.) There's also problems
with code maintainability. By not having strongly-typed objects in your ArrayList you open the door to
hard to catch run-time errors if you accidentally put in an object of the wrong type. With a strongly-typed collection, however,
attempting to put in an invalid type will result in a compile-time error, which means you'll catch it when building your solution
vs. not catching it until actually running the code (which, invariably, you won't test the particular exception-inducing case
until you're giving a live demo to your boss and boss's boss).
The StringCollection in the System.Collections.Specialized provides a strongly-typed string collection.
It behaves just like the ArrayList, the only difference being instead of being able to store items of any type,
you can only store items of type String. Underneath the covers the StringCollection uses an
ArrayList. It then provides the same set of ArrayList methods - Add(), IndexOf(),
Remove(), and so on - but limits their input and return types to strings. Therefore, if you have a case where you
need to store a collection of strings, I'd encourage you to use the StringCollection rather than the type-unsafe
ArrayList.
Just as the StringCollection is a string-based, type-safe version of the ArrayList, the
StringDictionary is a string-based, type-safe version of the Hashtable class. A Hashtable
- sometimes called an associative array - is a mapping between a key and a value. Arrays maintain this mapping using
ordinal keys (0, 1, 2, ...). With a Hashtable, however, the key can be anything - a string, a number, an object reference, and so on.
Like an ArrayList a Hashtable stores values of type Object meaning that you can
add any type to a Hashtable.
The StringDictionary provides restrictions on types that can be inserted into a Hashtable. With
the StringDictionaryboth the keys and values must be of type String.
Since the StringDictionary and StringCollection classes are merely type-safe representations of
the Hashtable and ArrayList classes, their use is never required. That is, there is never a situation
when you could be able to use the StringCollection but could not use an ArrayList instead. However,
if situations to use these strongly-typed variations do present themselves I would encourage you to utilize these specialized
collections.
Moving Forward: Strongly-Typed Collections in .NET 2.0
The .NET Framework version 2.0 introduces the concept of generics. With generics it is very easy to create strongly-typed
collections that can be customized by type at compile-time. For example, with 2.0 you can create a strongly-typed list of
integers using: List<int> myListOfIntegers = new List<int>();
Essentially you can, when writing your code, extend the List class to use the type of your choice. There's no
more need to create specialized classes in order to enjoy strong typing. (For more on generics be sure to read
Introducing .NET Generics.)
More Efficient Dictionaries
A dictionary is an abstract data structure that stores a set of elements referenced by some key. (The Hashtable is
a concrete implementation of a dictionary.) Dictionaries typically offer, at minimum, the following functionality: the ability to
add an item to the dictionary, find an item by key, and remove an item. How these functions are provided depends on the actual concrete
data structure. For example, when adding an element to a Hashtable the Hashtable
maintains an array that actually stores these values. The way the value being added to the Hashtable is mapped
to the array is that the key is converted into an integer value within the bounds of the inner array. The bottom line of this approach is that it's very quick to find, insert,
or retrieve an element from a Hashtable. (The nitty-gritty details on how a Hashtable is implemented,
along with a look at performance, is beyond the scope of this article - you can read about the details in
Part 2 of
An Extensive Examination of Data Structures.)
Another way to implement a dictionary is to use a linked list. (A linked list is a list of items; see
Part 4
of the Extensive Examination of Data Structures article series for a primer on linked lists.) You can think of a linked list
as an ArrayList. To add a new item to a dictionary being implemented using a linked list, simply add the
new element on at the end of the linked list. Finding an element by key in the list, however, takes more work - with a Hashtable
we could quickly find the element since there was a function to map the key to an array index; with the linked list we need to
search through the entire list, one element at a time, until we find the key we're after, and then we can return that associated
key's element value.
Clearly there's a performance difference between a dictionary implemented as a Hashtable and a linked list, with
the Hashtable boasting improved performance in adding, finding, and removing elements. So it would seem to make
sense to use a Hashtable for a dictionary in all circumstances, right? Well, no, actually. While the Hashtable's
performance will outshine the linked list's, it only does so with sufficiently large collections. If you are only
storing, say, five elements in your collection, the overhead that comes with the Hashtable will render it less
performant than the linked list implementation.
For this reason the .NET Framework provides a dictionary implemented using a linked list, the
ListDictionary class in the System.Collections.Specialized namespace. The ListDictionary
is the optimal implementation to use for a dictionary when there are 10 or fewer items. When there are more than 10 items,
it's best to use a Hashtable. So, if you know you are going to have 10 or fewer items in your collection and
you need an associative array (as opposed to an ordinal one), use the ListDictionary class; however, if you anticipate
more than 10 items, use the Hashtable.
Of course many times you do not know how many total elements you'll have in your collection. Thankfully the .NET Framework
includes the HybridDictionary class. The HybridDictionary uses the appropriate implementation
for a dictionary depending on the collection's size. That is, while your collection has 10 or fewer items, the
HybridDictionary internally uses a ListDictionary to maintain the elements. Once you go over
10 elements, however, the HybridDictionary switches over to using a Hashtable. This switch from one
data structure to another is done transparently to you, so you get the best data structure for the size of the collection without
having to know how many items you'll have in advance or having to go through the headache of switching between the two
implementations yourself.
Enumerating the Elements in a HybridDictionary
Like with all collections in the .NET Framework, you can easily enumerate through the elements in a HybridDictionary
using a foreach loop. One thing to realize, though, is that the order of these elements will change when the
HybridDictionary crosses over from the ListDictionary to a Hashtable. The
ListDictionary stores its elements in the order with which they were added. However, a Hashtable's
ordering is based upon how the key is transmogrified into the index used to store the value in the internal array. The short
of it is that a Hashtable will not necessarily enumerate through its elements in the order with which they were
added.
This behavior cannot only cause confusion, but also problems. The Page class uses a HybridDictionary
to store the client-side script that has been programmatically added to a page by a page developer. If you rely on the order with
which you add script blocks, you may be in for a nasty re-ordering surprise if you add more than 10 script blocks. See
Peter Blum's RegisterScripts product for a more thorough discussion
on why this is a problem and a potential solution.
Conclusion
In this article we examined some of the lesser known collections in the .NET Framework, those buried away in the
System.Collections.Specialized namespace. In particular we looked at two strongly-typed collection classes,
StringCollection and StringDictionary, as well as the HybridDictionary, which maintains
a dictionary using the most efficient implementation based on the number of elements in the collection.