An Extensive Examination of LINQ: The Standard Query Operators
By Scott Mitchell
A Multipart Series on LINQ: This article is one in a series of articles on LINQ, which was introduced with .NET version 3.5.
Introduction
Query operators are methods that work with a sequence of data and perform some task based on the data. They are created as extension methods on the IEnumerable<T> interface, which is the interface implemented by classes that hold enumerable data. For example, arrays and the classes in the System.Collections.Generic namespace implement IEnumerable<T>. (The older, non-generic classes in the System.Collections namespace implement only the non-generic IEnumerable interface.) In The Ins and Outs of Query Operators we looked at how to create your own query operator that, once created, can be applied to any enumerable object.
While it is possible to create your own query operators, the good news is that the .NET Framework already ships with a bevy of useful query operators. These query operators are referred to as the standard query operators and are one of the primary pieces of LINQ. The standard query operators include functionality for aggregating sequences of data, concatenating two sequences, converting sequences from one type to another, and splicing out a particular element from the enumeration. There are also standard query operators for generating new sequences, grouping and joining sequences, ordering the elements in sequences, filtering the data in a sequence, and partitioning the sequence.
All together, there are more than 40 standard query operators. This article explores some of the more germane ones, giving examples of each operator in use and examining the underlying source code. There are also several demos included in the download available at the end of the article. Read on to learn more!
Standard Query Operator Overview and Classifications
The standard query operators are a set of query operators that ship with the .NET Framework. Specifically, the standard query operators are defined in the Enumerable class, which is found in the System.Linq namespace. The standard query operators are extension methods on the IEnumerable<T> interface.
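Because the operators are extension methods defined on the Enumerable class, the same operator can be called either with extension-method syntax or as an ordinary static method. The following is a minimal sketch of that equivalence; the class name, method names, and sample array are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExtensionCallDemo
{
    // Calls Count using extension-method syntax.
    public static int CountAsExtension(IEnumerable<int> numbers)
    {
        return numbers.Count();
    }

    // Calls the very same method as a plain static method on Enumerable.
    public static int CountAsStatic(IEnumerable<int> numbers)
    {
        return Enumerable.Count(numbers);
    }

    public static void Main()
    {
        int[] numbers = { 4, 8, 15, 16, 23, 42 };
        Console.WriteLine(CountAsExtension(numbers)); // prints 6
        Console.WriteLine(CountAsStatic(numbers));    // prints 6
    }
}
```

The extension-method form only compiles when the System.Linq namespace is imported with a using directive.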
Each standard query operator is classified as performing a particular type of operation. In previous installments we looked at the Count standard query operator, and talked about the Sum standard query operator. These two operators are examples of aggregate operators, as they take a sequence of data (a list of integers, let's say) and aggregate the data, returning some scalar value (the total number of integers or the sum of said integers in the case of Count and Sum).
The standard query operators can be classified according to the following types of operations performed:
- Aggregation operators
- Concatenation operators
- Element operators
- Equality operators
- Generation operators
- Grouping operators
- Joining operators
- Ordering operators
- Partitioning operators
- Projection operators
- Quantifier operators
- Restriction operators
- Set operators
Summing, Averaging, Counting, and Finding Maximum and Minimum Elements
The .NET Framework includes a number of aggregate standard query operators. These operators examine a sequence of data and compute a scalar value. For instance, the Count operator, which we've seen in previous installments, returns the total number of elements in the sequence. Other aggregate operators include Average, Max, Min, and Sum. A simple example follows, which shows using many of these operators on the Fibonacci class that we created in the preceding installment.
// C# - Create a Fibonacci object holding the first 10 Fibonacci numbers
|
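A minimal sketch of such a snippet follows. It assumes a plain int array stands in for the Fibonacci class from the preceding installment; the AggregateDemo class and the FirstTen field are illustrative names, not part of the series' code.

```csharp
using System;
using System.Linq;

public static class AggregateDemo
{
    // Hypothetical stand-in for a Fibonacci object holding the first 10 numbers.
    public static readonly int[] FirstTen = { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };

    public static void Main()
    {
        var fib = FirstTen;

        // Each operator enumerates the sequence and returns a single value.
        var count = fib.Count();
        var average = fib.Average();
        var sum = fib.Sum();
        var min = fib.Min();
        var max = fib.Max();

        Console.WriteLine("Count = {0}", count);
        Console.WriteLine("Average = {0}", average);
        Console.WriteLine("Sum = {0}", sum);
        Console.WriteLine("Min = {0}", min);
        Console.WriteLine("Max = {0}", max);
    }
}
```

For these ten numbers the count is 10, the sum is 143, the minimum is 1, and the maximum is 55.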
Keep in mind that the Count, Average, Sum, Min, and Max methods used above are not part of the Fibonacci class. Rather, they are extension methods on the IEnumerable<T> interface, which the Fibonacci class implements.
Furthermore, notice how I used implicit variable typing when reading back the values from these operators (var count = fib.Count() and Dim count = fib.Count(), for example). I could have used explicit typing (int count = fib.Count() and Dim count As Integer = fib.Count) but it's good to get used to implicit typing as this pattern is commonly used with more intricate LINQ queries.
The source code for the aggregation operators is pretty straightforward. For example, the Enumerable class defines two overloads of the Count operator. The first works on an object that implements IEnumerable<T>, and returns an integer value. Its abbreviated code follows. (Note: I've simplified the method declaration to make it more readable. I used Reflector to view the source code in the .NET Framework.)
// C#
|
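The sketch below shows the general shape of such a method. It is not the Framework's actual source: the method is named MyCount (a hypothetical name) so that it does not clash with Enumerable.Count, and it leaves out details the real implementation may include, such as overflow checking.

```csharp
using System;
using System.Collections.Generic;

public static class CountSketch
{
    // Simplified, illustrative version of a Count-style operator.
    // The "this" modifier makes source the object the method is applied to.
    public static int MyCount<T>(this IEnumerable<T> source)
    {
        if (source == null)
            throw new ArgumentNullException("source");

        int count = 0;

        // Enumerate every element, tallying one per iteration.
        foreach (T item in source)
            count++;

        return count;
    }

    public static void Main()
    {
        int[] fib = { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };
        Console.WriteLine(fib.MyCount()); // prints 10
    }
}
```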
In the examples above, the IEnumerable<T> object named source that appears to be passed into the method is actually the object the extension method is being applied to. The Count method simply enumerates the elements in source, tallies how many iterations it performs, and returns this value. That's it!
The other Count overload accepts a function as input, which you can use to filter what elements get counted. For example, to instruct the Count operator to only count odd numbers you could do something like: var count = fib.Count(n => n % 2 == 1) or Dim count = fib.Count(Function(n) n Mod 2 = 1).
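As a small sketch of the predicate overload in use, assuming an int array stands in for the Fibonacci object (CountOddDemo and CountOdd are illustrative names):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountOddDemo
{
    // Counts only the elements for which the predicate returns true.
    public static int CountOdd(IEnumerable<int> source)
    {
        return source.Count(n => n % 2 == 1);
    }

    public static void Main()
    {
        int[] fib = { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };
        Console.WriteLine(CountOdd(fib)); // prints 7
    }
}
```

Note that in C# the test n % 2 == 1 does not match negative odd numbers, because -3 % 2 evaluates to -1. That is harmless for the Fibonacci numbers, which are all positive.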
The aggregate operators are examples of greedy query operators. As we discussed in The Ins and Outs of Query Operators, LINQ operators are either lazy or greedy. A lazy query operator is one that is not evaluated until the elements of the sequence are enumerated. The sequence can be enumerated either by a foreach loop or by the application of a greedy query operator. Point being, when a greedy query operator is applied to a sequence the value computed by the greedy operator is generated immediately. The source code snippet above shows how the Count method immediately enumerates its source. This is why it is considered a greedy operator.
The Count method can work with an enumerable object of any type. Other operators limit the types they can be applied to. For example, the Average operator can only be applied to numeric sequences. This restriction is imposed by having a variety of overloads defined in the Enumerable class for the Average method. Rather than having a single method that applies to objects of IEnumerable<T>, there are overloads for Average like:
- Average(this IEnumerable<int> source)
- Average(this IEnumerable<decimal> source)
- Average(this IEnumerable<double> source)
- And so on...
There is also an overload that lets you apply the Average operator to a sequence of any type, but in order to use it you must supply a method that returns the numerical value for the element that will be used in the average calculation. This overload is useful if you have a collection of objects that contain a numeric value you want to average. For example, imagine that we have a list of Employee objects, where each Employee instance has a Salary property. The following pseudo code would compute the average salary:
// C# - compute the average Employee salary
|
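A runnable sketch of this idea follows. The Employee class, its Name property, the decimal type chosen for Salary, and the GetEmployees method are all assumptions made for illustration; only the Salary property and the emps list come from the text.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Employee type used only for this example.
public class Employee
{
    public string Name { get; set; }
    public decimal Salary { get; set; }
}

public static class AverageSalaryDemo
{
    // Stand-in for whatever process returns a populated list of employees.
    public static List<Employee> GetEmployees()
    {
        return new List<Employee>
        {
            new Employee { Name = "Ann", Salary = 50000m },
            new Employee { Name = "Bob", Salary = 60000m },
            new Employee { Name = "Cy", Salary = 70000m }
        };
    }

    public static void Main()
    {
        List<Employee> emps = GetEmployees();

        // The lambda tells Average which numeric value to use for each Employee.
        var avgSalary = emps.Average(e => e.Salary);

        Console.WriteLine(avgSalary); // prints 60000
    }
}
```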
The above code assumes that there's some process that returns a populated list of Employee objects. The average salary is then computed. Because the Employee object itself cannot be averaged (as it's not a numeric type) we need to pass a method into the Average operator that provides the value to average for each Employee object, in this case the value of each Employee object's Salary property. The net result is that we compute the average salary of all employees in the emps list.
Conversion Operators
The .NET Framework includes a handful of operators for sequence conversion. The ToList and ToArray operators convert an enumerable object of type T into a List<T> or an array of type T, respectively. These two methods are most often used to force a lazy query operator to evaluate. In the previous installment we talked about how a lazy query operator is not evaluated until the source elements are enumerated. To force immediate execution of the query operators you can use ToList or ToArray.
Consider the example from the previous installment. In the code below we have a Fibonacci object, fib, that is initialized to having 10 elements. A query, oddFibs, is defined that works with the odd numbers. However, before the query is enumerated the Grow method is called, which doubles the number of elements in fib. When oddFibs is enumerated in the foreach loop the output contains the odd numbers of the first 20 Fibonacci numbers, and not the first 10.
// Create a Fibonacci object with the first 10 Fibonacci numbers
|
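The behavior can be sketched as follows. This is an approximation under stated assumptions: a List<int> stands in for the Fibonacci class, and an AddRange call stands in for its Grow method, since neither is defined in this installment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    public static List<int> OddsEnumeratedAfterGrowth()
    {
        // Stand-in for a Fibonacci object with the first 10 Fibonacci numbers.
        var fib = new List<int> { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };

        // Defining the query does not run it, because Where is a lazy operator.
        IEnumerable<int> oddFibs = fib.Where(n => n % 2 == 1);

        // Stand-in for fib.Grow(): append the 11th through 20th numbers.
        fib.AddRange(new[] { 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765 });

        // The query runs only now, so it sees all 20 numbers.
        var seen = new List<int>();
        foreach (int n in oddFibs)
            seen.Add(n);

        return seen;
    }

    public static void Main()
    {
        foreach (int n in OddsEnumeratedAfterGrowth())
            Console.WriteLine(n);
    }
}
```

The loop prints 14 odd numbers, ending with 6765, rather than the 7 odd numbers found among the first 10.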
To force the oddFibs query to evaluate immediately (rather than waiting for it to be enumerated) you could use the ToList or ToArray operator like so:
// Create a Fibonacci object with the first 10 Fibonacci numbers
|
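A sketch of the ToList variant, again assuming a List<int> stands in for the Fibonacci class and AddRange stands in for Grow:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ImmediateDemo
{
    public static List<int> OddsCapturedBeforeGrowth()
    {
        // Stand-in for a Fibonacci object with the first 10 Fibonacci numbers.
        var fib = new List<int> { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };

        // ToList enumerates the query right away and copies the results.
        List<int> oddFibs = fib.Where(n => n % 2 == 1).ToList();

        // Stand-in for fib.Grow(): these additions no longer affect oddFibs.
        fib.AddRange(new[] { 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765 });

        return oddFibs;
    }

    public static void Main()
    {
        foreach (int n in OddsCapturedBeforeGrowth())
            Console.WriteLine(n);
    }
}
```

Here the loop prints only 1, 1, 3, 5, 13, 21, and 55.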
The foreach loop in the above code outputs only the odd numbers among the first 10 Fibonacci numbers. The ToList call evaluates the query immediately and copies its results into a list of integers, and at that point fib holds just 10 Fibonacci numbers.
Keep in mind that oddFibs is a different type in each example. In the first example, oddFibs is of type IEnumerable<int>. In the second example, the ToList operator converts the IEnumerable<int> sequence returned by the Where operator into a List<int>.
Element Operators
The element standard query operators retrieve a particular element from a sequence. The simplest operators in this class are First and Last, which return the starting and ending elements in the sequence, respectively. The following code snippet uses these two operators to retrieve the smallest and largest values in the Fibonacci collection. (Note that First and Last do not necessarily return the smallest and largest valued elements in a sequence; they do so for the Fibonacci sequence because the Fibonacci numbers are monotonically increasing.)
// C# - Create a Fibonacci object holding the first 10 Fibonacci numbers
|
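A minimal sketch, assuming an int array stands in for the Fibonacci object (the class and method names are illustrative):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FirstLastDemo
{
    // Returns the starting element of the sequence.
    public static int Smallest(IEnumerable<int> fib)
    {
        return fib.First();
    }

    // Returns the ending element of the sequence.
    public static int Largest(IEnumerable<int> fib)
    {
        return fib.Last();
    }

    public static void Main()
    {
        int[] fib = { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };
        Console.WriteLine(Smallest(fib)); // prints 1
        Console.WriteLine(Largest(fib));  // prints 55
    }
}
```

Both operators throw an InvalidOperationException when the sequence is empty; the FirstOrDefault and LastOrDefault operators return a default value instead.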
Use the ElementAt operator to get the element at a particular location in the enumeration, where the enumeration is indexed starting at zero. The following snippet verifies that the sum of the third and fourth Fibonacci numbers equals the fifth.
// C# - Ensure that the third and fourth Fibonacci numbers sum up to the fifth number
|
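A sketch of that check, assuming an int array stands in for the Fibonacci object. Because indexing starts at zero, the third, fourth, and fifth numbers sit at indexes 2, 3, and 4.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ElementAtDemo
{
    // True when the third and fourth elements sum to the fifth.
    public static bool ThirdPlusFourthEqualsFifth(IEnumerable<int> fib)
    {
        return fib.ElementAt(2) + fib.ElementAt(3) == fib.ElementAt(4);
    }

    public static void Main()
    {
        int[] fib = { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };
        Console.WriteLine(ThirdPlusFourthEqualsFifth(fib)); // prints True (2 + 3 == 5)
    }
}
```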
Ordering Operators
The .NET Framework includes query operators for ordering enumerations. The OrderBy operator orders an enumeration in ascending order; OrderByDescending orders an enumeration in descending order. When ordering an enumeration you must provide a method as an input parameter to the operator that specifies the field by which the elements in the sequence are to be ordered. For example, if you have a list of Employee objects and you want to order them by salary in ascending order, you could use code like the following:
// C# - order the Employee objects by salary in ascending order
|
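A runnable sketch of ordering by salary follows. As before, the Employee class, its Name property, and the sample data are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Employee type used only for this example.
public class Employee
{
    public string Name { get; set; }
    public decimal Salary { get; set; }
}

public static class OrderBySalaryDemo
{
    // The lambda names the value each Employee is ordered by.
    public static List<Employee> BySalary(IEnumerable<Employee> emps)
    {
        return emps.OrderBy(e => e.Salary).ToList();
    }

    public static void Main()
    {
        var emps = new List<Employee>
        {
            new Employee { Name = "Cy", Salary = 70000m },
            new Employee { Name = "Ann", Salary = 50000m },
            new Employee { Name = "Bob", Salary = 60000m }
        };

        // Prints Ann, Bob, and Cy, in that order.
        foreach (Employee e in BySalary(emps))
            Console.WriteLine("{0}: {1}", e.Name, e.Salary);
    }
}
```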
The method passed into the OrderBy operator indicates that each Employee object should be ordered by the Salary property.
If you are ordering a sequence of primitive types that do not have any properties (such as ordering a list of integers or an array of strings) you still need to pass in a method indicating the value to order on, but the format would look like x => x or Function(x) x. For example, to order a Fibonacci object in descending order you'd do:
// C# - Create a Fibonacci object holding the first 10 Fibonacci numbers
|
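A minimal sketch, assuming an int array stands in for the Fibonacci object:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DescendingDemo
{
    // x => x means "order each element by its own value".
    public static List<int> Descending(IEnumerable<int> fib)
    {
        return fib.OrderByDescending(x => x).ToList();
    }

    public static void Main()
    {
        int[] fib = { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };

        // Prints 55 first and the two 1s last.
        foreach (int n in Descending(fib))
            Console.WriteLine(n);
    }
}
```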
The ordering operators include an overload that accepts a comparer, an object implementing IComparer<TKey> that, given two values, specifies how they relate: whether they are equal and, if not, which one comes before the other. If provided, this comparer is used by the ordering operators. You must provide one if the type of the field you are ordering by does not have a built-in comparer, that is, if it does not implement IComparable. (Types like integers, strings, and dates already have comparers defined in the .NET Framework.)
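The comparer overload might be used as in the sketch below. LengthComparer is a hypothetical comparer written for this example; it orders strings by length rather than alphabetically and assumes the strings are not null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer: shorter strings come before longer ones.
public class LengthComparer : IComparer<string>
{
    public int Compare(string x, string y)
    {
        // Negative, zero, or positive, as the IComparer<T> contract requires.
        return x.Length.CompareTo(y.Length);
    }
}

public static class ComparerDemo
{
    public static List<string> ByLength(IEnumerable<string> words)
    {
        // The second argument replaces the default string comparer.
        return words.OrderBy(w => w, new LengthComparer()).ToList();
    }

    public static void Main()
    {
        string[] words = { "banana", "fig", "pear" };

        // Prints fig, pear, banana.
        foreach (string w in ByLength(words))
            Console.WriteLine(w);
    }
}
```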
Partitioning Operators
Previous installments looked at the Where operator, which enables a developer to specify a condition and filter out all elements from a sequence that do not meet that condition. We'll look at the Where operator momentarily, but before we do let's first focus on the partitioning operators. The partitioning operators divide the sequence into two partitions, a "left partition" and a "right partition." The two simplest partitioning operators are Skip and Take, which skip over the first n elements or take the first n elements. The following code snippet shows how to use Skip to skip over the first three Fibonacci numbers.
// C# - Create a Fibonacci object holding the first 10 Fibonacci numbers
|
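A minimal sketch of Skip and Take, assuming an int array stands in for the Fibonacci object:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipTakeDemo
{
    // Everything after the first three elements (the "right partition").
    public static List<int> WithoutFirstThree(IEnumerable<int> fib)
    {
        return fib.Skip(3).ToList();
    }

    // Only the first three elements (the "left partition").
    public static List<int> FirstThree(IEnumerable<int> fib)
    {
        return fib.Take(3).ToList();
    }

    public static void Main()
    {
        int[] fib = { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };
        var fibWithFirstThreeRemoved = WithoutFirstThree(fib);

        // Prints 3, 5, 8, 13, 21, 34, 55.
        foreach (int n in fibWithFirstThreeRemoved)
            Console.WriteLine(n);
    }
}
```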
The fibWithFirstThreeRemoved enumeration (currently) contains the 4th, 5th, 6th, 7th, 8th, 9th, and 10th Fibonacci numbers.
The SkipWhile and TakeWhile operators partition the sequence at the first element that fails some condition: they skip or take elements for as long as the condition remains true. We could replace the above Skip(3) operator with the SkipWhile operator like so:
// C# - Skip over the Fibonacci numbers while the current number is less than or equal to 2
|
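A minimal sketch of SkipWhile and TakeWhile, assuming an int array stands in for the Fibonacci object:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipWhileDemo
{
    // Skips leading elements while the condition holds, then keeps the rest.
    public static List<int> AfterSmallValues(IEnumerable<int> fib)
    {
        return fib.SkipWhile(n => n <= 2).ToList();
    }

    // Takes leading elements while the condition holds, then stops.
    public static List<int> SmallValues(IEnumerable<int> fib)
    {
        return fib.TakeWhile(n => n <= 2).ToList();
    }

    public static void Main()
    {
        int[] fib = { 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 };

        // Prints 3, 5, 8, 13, 21, 34, 55.
        foreach (int n in AfterSmallValues(fib))
            Console.WriteLine(n);
    }
}
```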
Keep in mind that the n in the lambda expression is the current Fibonacci number being evaluated and does not have any bearing on the index of the element in the sequence. The first four Fibonacci numbers are 1, 1, 2, and 3. The SkipWhile operator evaluates each element from the beginning and skips over it if the method evaluates to True. Therefore, it skips over the first three elements (1, 1, and 2) but not the fourth (3), because the first three are less than or equal to 2, but the fourth one is not.
Restriction (Filtering) Operators
The standard query operators include a single restriction (or filtering) operator: Where. The Where operator accepts a method as its input that specifies the condition for inclusion. When enumerated, the operator applies the condition to each element in its source; if the condition holds, the element is included in the result set, otherwise it is filtered out.
The following snippet starts by getting the list of files in the current folder. It then uses the Where operator along with the Sum and Average operators to glean information about the amount of space taken up by the files and by certain types of files. (For more information on how to programmatically work with the file system from an ASP.NET page, be sure to consult the System.IO namespace FAQs over on ASPFAQs.com.)
// C# - Get the list of files in the current folder
|
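A sketch of this kind of report follows. It runs as a console program and uses the current working directory as a stand-in for the folder of the executing ASP.NET page; the FileStatsDemo class, the Describe method, and the output format are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class FileStatsDemo
{
    public static string Describe(IEnumerable<FileInfo> fInfo)
    {
        // Where is lazy: this query runs each time Count or Sum enumerates it.
        var aspxFiles = fInfo.Where(
            f => string.Equals(f.Extension, ".aspx", StringComparison.OrdinalIgnoreCase));

        return string.Format(
            "{0} files, {1} bytes; {2} .aspx files, {3} bytes",
            fInfo.Count(),
            fInfo.Sum(f => f.Length),      // selector returns each file's size in bytes
            aspxFiles.Count(),
            aspxFiles.Sum(f => f.Length));
    }

    public static void Main()
    {
        // Assumption: the working directory stands in for the page's folder.
        FileInfo[] fInfo = new DirectoryInfo(Directory.GetCurrentDirectory()).GetFiles();
        Console.WriteLine(Describe(fInfo));
    }
}
```

In an ASP.NET page the folder path would more likely come from something like Server.MapPath, which is not shown here.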
The above code starts by retrieving information about all of the files in the folder that the currently executing ASP.NET page resides in. It then uses the Count and Sum operators to get the number of files and the total file size. Note that the Sum method includes a selector method. The elements of the fInfo sequence are FileInfo objects. One of the properties of the FileInfo object is Length, which returns the size of the file in bytes. Therefore, we call the Sum operator and supply a method that returns the field to sum, namely Length.
Next, the Where operator is used to get only those files that have the extension ".aspx". The Count and Sum operators are applied to this query to get the count and total file size of the .aspx pages in the folder.
Conclusion
LINQ includes a host of standard query operators, which are built-in operators that perform some calculation or modification to a sequence. The standard query operators can be broken down into various types, such as aggregation, conversion, element, grouping, joining, projection, and restriction types, among others. This article looked at a variety of standard query operators and showed them in action. The download available at the end of this article includes a handful of demos.
The standard query operator examples in this article (and in the download) use the extension method syntax, such as: SequenceObject.Operator, or fib.Count(). An Introduction to LINQ noted that LINQ has a unique query syntax that allows you to use query operators in a SQL-like syntax. The next installment will explore LINQ's query syntax, which is what enables developers to write SQL-like queries in C# and Visual Basic syntax.
Happy Programming!
Further Reading
Enumerable Class (technical docs)