A LINQ Primer

1. INTRODUCTION

LINQ (Language INtegrated Query) is a set of tools that makes it possible to query and manipulate sets of data.  In this article, we examine LINQ with three kinds of data sources: objects (that is, in-memory data), SQL data, and XML data.  LINQ was introduced with C# 3.0 and version 3.5 of the .NET Framework.

LINQ can be used on any data source that supports the IEnumerable<T> interface.  Arrays and collections are two examples.
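As a minimal sketch of this point, the fragment below applies the same Where filter to an array and to a List<string>; both are accepted because each implements IEnumerable<T>. The class name, the LongWords helper, and the sample words are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class EnumerableSourcesSketch
{
    // Hypothetical helper: keeps the words longer than three characters.
    // It accepts any IEnumerable<string>, so arrays and lists both qualify.
    public static List<string> LongWords(IEnumerable<string> words)
    {
        return words.Where(w => w.Length > 3).ToList();
    }

    public static void Main()
    {
        string[] fromArray = { "LINQ", "to", "Objects" };
        List<string> fromList = new List<string> { "SQL", "data", "XML" };

        Console.WriteLine(string.Join(", ", LongWords(fromArray)));  // LINQ, Objects
        Console.WriteLine(string.Join(", ", LongWords(fromList)));   // data
    }
}
```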

To get started, let's look at a few examples.

We define an array of integers with the simple C# statement:

        int[] fibo = { 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377 };

which will be recognized as the beginning of the Fibonacci sequence, F(n) = F(n-1) + F(n-2) where F(0) = 0 and F(1) = 1.

Suppose we want to print the even numbers in this sequence. Using "traditional" procedural C#, we can write

 foreach (int i in fibo)
   if (i % 2 == 0)
     Console.Write (i + "   ");


which produces

            0   2   8   34   144

We could also get the same result using a simple LINQ query,

   var evens = from evenNo in fibo
               where evenNo % 2 == 0
               select evenNo;
   foreach (int x in evens)
      Console.Write(x + "  ");


Or we could simplify this slightly using a LINQ method:

   var evens = fibo.Where(evenNo => evenNo % 2 == 0);
   foreach (int x in evens)
      Console.Write(x + "  ");


We shall have more to say about query syntax and method syntax shortly.  For now, we note that all three code fragments are about equal in length and complexity.

The power of LINQ becomes apparent as we increase the complexity of our task.  As an example, let's construct a Fibonacci sequence.  We'd like our sequence to be in a List<int> collection.  First, we'll use "traditional" C# code.  We'll restrict our sequence to members with values less than 2000.

 // Construct Fibonacci with traditional procedure
 int max = 2000;
 List<int> fiboCol = new List<int>();

 // Add first two values to collection.
 fiboCol.Add(0);
 fiboCol.Add(1);
 int sum;
 // Keep adding the sum of the last two members while it stays below max.
 for (int j = 2;
      (sum = fiboCol[j - 1] + fiboCol[j - 2]) < max;
      j++)
     fiboCol.Add(sum);

 // List the members of the collection...
 foreach (int x in fiboCol)
    Console.Write(x + "   ");


which produces
     
       0   1   1   2   3   5   8   13   21   34   55   89   144   233   377   610   987   1597

and requires six statements not including the foreach statement which writes the output.

In fact, we can accomplish the same thing using LINQ with only two statements.

We use Func, a generic delegate type for a method that takes a number of specified input arguments T1, T2, etc., and returns a result of the specified type TResult.  Specifically, Func for three input arguments is defined as:

 public delegate TResult Func<in T1, in T2, in T3, out TResult>
 (T1 arg1, T2 arg2, T3 arg3);

We'll devise a delegate called Fibo that takes as arguments the previous two values of the Fibonacci sequence and the maximum permissible value of the last member in our sequence, i.e.,

                                  Func<int, int, int, IEnumerable<int>> Fibo

Func is a delegate type often used with LINQ.  Don't be concerned if this is new to you; we'll expand on the details in the next section.  For now, it suffices to understand that after our function has been defined, we can invoke it as Fibo(x, y, max), where all three arguments are of type int, and it will return a result of type IEnumerable<int>.
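To make the Func idea concrete before applying it to Fibonacci numbers, here is a minimal sketch of a three-argument Func that returns a plain int. The names FuncSketch and SumCapped are hypothetical and are not used elsewhere in this article.

```csharp
using System;

public static class FuncSketch
{
    // Hypothetical delegate instance: three int inputs, one int result.
    // It returns x + y, but never more than cap.
    public static readonly Func<int, int, int, int> SumCapped =
        (x, y, cap) => Math.Min(x + y, cap);

    public static void Main()
    {
        // A Func is invoked exactly like an ordinary method.
        Console.WriteLine(SumCapped(3, 4, 10));   // 7
        Console.WriteLine(SumCapped(8, 9, 10));   // 10
    }
}
```

The last type argument of Func is always the return type; the ones before it are the input types, in order.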

If x and y are the previous two members of a Fibonacci sequence, then we must first determine whether their sum exceeds max, so our lambda expression begins with the test

                                 (x, y, max) => x + y > max

where x and y are the previous two members of our sequence.  If their sum does not exceed max, then we concatenate the sum to our present set of Fibonacci numbers, that is, we should wind up with the sequence

                  0,  1,  1,  2,  3,  ..., x + y

where the sequence has N members, (x + y) is the (N - 1)st member counting from zero, and x and y are the two members immediately preceding it.  We should continue adding members as long as (x + y) does not exceed max.  The following pair of statements will satisfy this:


 Func<int, int, int, IEnumerable<int>> Fibo = null;
 Fibo = (x, y, max) => x + y > max ?
     Enumerable.Empty<int>() :
     Enumerable.Repeat(x + y, 1)
               .Concat(Fibo(y, x + y, max));

      
Two statements replace six statements in the previous procedure.  We invoke our function by calling Fibo(0, 1, max) where max = 2000, as in the previous example.  The complete code fragment is shown in Example 1-1.

Example 1-1  C# Listing using LINQ to produce Fibonacci sequence with values up to max

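  int max = 2000;
  Func<int, int, int, IEnumerable<int>> Fibo = null;
  // The third parameter is named limit so that it does not clash with the local max.
  Fibo = (x, y, limit) => (x + y) > limit ?
                Enumerable.Empty<int>() :
                Enumerable.Repeat(x + y, 1).Concat(Fibo(y, x + y, limit));
  // Fibo yields only the members after its two starting values, so prepend 0 and 1.
  var outList = new[] { 0, 1 }.Concat(Fibo(0, 1, max)).ToList();
  foreach (int xx in outList)
     Console.Write(xx + "  ");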

This will produce the same sequence shown earlier.


Now, let's consider another example, this time using the C# random number generator.  The System.Random class uses a pseudo-random number (PRN) generator to produce random integers uniformly distributed over a specified range.  The generator is started by a "seed" value.  The sequence is not really random.  In fact, it is repeated for the same seed!  (The seed itself can be randomized from a property of the DateTime class, the number of 100-nsec ticks since Jan 1, 0001 CE.  However, if two instances of the PRN generator are seeded with this value in the same 100-nsec interval, then they will produce the same sequence.  In our example here, this will not happen.)  What we'd like to see is how uniform the distribution really is.  If we generate N random integers over an interval containing M distinct integers, then for a uniform distribution, we would expect to see about N / M occurrences of each integer in the interval.
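The claim that the sequence repeats for the same seed is easy to check.  The following is a minimal sketch, assuming an arbitrary fixed seed of 12345; the class name SeedSketch and the Draw helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SeedSketch
{
    // Draws 'count' integers from -5 through 4 using a generator
    // started with the given seed.
    public static List<int> Draw(int seed, int count)
    {
        Random rng = new Random(seed);
        return Enumerable.Range(0, count)
                         .Select(_ => rng.Next(-5, 5))
                         .ToList();
    }

    public static void Main()
    {
        // Two generators with the same seed produce the same sequence.
        List<int> first = Draw(12345, 10);
        List<int> second = Draw(12345, 10);
        Console.WriteLine(first.SequenceEqual(second));   // True
    }
}
```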

What we'll do is generate a large number of random numbers distributed over a small interval.  Then using LINQ, we'll group these numbers into a separate bin for each integer in the interval.  Then we'll count the number of values in each bin and display the counts which will reveal how flat our distribution really is.

Here's our code.

Example 1-2  C# Listing using LINQ to determine flatness of distribution of random variable

 // Get a seed value.  The cast keeps only the low 32 bits of the tick count.
 int noTicks = unchecked((int) DateTime.Now.Ticks);

 // Create an instance of the random number generator.
 Random randNo = new Random(noTicks);

 // Take 100 000 samples from the integers -5 through 4
 // (the upper bound passed to Next is exclusive)
 // and store these in a collection.
 List<int> colRandNos = new List<int>();
 for (int i = 0; i < 100000; i++)
     colRandNos.Add(randNo.Next(-5, 5));

 // Sort these by value...
 var sortedList = from items in colRandNos
                  orderby items
                  select items;

 // Group these into bins for each integer...
 var grps = sortedList.GroupBy(x => x);

 // ... and display the results.
 foreach (var x in grps)
     Console.WriteLine(x.Key + "   " + x.Count());


We should get about 10 000 integers in each bin.  Our run results in the following:

             Bin     No. of integers in bin
              -5               9983
              -4               9779
              -3              10092
              -2              10075
              -1              10066
               0               9839
               1              10106
               2              10124
               3               9984
               4               9952
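
One way to put a number on the flatness is to compare each bin with the expected count N / M.  The following is a minimal sketch that uses the bin counts from the run above; the class name FlatnessSketch and the MaxRelativeDeviation helper are hypothetical.

```csharp
using System;
using System.Linq;

public static class FlatnessSketch
{
    // Largest relative deviation of any bin from the expected count N / M,
    // where N is the total number of samples and M is the number of bins.
    public static double MaxRelativeDeviation(int[] counts)
    {
        double expected = (double)counts.Sum() / counts.Length;
        return counts.Max(c => Math.Abs(c - expected) / expected);
    }

    public static void Main()
    {
        // Bin counts copied from the run above, bins -5 through 4.
        int[] counts = { 9983, 9779, 10092, 10075, 10066,
                         9839, 10106, 10124, 9984, 9952 };

        Console.WriteLine(counts.Sum());                    // 100000
        Console.WriteLine(MaxRelativeDeviation(counts));
    }
}
```

For this run the worst bin is -4, with 9779 samples against an expected 10 000, a deviation of roughly 2.2 percent.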