C# LINQ from beginner to expert - Part 1


Introduction

There have been many interesting changes to the C# language over the years, but the one that left the biggest mark is LINQ. When used correctly, it changes how you reason about and build code.

At its heart, it is simply a clean, succinct and composable way to navigate, filter and transform lists of data. It can be thought of as an SQL-like language for working with class structures, XML documents and, ironically, SQL Server.

There is actually far more to it than lists, as I outline in "No jargon, No Burritos. C# LINQ = Monads". If you are already comfortable with LINQ, that is worth a look.

So let's face it, we have all seen and/or written this sort of nested if/loop pyramid of doom in code:

// Build a contact list for all your staff
var contactNumbers = new List<Tuple<string, string>>();
foreach (var staffMember in office.Staff)
{
    if (staffMember.IsOnContactList)
    {
        foreach (var phoneNumber in staffMember.PhoneNumbers)
        {
            contactNumbers.Add(
                Tuple.Create(staffMember.Name, phoneNumber));
        }
    }
}

There is a better way, known as LINQ (Language Integrated Query).

Jumping ahead a bit

I am going to jump ahead a fair way at this point to show you the LINQ syntax. This is so you have a feel for how the pieces plug together to form more complex operations.

So first off here are the two different syntax styles LINQ has.

// Query syntax
var queryContacts = 
    from staffMember in office.Staff
    where staffMember.IsOnContactList
    from phoneNumber in staffMember.PhoneNumbers
    select Tuple.Create(staffMember.Name, phoneNumber);

// Lambda syntax
var lambdaContacts = office.Staff
    .Where(staffMember => staffMember.IsOnContactList)
    .SelectMany(
        staffMember => staffMember.PhoneNumbers,
        (st, pn) => Tuple.Create(st.Name, pn));

You can probably follow the query syntax; it almost looks like for loops. Don't panic too much about the lambda syntax, it should feel more natural by the end of this series. The compiler actually translates the query syntax into the lambda syntax, so the two forms are equivalent.
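
To see that the two styles give the same answer, here is a small self-contained sketch. The staff array is made-up sample data standing in for office.Staff (the names and phone numbers are my own assumptions), and it uses anonymous types so no extra classes are needed.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for office.Staff.
var staff = new[]
{
    new { Name = "Ann", IsOnContactList = true,  PhoneNumbers = new[] { "555-0100", "555-0101" } },
    new { Name = "Bob", IsOnContactList = false, PhoneNumbers = new[] { "555-0102" } },
    new { Name = "Cat", IsOnContactList = true,  PhoneNumbers = new[] { "555-0103" } },
};

// Query syntax
var queryContacts =
    from staffMember in staff
    where staffMember.IsOnContactList
    from phoneNumber in staffMember.PhoneNumbers
    select Tuple.Create(staffMember.Name, phoneNumber);

// Lambda syntax
var lambdaContacts = staff
    .Where(staffMember => staffMember.IsOnContactList)
    .SelectMany(
        staffMember => staffMember.PhoneNumbers,
        (st, pn) => Tuple.Create(st.Name, pn));

// Tuples compare by value, so SequenceEqual checks the results element by element.
bool sameResults = queryContacts.SequenceEqual(lambdaContacts);
Console.WriteLine(sameResults);            // True
Console.WriteLine(queryContacts.Count());  // 3
```

Bob is not on the contact list, so only Ann's two numbers and Cat's one number come through in both versions.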

Looking at the above, it would be a fair assumption that queryContacts and lambdaContacts are some form of collection. That is not the case. They are actually lazy sequences (IEnumerable<T>) that describe how to create the data when, and only when, it is needed, much like an SQL command before it has been executed.

This means that if I only look at the first contact in the results, ONLY the first contact is processed. No other work is done.

// This is like "SELECT TOP 1 * FROM..."
var firstContact = queryContacts.First();

// This is like "SELECT TOP 10 * FROM..."
// Take() is lazy too: nothing runs until first10 is enumerated.
var first10 = queryContacts.Take(10);
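
If you want to watch the laziness happen, a rough sketch like the one below counts how often the filter function actually runs. The numbers array and the predicateCalls counter are just illustrative names of my own, not part of the contact list example.

```csharp
using System;
using System.Linq;

var numbers = new[] { 1, 2, 3, 4, 5 };
int predicateCalls = 0;

// Defining the query runs nothing; the lambda is only stored for later.
var evens = numbers.Where(n =>
{
    predicateCalls++;      // count how often the filter actually runs
    return n % 2 == 0;
});
Console.WriteLine(predicateCalls);   // 0

// First() pulls elements only until one passes the filter.
int firstEven = evens.First();
Console.WriteLine(firstEven);        // 2
Console.WriteLine(predicateCalls);   // 2
```

The filter ran for 1 and 2 and then stopped; 3, 4 and 5 were never looked at.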

We will revisit this lazy nature later on; it is what lifts LINQ way above plain for loops. But now we need to go back to basics and cover some commands.

Executing a query

You have just learnt that a query is only executed when its results are looked at, so the first thing we need to know is how to execute one. There are a few ways to do this:

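  • A foreach loop
  • A LINQ command that constructs a concrete collection, such as ToArray(), ToList() or ToDictionary().
  • A LINQ command that reduces the results to a single value, such as Sum(), Count() or First(). Sorting with OrderBy() is still lazy, but it has to read the whole source as soon as the first result is requested.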

A few quick samples of forced execution before we move on. I will tag each command as Lazy or Eager as we cover it so you know which will force execution when used. This might sound complex, but if you think about what each command is doing you will see the reason for this.

foreach (var contact in lambdaContacts)
{
    Console.WriteLine(
        $"Name: {contact.Item1}, PhoneNumber: {contact.Item2}");
}

Tuple<string, string>[] contacts = lambdaContacts.ToArray();
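
One consequence worth seeing for yourself: a query is re-run against the source every time it is executed, while ToList() takes a snapshot. The sketch below uses a made-up list of numbers (not the contact list from above) to show the difference.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3 };

// Only a description of the work; nothing has run yet.
var bigNumbers = numbers.Where(n => n > 1);

// Eager: ToList() runs the query now and stores the results.
List<int> snapshot = bigNumbers.ToList();

numbers.Add(4);

// The snapshot is fixed, but the query runs again against the current list.
Console.WriteLine(snapshot.Count);       // 2
Console.WriteLine(bigNumbers.Count());   // 3
```

So if you need a stable set of results, or the query is expensive and you will read it more than once, forcing it into a list or array is usually the safer choice.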

Select (Lazy)

Select is a mapping operation. It takes a function and applies it to each element in the collection. The result is an enumerable sequence of the function results.

// Lambda syntax

// Add one to each element in a list of numbers
var numbers = new[]{1, 2, 3};
var lambdaResults = numbers.Select(number => number + 1);
// Results = 2, 3, 4.

// Get the names of all the people in the list
var lambdaNames = people.Select(person => person.Name);

// Query syntax

var queryResults = from number in numbers select number + 1;
var queryNames = from person in people select person.Name;

It should be noted that even when we execute these queries, the numbers collection is not changed. The results are a new sequence of the function results.
You can see from this why Select() can be lazy: we only need to call the mapper against elements that are looked at.

There are two versions of Select()

public static IEnumerable<TR> Select<T, TR>(
    this IEnumerable<T> source, Func<T, TR> mapper);

public static IEnumerable<TR> Select<T, TR>(
    this IEnumerable<T> source, Func<T, int, TR> mapper);

The first you have already seen in action. In the second, the mapper function gets passed the element along with its index in the source. A bit like your classic for loop incrementing an index, but without all those chances to make a mistake.
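
A quick sketch of the indexed version, using a made-up array of names:

```csharp
using System;
using System.Linq;

var names = new[] { "Ann", "Bob", "Cat" };

// The second lambda parameter is the zero-based position in the source.
var numbered = names.Select((name, index) => $"{index + 1}. {name}");

foreach (var line in numbered)
{
    Console.WriteLine(line);
}
// Prints:
// 1. Ann
// 2. Bob
// 3. Cat
```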

Where (Lazy)

Where is a simple filter that takes a predicate: a function that, for a given value, returns true or false to indicate whether the element is wanted in the results.

// Lambda syntax

var lambdaNamesFiltered = lambdaNames.Where(name => name == "Bob");

// Query syntax

var queryNamesFiltered = 
    from name in queryNames
    where name == "Bob"
    select name;

You might have noticed I used the queries defined for Select() as the source. This is to show you how you can combine queries together. From this you should be able to guess the signature for Where:

public static IEnumerable<T> Where<T>(
    this IEnumerable<T> source, Func<T, bool> predicate);

You might also be able to figure out how it can be lazy. If I ask for the next element in the filtered collection I only have to evaluate until I find one that passes, only moving onward when another is requested.
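
To make that concrete, here is a rough sketch of how a lazy filter could be written with yield return. LazyWhere is a hypothetical stand-in I made up for illustration; it is not the real library code, just the same idea in a few lines.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for Where, written as a local iterator function.
// NOT the real library implementation, only a sketch of the idea.
IEnumerable<T> LazyWhere<T>(IEnumerable<T> source, Func<T, bool> predicate)
{
    foreach (var item in source)
    {
        if (predicate(item))
        {
            yield return item;   // pause here until the caller asks for more
        }
    }
}

int checks = 0;
var names = new[] { "Ann", "Bob", "Cat", "Bob" };
var bobs = LazyWhere(names, name => { checks++; return name == "Bob"; });

string firstBob = bobs.First();
Console.WriteLine(firstBob);   // Bob
Console.WriteLine(checks);     // 2
```

Only "Ann" and the first "Bob" were tested. The loop inside LazyWhere stopped at the yield and never reached "Cat" or the second "Bob" because nobody asked for another element.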

You should also start to see how they glue together:

var names = people
    .Select(person => person.Name)
    .Where(name => name == "Bob");

// These next two queries produce the same results

var namesCombined = people
    .Where(person =>
        person.Name == "Bob" && person.Country == "New Zealand")
    .Select(person => person.Name);

var namesChained = people
    .Where(person => person.Name == "Bob")
    .Where(person => person.Country == "New Zealand")
    .Select(person => person.Name);
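
If you want to check that claim, a small sketch along these lines will do it. The people array is made-up sample data (my own assumption, built from anonymous types) rather than anything from a real project.

```csharp
using System;
using System.Linq;

// Hypothetical sample data standing in for the people collection.
var people = new[]
{
    new { Name = "Bob", Country = "New Zealand" },
    new { Name = "Bob", Country = "Australia" },
    new { Name = "Ann", Country = "New Zealand" },
};

// One Where with both conditions combined
var combined = people
    .Where(person => person.Name == "Bob" && person.Country == "New Zealand")
    .Select(person => person.Name);

// Two Where calls chained together
var chained = people
    .Where(person => person.Name == "Bob")
    .Where(person => person.Country == "New Zealand")
    .Select(person => person.Name);

Console.WriteLine(combined.SequenceEqual(chained));   // True
Console.WriteLine(chained.Count());                   // 1
```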

This feels like a good place to leave part 1. Let me know your thoughts so far or any questions you have; I will answer in the comments.

Woz

Part 2 is now available.


This is a clear explanation with good examples - looking forward to more tips and insights - thanks!

Glad you enjoyed. Shame old articles drop off the radar but looks like you managed to find :)

I just saw part 3 and decided to jump back to the beginning. This is a very interesting intro to an interesting subject matter. You do well with discussing language topics in digestible bites. Looking forward to the other parts!

Thanks for the feedback, always good to hear responses like this :) Keeps me going and putting out content

An upvote on the other parts would be cool :) Any funds I can generate help with the hours put into my content. Most parts are probably 2+ hours of work apiece. But code is what I love; hopefully that comes through in my writing.

Thanks woz. Didn't realize 'foreach' can also execute a query. Good to pick it up.

I have built a good deep knowledge of using LINQ over the years through trial and some real error :)

Keep with the series, I try and get across as much as I can. Feel free to ask questions when needed :)
