Development | Borna Gajić

Efficient Code Optimizations in C#

Friday, Nov 18, 2022 • 14 min read

A deep dive into optimization options using structures, buffers, and other unusual features.

Intro

You are building an application in C#. At some point, it became a blazingly fast, highly profitable piece of engineering. Everything is good and life is great… 💥 until it’s not anymore. CPU spikes to 100% every now and then, process memory is rising for unknown reasons, the application is gradually becoming slower and slower and users are noticing the downgraded performance… You are now presented with 3 options: optimize the code as much as you can, scale up or use what you will learn today.

Let’s talk about C# Structs — the what, where, and why.

Why does the `struct` even exist?

Nowadays struct exist purely as an optimization option. Given the scenario above, the root cause of the performance downgrade is in most cases old unmaintained piece of code, or perhaps bad LINQ, string, or object management. 96% of the time, this will be resolved by refactoring the code and/or scaling up the server (which can be expensive). But there might be a time when you need the other 4% which is what we are going to be talking about in this post. You can read a short blog post by Marc Gravell on the 4% I’m talking about. C# is one of the few languages today that gives you an options besides the usual class. For example languages like JavaScript and Objective-C only have a class and with that do not provide you with an alternative.

What is it?

Once upon a time when you, the reader, learned C#, you might’ve bumped into a section called Value Types. Depending on the source, you might have heard/read that value types are allocated on the stack and Reference Types are allocated on the heap. While the latter is true, the former is as wrong as saying that the Earth is flat. You might be saying now: “But, but, that XY person on YouTube (insert anything really), is saying that value types do in fact get allocated on the stack 100% of the time. And it is definitely not the first time I’ve heard that! 😠”. It is partly true, the only correct definition of these types is that they can be allocated on the stack. And this observable characteristic should not be the fact that differs it from the reference types! Rather they should be differed by design semantics: value types are always copied “by value”. If you still don’t believe me please read the blog post from Microsoft. Consider an example:

using System;

public struct Student
{
    public string Name { get; set; }
    public int Age { get; set;}
}

public class MyClass
{
    public Action<int> Example()
    {
        var myStudent = new Student
        {
            Name = "Jon",
            Age = 29
        };

        return age =>
        {
            myStudent.Age = age;

            Console.WriteLine(myStudent.Name);
        };
    }
}

public static class Program
{
    public static void Main()
    {
        new MyClass().Example()(30);
    }
}

You can also view this code here (recommended). Take a look at the anonymous function returned from the Example method. It’s a piece of code that can be called later while preserving the environment in which it was first created. Note the last part of that sentence. Because we cannot compute the lifetime of the myStudent variable it must be allocated on the heap. To prove that statement we can have a look at the lowered version of MyClass

public class MyClass
{
    [CompilerGenerated]
    private sealed class <>c__DisplayClass0_0
    {
        public Student myStudent;
        internal void <Example>b__0(int age)
        {
            myStudent.Age = age;
            Console.WriteLine(myStudent.Name);
        }
    }

    [System.Runtime.CompilerServices.NullableContext(1)]
    public Action<int> Example()
    {
        <>c__DisplayClass0_0 <>c__DisplayClass0_ = new <>c__DisplayClass0_0();
        Student myStudent = default(Student);
        myStudent.Name = "Jon";
        myStudent.Age = 29;
        <>c__DisplayClass0_.myStudent = myStudent;
        return new Action<int>(<>c__DisplayClass0_.<Example>b__0);
    }
}

Because we have an anonymous function that captures the scope in which it is created, <>c__DisplayClass0_0 is generated by the compiler so that it may hold the contents of the captured scope — in this case, the local variable of myStudent. We know for a fact that classes are being created in the long-term storage which implies that even though we have a structure that is originally contained inside the stack it is moved to the long-term storage due to it being preserved inside the lambda function. Another example of a value type being allocated on the heap is an occurrence called boxing. It just means that a value type will be allocated on the heap if it’s “boxed” to an object or any interface that implements it.

Optimization options

Let’s kick off this part with some motivation, what are the benefits of structures? Link to the code. I’ve also added a sweet FYI in there with comments.

using System;

struct Person
{
    public int Age { get; set; }
    public string Name { get; init; }
}

class Program
{
    static void Main()
    {
        Person firstPerson = new()
        {
            Age = 23,
            Name = "First Person"
        };

        Person secondPerson = new()
        {
            Age = 22,
            Name = "Second Person"
        };

        int result = ComputeAverageAge(firstPerson, secondPerson);

        Console.WriteLine(result);
    }

    static int ComputeAverageAge(Person p1, Person p2) => (p1.Age + p2.Age) / 2;
}

C# JIT can perform heavy optimizations with structures, an example of that would be the above code. When Person is a class the Main method has a number of operations on registers:

Program.Main()
    L0000: push esi
    L0001: mov ecx, 0x1e3acbec
    L0006: call 0x061530c0
    L000b: mov esi, eax
    L000d: mov dword ptr [esi+8], 0x17
    L0014: mov ecx, [0x8ba9f50]
    L001a: lea edx, [esi+4]
    L001d: call 0x71b9c8b0
    L0022: mov ecx, 0x1e3acbec
    L0027: call 0x061530c0
    L002c: mov dword ptr [eax+8], 0x16
    L0033: mov ecx, [0x8ba9f54]
    L0039: lea edx, [eax+4]
    L003c: call 0x71b9c8b0
    L0041: mov ecx, [esi+8]
    L0044: add ecx, [eax+8]
    L0047: mov eax, ecx
    L0049: shr eax, 0x1f
    L004c: add ecx, eax
    L004e: sar ecx, 1
    L0050: call System.Console.WriteLine(Int32)
    L0055: pop esi
    L0056: ret

But once it becomes a struct the Main method is JITed to this:

Program.Main()
    L0000: mov ecx, 0x16
    L0005: call System.Console.WriteLine(Int32)
    L000a: ret

Insane. So what should you take out from this example? JIT can perform really nice optimizations on our code if we are using structures instead of classes.

I previously mentioned a copy by value semantic, let us talk more about this. By default, variable values are copied on assignment, passing an argument to a method, and returning a method result. Although we learned that structures have an optimal size where it’s less expensive to use them instead of using classes, there will be a time when we might need a larger structure as an argument to a method. In these cases, we should always send them by reference. In C++ it is a very common (if not always) way of sending objects as arguments to a method, e.g.

#include <iostream>

class Line
{
    public:
        int length;

        Line(int length) : length{length} {}
};

bool operator==(const Line& lhs, const Line& rhs)
{
    return lhs.length == rhs.length;
}

int main ()
{
    Line line_a {10};
    Line line_b {10};

    if (line_a == line_b)
        std::cout << "Equal!";
    else
        std::cout << "Not equal.";

    return 0;
}

Note that operator== arguments lhs and rhs are passed by reference. We can achieve the same in C# by using in, out and ref keywords. You can view the code here.

using System;

struct Line
{
    public int Length { get; set; }

    public Line(){}

    public bool LineEquals(ref Line other) => Length == other.Length;
}

class Program
{
    static void Main()
    {
        var lineA = new Line()
        {
            Length = 10
        };

        var lineB = new Line()
        {
            Length = 10
        };

        var message = lineA.LineEquals(ref lineB) ? "Equal!" : "Not Equal.";
    }
}

Any change made to other will also be reflected outside. But why are there 3 keywords for the same thing? ref tells the user that the argument might be changed inside the method, out ensures it, and in tells that it will not happen. We can also return by reference and have local variables that reference other variables, the former being ref return, and latter ref local. You will probably be using these if you are not working with ref struct because these are already references by default — we will get to them in a bit.

Now that we know how to handle large structures, there are still some things we should watch out for. For example, this code is perfectly legal:

using System;

// Marking as 'readonly struct' solves the issue
struct Person
{
    public Person(int age) => _age = age;

    private int _age = default;
    public int MutatedAge
    {
        get
        {
           _age = 100;

           return _age;
        }
    }

    public int Age => _age;

    public int Bar(int x) => _age = x;
}

class MyClass
{
    public void Foo()
    {
        var person = new Person(25);

        Console.WriteLine(person.Age);

        Baz(in person);

        Console.WriteLine(person.Age);
    }

    public void Baz(in Person person)
    {
        // We cannot do that:
        // error CS0200: Property or indexer 'Person.Age' cannot be assigned to -- it is read only
        //person.Age = 100;

        // But we can do this... Or can we?
        var age = person.MutatedAge;
    }
}

class Program
{
    static void Main()
    {
        new MyClass().Foo();
    }
}

Link to the code. So, we passed person by reference to Baz and used in keyword to clarify that the person won’t be changing in that method. As expected, while trying to modify the person’s age we are thrown an exception saying we cannot do that. But what if we mutate the backing field inside the getter method of the property? Turns out this also doesn’t work because both console outputs are 25… welcome to C#’s defensive copy mechanismx. Let’s inspect the IL of the Baz method.

IL_0000: ldarg.1          - Load argument 1 onto the stack
IL_0001: ldobj Person     - Copy the value stored to stack
IL_0006: stloc.0          - Pop into local variable 0
IL_0007: ldloca.s 0       - Load address of local variable IL_0009: call instance int32 Person::get_MutatedAge()
...

Defensive copy happens at IL_0001. Because we are sending a non-read only variable to in method C# needs to protect the source variable from any possible scenario in which that variable gets mutated. Accessing public property or calling a public method results in a defensive copy of the caller. How do we mitigate that? Make your structs readonly if they should be immutable and do not call any public methods or properties on the argument such as person. record struct and its readonly counterpart is a type that enforces immutability and should be used more regularly where immutability is wanted. You can read more about defensive copying here.

Now that we know a common pitfall when dealing with value types, we can now turn to the new .NET mainstay: Span<T>. I would highly recommend this blog post. TL;DR Span<T> is a readonly ref struct that enables the representation of contiguous regions of arbitrary memory, regardless of whether that memory is associated with a managed object, is provided by native code via interop, or is on the stack. It cannot escape the scope in which it was created, replaces the raw T[], and because it is bound to the current stack it can only be used in synchronous methods. Memory<T> on the other hand, is a readonly struct that has the same characteristics as Span<T> minus the bounding which allows usage inside asynchronous environments (which means it can escape to the heap).

Span<T> cannot escape from the scope in which it was declared. This is what ref struct is all about. There are a whole set of limitations for them to ensure that it does not happen, you can read more about it here. Once we create a ref struct we get an instance with zero GC attention. Trying this on a project you might be scared away by what needs to be changed in the code to even use this feature, but it really does not. Have a look at the CollectionsMarshal.AsSpan method. There are lots of reading materials for these two additions and if you are really interested in optimizing your code I would recommend spending some time learning about these two.

Consider the following example (link):

using System;
using System.Buffers;

public ref struct MyStruct
{
   private readonly Span<char> _buffer;
   private int _pos;

   public MyStruct()
   {
       _buffer = ArrayPool<char>.Shared.Rent(5);
       _pos = 0;
   }

   // add 'scoped' here
   public void SetBuffer(ReadOnlySpan<char> inputs)
   {
       if (inputs.TryCopyTo(_buffer[_pos..]))
       {
           _pos += inputs.Length;
       }
   }

   public ReadOnlySpan<char> GetBuffer() => _buffer[0.._pos];
}

public class MyClass
{
    public void Foo()
    {
        var myStruct = new MyStruct();

        ReadOnlySpan<char> chars = stackalloc char[5] { 'H', 'e', 'l', 'l', 'o' };

        myStruct.SetBuffer(chars);

        foreach(ref readonly var x in myStruct.GetBuffer())
        {
            Console.Write(x);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        new MyClass().Foo();
    }
}

You might get to a point where you write something like this and get an error saying:

Error CS8352: Cannot use variable ‘chars’ in this context because it may expose referenced variables outside of their declaration scope

Error CS8350: This combination of arguments to ‘MyStruct.SetBuffer(ReadOnlySpan)’ is disallowed because it may expose variables referenced by > parameter ‘inputs’ outside of their declaration scope

Now that’s an interesting message, a careful reader might see a detail that we already discussed previously. Span and any ref struct for that matter cannot (must not) escape their declaration scope. C# compiler cannot be sure if the SetBuffer method won’t do any weird stuff with the inputs argument — mainly sending to another method or some other voodoo that could cause inputs to end up on the heap. To reassure C# that we’ll not be doing that and that we only want to do some operations with stackallock, add scoped keyword besides ReadOnlySpan i.e. public void SetBuffer(scoped ReadonlySpan inputs).

And finally, one last thing to talk about: ValueTask[<T>]!

Ah the beautiful asynchronous programming, lets us have fun and make money 😎. But what happens if the thing that puts food on our table suddenly starts giving us headaches? Consider an example:

using System;
using System.Threading.Tasks;
using System.Linq;

public class MyClass
{
    public async Task<int> CalculateAsync()
    {
        var retResult = 0;

        foreach(var item in Enumerable.Range(1, 10000))
        {
            retResult += await GetNextAsync(item);
        }

        return retResult;
    }

    private async Task<int> GetNextAsync(int token)
    {
        if (token % 2 == 0)
        {
            return await GetInt();
        }

        return token;
    }

    private async Task<int> GetInt()
    {
        await Task.Yield();

        return Random.Shared.Next(1, 100);
    }
}

public static class Program
{
    public static async Task Main()
    {
        var calculatedResult = await new MyClass().CalculateAsync();
    }
}

Here we have a piece of code that performs 10000 additions, half of which are completed synchronously and the other half that are completed asynchronously. In both of these scenarios, C# has to create an instance of a Task which can make a strain on the GC on the code hot paths like this (in a real-world scenario where there would be much more iterations and/or work done in the GetInt method). Obviously, half of the time the GetNextAsync method completes synchronously but even though it does, Task instance is still created for it. To reduce such wasteful behavior ValueTask[<T>] was introduced.

ValueTask<T> is a struct that wraps T, Task or IValueTaskSource. There are a few rules that should be followed if one tries to use a ValueTask[], but the most important one is don’t await the same instance twice. The GetNextAsync can now be changed to:

private async ValueTask<int> GetNextAsync(int token)

So why should you try using this instead of the classic Task<T>? ValueTask<T> gives the best of both worlds, it can significantly lower the memory footprint of a subpart of your application (or whole) if the methods that are likely to complete synchronously are rewritten to return an instance of ValueTask<T>. Here’s a quote from Stephen Toub’s blog post.

This means it [ValueTask] can be returned from an async method, and if that method completes synchronously and successfully, nothing need be allocated: we can simply initialize this ValueTask<TResult> struct with the TResult and return that. Only if the method completes asynchronously does a Task<TResult> need to be allocated, with the ValueTask<TResult> created to wrap that instance…

The above change basically made CalculateAsync allocate less heap memory — let’s see how much. I used the BenchmarkDotNet library to benchmark two variants of the same code. One of which returns a ValueTask<T> and the other a Task<T>. Benchmark code — if you’d like to try it out yourself. Here are the results:

Benchmark result

It’s clear from “1 in 2 async Task vs ValueTask test” that replacing a Task with a ValueTask is a bad idea if you have a method that is only occasionally being completed synchronously. Differences in Allocated, Gen0, and Mean columns are just not good enough to introduce the change even though ValueTask method did finish earlier and allocated less heap memory. Why? Because switching return value type is a braking change and it surely will be more complicated to perform an API change than it is to only gain small benefits from it.

But we also have a result where it is really beneficial to make that change. See “1 in 100 async Task vs ValueTask test”, the difference is huge in all columns. ValueTask completed in less time, and allocated only 24KB of memory (and with that had minimal GC attention)! On the other hand, we have the same test but with a Task: 30x times more heap memory, more GC collections, and as a result, the CPU had less time doing the actual work and more time cleaning up unused resources.

If you’ve come to this part, then you get the general idea of small-scale (or otherwise) optimizations one could make on their code. Let’s briefly summarize the above in the most TL;DR way possible.

Question: How do I optimize my code?

Dante’s code hell

Level 0

Optimize [P]LINQ queries, do not ToList everything, do not perform multiple operations on an un-enumerated IEnumerable, structure your methods and callers so that you utilize deferred execution as much as possible while calling ToList only on common denominators.
Perform heavy string concatenation only with the StringBuilder, string interpolation (which internally uses StringBuilderCache for formatting and Span<char> for the storage — link), and String.Join.
In the rarest of cases use string.Intern to create a global reference to a string of your choosing. Each next access to string with the same characters as the interned one will be a lot faster. Use it only if you are doing a really large number of string operations. The downside of this is that interned strings go straight to the Gen2 heap and with that are practically never cleared.
If you are really desperate at this point and want another option, replace all your foreach for for. Why? foreach is lowered to get the Enumerator of your IEnumerable which introduces more allocations. Example.
Use foreach on a Span<T> that you can get from a List<T> with the help of the CollectionsMarshal.AsSpan method. Use the last thing only if you are reading from a list.
Do stuff more sequentially; as we’ve seen async methods will in the end allocate a significant amount of memory on a code hot path.
Cache stuff in a Dictionary with Linq.ToDictionary.
Use MemoryCache or implement your own IMemoryCache, read more about it here

Level 1

Use struct and record struct (with their respective readonly addons) on code hot paths. Watch out for boxing, do not use string interpolation with struct, do not capture them in a closure, and do not assign them to an interface/object. Do not use mutable values inside an immutable context. If you are using a really small POCO struct do not bother adding ref to it — copy is cheap for it. Do you really need a truck size of a POCO object? Split it into multiple smaller struct‘s and use it that way — JIT will be happy. Or if you cannot avoid large objects, send them by reference and use ref local variables more often. Most important thing: start small and add more struct only if you really need to. Maintaining such code will become a nightmare if too many structs are introduced, this will be more prominent once you start adding ref struct.
Use rentable data structures where you can. Link.

Level 2

se ref struct, stackalloc, [ReadOnly]Span<T> inside LINQ callbacks, or methods in which you are sure that one day the client won’t be changing the logic altogether. The more ref struct you have fewer allocations will happen and with that CPU will have more time to do work.
Perform a benchmark or annalize which methods are in your code hot path and change these to ValueTask<T> if they do finish sequentially most of the time. Ideally, it would be best to switch everything to ValueTask<T> because it essentially gives you no allocation when it can and Task<T> when it cannot. Though that would incur a potential API break.
If you use lots of low-level math on your project consider System.Numerics (specifically Vector<T>)

Easier said than done, though most of the time you won’t be going higher than Level 0.

Hope you learned something from this post today, and that it might help you in your future coding endeavors. Cheers!