Friday, Sep 16, 2022
Efficient Code Optimizations in C#
Intro
You are building an application in C#. At some point, it became a blazingly fast, highly profitable piece of engineering. Everything is good and life is great… 💥 until it’s not anymore. CPU spikes to 100% every now and then, process memory is rising for unknown reasons, the application is gradually becoming slower and slower and users are noticing the downgraded performance… You are now presented with 3 options: optimize the code as much as you can, scale up or use what you will learn today.
Let’s talk about C# Structs — the what, where, and why.
Why does the struct
even exist?
Nowadays struct
exist purely as an optimization option. Given the scenario above, the root cause of the performance downgrade is in most cases old unmaintained piece of code, or perhaps bad LINQ, string, or object management. 96% of the time, this will be resolved by refactoring the code and/or scaling up the server (which can be expensive). But there might be a time when you need the other 4% which is what we are going to be talking about in this post. You can read a short blog post by Marc Gravell on the 4% I’m talking about. C# is one of the few languages today that gives you an options besides the usual class
. For example languages like JavaScript and Objective-C only have a class
and with that do not provide you with an alternative.
What is it?
Once upon a time when you, the reader, learned C#, you might’ve bumped into a section called Value Types. Depending on the source, you might have heard/read that value types are allocated on the stack and Reference Types are allocated on the heap. While the latter is true, the former is as wrong as saying that the Earth is flat. You might be saying now: “But, but, that XY person on YouTube (insert anything really), is saying that value types do in fact get allocated on the stack 100% of the time. And it is definitely not the first time I’ve heard that! 😠”. It is partly true, the only correct definition of these types is that they can be allocated on the stack. And this observable characteristic should not be the fact that differs it from the reference types! Rather they should be differed by design semantics: value types are always copied “by value”. If you still don’t believe me please read the blog post from Microsoft. Consider an example:
using System;
public struct Student
{
public string Name { get; set; }
public int Age { get; set;}
}
public class MyClass
{
public Action<int> Example()
{
var myStudent = new Student
{
Name = "Jon",
Age = 29
};
return age =>
{
myStudent.Age = age;
Console.WriteLine(myStudent.Name);
};
}
}
public static class Program
{
public static void Main()
{
new MyClass().Example()(30);
}
}
You can also view this code here (recommended). Take a look at the anonymous function returned from the Example
method. It’s a piece of code that can be called later while preserving the environment in which it was first created. Note the last part of that sentence. Because we cannot compute the lifetime of the myStudent
variable it must be allocated on the heap. To prove that statement we can have a look at the lowered version of MyClass
public class MyClass
{
[CompilerGenerated]
private sealed class <>c__DisplayClass0_0
{
public Student myStudent;
internal void <Example>b__0(int age)
{
myStudent.Age = age;
Console.WriteLine(myStudent.Name);
}
}
[System.Runtime.CompilerServices.NullableContext(1)]
public Action<int> Example()
{
<>c__DisplayClass0_0 <>c__DisplayClass0_ = new <>c__DisplayClass0_0();
Student myStudent = default(Student);
myStudent.Name = "Jon";
myStudent.Age = 29;
<>c__DisplayClass0_.myStudent = myStudent;
return new Action<int>(<>c__DisplayClass0_.<Example>b__0);
}
}
Because we have an anonymous function that captures the scope in which it is created, <>c__DisplayClass0_0
is generated by the compiler so that it may hold the contents of the captured scope — in this case, the local variable of myStudent
. We know for a fact that classes are being created in the long-term storage which implies that even though we have a structure that is originally contained inside the stack it is moved to the long-term storage due to it being preserved inside the lambda function. Another example of a value type being allocated on the heap is an occurrence called boxing. It just means that a value type will be allocated on the heap if it’s “boxed” to an object
or any interface that implements it.
Optimization options
Let’s kick off this part with some motivation, what are the benefits of structures? Link to the code. I’ve also added a sweet FYI in there with comments.
using System;
struct Person
{
public int Age { get; set; }
public string Name { get; init; }
}
class Program
{
static void Main()
{
Person firstPerson = new()
{
Age = 23,
Name = "First Person"
};
Person secondPerson = new()
{
Age = 22,
Name = "Second Person"
};
int result = ComputeAverageAge(firstPerson, secondPerson);
Console.WriteLine(result);
}
static int ComputeAverageAge(Person p1, Person p2) => (p1.Age + p2.Age) / 2;
}
C# JIT can perform heavy optimizations with structures, an example of that would be the above code. When Person
is a class the Main
method has a number of operations on registers:
Program.Main()
L0000: push esi
L0001: mov ecx, 0x1e3acbec
L0006: call 0x061530c0
L000b: mov esi, eax
L000d: mov dword ptr [esi+8], 0x17
L0014: mov ecx, [0x8ba9f50]
L001a: lea edx, [esi+4]
L001d: call 0x71b9c8b0
L0022: mov ecx, 0x1e3acbec
L0027: call 0x061530c0
L002c: mov dword ptr [eax+8], 0x16
L0033: mov ecx, [0x8ba9f54]
L0039: lea edx, [eax+4]
L003c: call 0x71b9c8b0
L0041: mov ecx, [esi+8]
L0044: add ecx, [eax+8]
L0047: mov eax, ecx
L0049: shr eax, 0x1f
L004c: add ecx, eax
L004e: sar ecx, 1
L0050: call System.Console.WriteLine(Int32)
L0055: pop esi
L0056: ret
But once it becomes a struct
the Main
method is JITed to this:
Program.Main()
L0000: mov ecx, 0x16
L0005: call System.Console.WriteLine(Int32)
L000a: ret
Insane. So what should you take out from this example? JIT can perform really nice optimizations on our code if we are using structures instead of classes.
I previously mentioned a copy by value semantic, let us talk more about this. By default, variable values are copied on assignment, passing an argument to a method, and returning a method result. Although we learned that structures have an optimal size where it’s less expensive to use them instead of using classes, there will be a time when we might need a larger structure as an argument to a method. In these cases, we should always send them by reference. In C++ it is a very common (if not always) way of sending objects as arguments to a method, e.g.
#include <iostream>
class Line
{
public:
int length;
Line(int length) : length{length} {}
};
bool operator==(const Line& lhs, const Line& rhs)
{
return lhs.length == rhs.length;
}
int main ()
{
Line line_a {10};
Line line_b {10};
if (line_a == line_b)
std::cout << "Equal!";
else
std::cout << "Not equal.";
return 0;
}
Note that operator==
arguments lhs
and rhs
are passed by reference. We can achieve the same in C# by using in
, out
and ref
keywords. You can view the code here.
using System;
struct Line
{
public int Length { get; set; }
public Line(){}
public bool LineEquals(ref Line other) => Length == other.Length;
}
class Program
{
static void Main()
{
var lineA = new Line()
{
Length = 10
};
var lineB = new Line()
{
Length = 10
};
var message = lineA.LineEquals(ref lineB) ? "Equal!" : "Not Equal.";
}
}
Any change made to other
will also be reflected outside. But why are there 3 keywords for the same thing? ref
tells the user that the argument might be changed inside the method, out
ensures it, and in
tells that it will not happen. We can also return by reference and have local variables that reference other variables, the former being ref return
, and latter ref local
. You will probably be using these if you are not working with ref struct
because these are already references by default — we will get to them in a bit.
Now that we know how to handle large structures, there are still some things we should watch out for. For example, this code is perfectly legal:
using System;
// Marking as 'readonly struct' solves the issue
struct Person
{
public Person(int age) => _age = age;
private int _age = default;
public int MutatedAge
{
get
{
_age = 100;
return _age;
}
}
public int Age => _age;
public int Bar(int x) => _age = x;
}
class MyClass
{
public void Foo()
{
var person = new Person(25);
Console.WriteLine(person.Age);
Baz(in person);
Console.WriteLine(person.Age);
}
public void Baz(in Person person)
{
// We cannot do that:
// error CS0200: Property or indexer 'Person.Age' cannot be assigned to -- it is read only
//person.Age = 100;
// But we can do this... Or can we?
var age = person.MutatedAge;
}
}
class Program
{
static void Main()
{
new MyClass().Foo();
}
}
Link to the code. So, we passed
person by reference to Baz
and used in
keyword to clarify that the person
won’t be changing in that method. As expected, while trying to modify the person’s age we are thrown an exception saying we cannot do that. But what if we mutate the backing field inside the getter method of the property? Turns out this also doesn’t work because both console outputs are 25… welcome to C#’s defensive copy mechanismx. Let’s inspect the IL of the Baz
method.
IL_0000: ldarg.1 - Load argument 1 onto the stack
IL_0001: ldobj Person - Copy the value stored to stack
IL_0006: stloc.0 - Pop into local variable 0
IL_0007: ldloca.s 0 - Load address of local variable IL_0009: call instance int32 Person::get_MutatedAge()
...
Defensive copy happens at IL_0001
. Because we are sending a non-read only variable to in
method C# needs to protect the source variable from any possible scenario in which that variable gets mutated. Accessing public property or calling a public method results in a defensive copy of the caller. How do we mitigate that? Make your structs readonly
if they should be immutable and do not call any public methods or properties on the argument such as person
. record struct
and its readonly
counterpart is a type that enforces immutability and should be used more regularly where immutability is wanted. You can read more about defensive copying here.
Now that we know a common pitfall when dealing with value types, we can now turn to the new .NET mainstay: Span<T>
. I would highly recommend this blog post. TL;DR Span<T>
is a readonly ref struct
that enables the representation of contiguous regions of arbitrary memory, regardless of whether that memory is associated with a managed object, is provided by native code via interop, or is on the stack. It cannot escape the scope in which it was created, replaces the raw T[]
, and because it is bound to the current stack it can only be used in synchronous methods. Memory<T>
on the other hand, is a readonly struct
that has the same characteristics as Span<T>
minus the bounding which allows usage inside asynchronous environments (which means it can escape to the heap).
Span<T>
cannot escape from the scope in which it was declared. This is what ref struct
is all about. There are a whole set of limitations for them to ensure that it does not happen, you can read more about it here. Once we create a ref struct
we get an instance with zero GC attention. Trying this on a project you might be scared away by what needs to be changed in the code to even use this feature, but it really does not. Have a look at the CollectionsMarshal.AsSpan
method. There are lots of reading materials for these two additions and if you are really interested in optimizing your code I would recommend spending some time learning about these two.
Consider the following example (link):
using System;
using System.Buffers;
public ref struct MyStruct
{
private readonly Span<char> _buffer;
private int _pos;
public MyStruct()
{
_buffer = ArrayPool<char>.Shared.Rent(5);
_pos = 0;
}
// add 'scoped' here
public void SetBuffer(ReadOnlySpan<char> inputs)
{
if (inputs.TryCopyTo(_buffer[_pos..]))
{
_pos += inputs.Length;
}
}
public ReadOnlySpan<char> GetBuffer() => _buffer[0.._pos];
}
public class MyClass
{
public void Foo()
{
var myStruct = new MyStruct();
ReadOnlySpan<char> chars = stackalloc char[5] { 'H', 'e', 'l', 'l', 'o' };
myStruct.SetBuffer(chars);
foreach(ref readonly var x in myStruct.GetBuffer())
{
Console.Write(x);
}
}
}
public static class Program
{
public static void Main()
{
new MyClass().Foo();
}
}
You might get to a point where you write something like this and get an error saying:
Error CS8352: Cannot use variable ‘chars’ in this context because it may expose referenced variables outside of their declaration scope
Error CS8350: This combination of arguments to ‘MyStruct.SetBuffer(ReadOnlySpan
)’ is disallowed because it may expose variables referenced by > parameter ‘inputs’ outside of their declaration scope
Now that’s an interesting message, a careful reader might see a detail that we already discussed previously. Span
and any ref struct
for that matter cannot (must not) escape their declaration scope. C# compiler cannot be sure if the SetBuffer
method won’t do any weird stuff with the inputs
argument — mainly sending to another method or some other voodoo that could cause inputs to end up on the heap. To reassure C# that we’ll not be doing that and that we only want to do some operations with stackallock
, add scoped
keyword besides ReadOnlySpan
i.e. public void SetBuffer(scoped ReadonlySpan inputs)
.
And finally, one last thing to talk about: ValueTask[<T>]
!
Ah the beautiful asynchronous programming, lets us have fun and make money 😎. But what happens if the thing that puts food on our table suddenly starts giving us headaches? Consider an example:
using System;
using System.Threading.Tasks;
using System.Linq;
public class MyClass
{
public async Task<int> CalculateAsync()
{
var retResult = 0;
foreach(var item in Enumerable.Range(1, 10000))
{
retResult += await GetNextAsync(item);
}
return retResult;
}
private async Task<int> GetNextAsync(int token)
{
if (token % 2 == 0)
{
return await GetInt();
}
return token;
}
private async Task<int> GetInt()
{
await Task.Yield();
return Random.Shared.Next(1, 100);
}
}
public static class Program
{
public static async Task Main()
{
var calculatedResult = await new MyClass().CalculateAsync();
}
}
Here we have a piece of code that performs 10000 additions, half of which are completed synchronously and the other half that are completed asynchronously. In both of these scenarios, C# has to create an instance of a Task
which can make a strain on the GC on the code hot paths like this (in a real-world scenario where there would be much more iterations and/or work done in the GetInt
method). Obviously, half of the time the GetNextAsync
method completes synchronously but even though it does, Task
instance is still created for it. To reduce such wasteful behavior ValueTask[<T>]
was introduced.
ValueTask<T>
is a struct that wraps T
, Task
or IValueTaskSource
. There are a few rules that should be followed if one tries to use a ValueTask[GetNextAsync
can now be changed to:
private async ValueTask<int> GetNextAsync(int token)
So why should you try using this instead of the classic Task<T>
? ValueTask<T>
gives the best of both worlds, it can significantly lower the memory footprint of a subpart of your application (or whole) if the methods that are likely to complete synchronously are rewritten to return an instance of ValueTask<T>
. Here’s a quote from Stephen Toub’s blog post.
This means it [ValueTask] can be returned from an async method, and if that method completes synchronously and successfully, nothing need be allocated: we can simply initialize this
ValueTask<TResult>
struct with theTResult
and return that. Only if the method completes asynchronously does aTask<TResult>
need to be allocated, with theValueTask<TResult>
created to wrap that instance…
The above change basically made CalculateAsync
allocate less heap memory — let’s see how much. I used the BenchmarkDotNet library to benchmark two variants of the same code. One of which returns a ValueTask<T>
and the other a Task<T>
. Benchmark code — if you’d like to try it out yourself. Here are the results:
It’s clear from “1 in 2 async Task
vs ValueTask
test” that replacing a Task
with a ValueTask
is a bad idea if you have a method that is only occasionally being completed synchronously. Differences in Allocated, Gen0, and Mean columns are just not good enough to introduce the change even though ValueTask
method did finish earlier and allocated less heap memory. Why? Because switching return value type is a braking change and it surely will be more complicated to perform an API change than it is to only gain small benefits from it.
But we also have a result where it is really beneficial to make that change. See “1 in 100 async Task
vs ValueTask
test”, the difference is huge in all columns. ValueTask
completed in less time, and allocated only 24KB of memory (and with that had minimal GC attention)! On the other hand, we have the same test but with a Task
: 30x times more heap memory, more GC collections, and as a result, the CPU had less time doing the actual work and more time cleaning up unused resources.
If you’ve come to this part, then you get the general idea of small-scale (or otherwise) optimizations one could make on their code. Let’s briefly summarize the above in the most TL;DR way possible.
Question: How do I optimize my code?
Level 0
-
Optimize [P]LINQ queries, do not
ToList
everything, do not perform multiple operations on an un-enumeratedIEnumerable
, structure your methods and callers so that you utilize deferred execution as much as possible while callingToList
only on common denominators. -
Perform heavy
string
concatenation only with theStringBuilder
, string interpolation (which internally usesStringBuilderCache
for formatting andSpan<char>
for the storage — link), andString.Join
. -
In the rarest of cases use
string.Intern
to create a global reference to a string of your choosing. Each next access to string with the same characters as the interned one will be a lot faster. Use it only if you are doing a really large number of string operations. The downside of this is that interned strings go straight to the Gen2 heap and with that are practically never cleared. -
If you are really desperate at this point and want another option, replace all your
foreach
forfor
. Why?foreach
is lowered to get theEnumerator
of yourIEnumerable
which introduces more allocations. Example. -
Use foreach on a
Span<T>
that you can get from aList<T>
with the help of theCollectionsMarshal.AsSpan
method. Use the last thing only if you are reading from a list. -
Do stuff more sequentially; as we’ve seen async methods will in the end allocate a significant amount of memory on a code hot path.
-
Cache stuff in a
Dictionary
withLinq.ToDictionary
. -
Use
MemoryCache
or implement your ownIMemoryCache
, read more about it here
Level 1
-
Use
struct
andrecord struct
(with their respectivereadonly
addons) on code hot paths. Watch out for boxing, do not use string interpolation withstruct
, do not capture them in a closure, and do not assign them to an interface/object. Do not use mutable values inside an immutable context. If you are using a really small POCOstruct
do not bother addingref
to it — copy is cheap for it. Do you really need a truck size of a POCO object? Split it into multiple smaller struct‘s and use it that way — JIT will be happy. Or if you cannot avoid large objects, send them by reference and useref local
variables more often. Most important thing: start small and add more struct only if you really need to. Maintaining such code will become a nightmare if too many structs are introduced, this will be more prominent once you start addingref struct
. -
Use rentable data structures where you can. Link.
Level 2
-
se
ref struct
,stackalloc
,[ReadOnly]Span<T>
inside LINQ callbacks, or methods in which you are sure that one day the client won’t be changing the logic altogether. The moreref struct
you have fewer allocations will happen and with that CPU will have more time to do work. -
Perform a benchmark or annalize which methods are in your code hot path and change these to
ValueTask<T>
if they do finish sequentially most of the time. Ideally, it would be best to switch everything toValueTask<T>
because it essentially gives you no allocation when it can andTask<T>
when it cannot. Though that would incur a potential API break. -
If you use lots of low-level math on your project consider
System.Numerics
(specificallyVector<T>
)
Easier said than done, though most of the time you won’t be going higher than Level 0.
Hope you learned something from this post today, and that it might help you in your future coding endeavors. Cheers!