There are plenty of situations that call for distributed locking (concurrency control) to solve business problems such as controlling access to a database resource or issuing unique invoice numbers. Because distributed locking is commonly tied to complex deployment environments, it can be complex itself. Most of us developers are pragmatists (or at least we try to be), so we tend to look for a pragmatic shortcut around complex distributed locking problems. In most situations such a shortcut won’t be possible, so I’ll explain a few approaches that can make the implementation easier.

Facts

  • Redis executes commands on a single thread, so each command runs atomically
  • Our platform is .NET 4.5.1+, and we need to use libraries that can work with this constraint

Environment

  • A highly scalable cluster of .NET WebAPI applications
  • Redis Master-Slave cluster with Sentinel
  • .NET WebAPI applications running Redis client which is connected to the Redis cluster
  • Database servers as the storage back-end

Task

As an example, we will define two tasks that reside inside the .NET WebAPI:

  • a simple task to generate a sequential number that is unique on the cluster level
  • a complex task to increment the number in the database to be unique on the cluster level (for the sake of this article we will not use database mechanisms to accomplish this)

Our Options

There are many paths we can take to solve this task, from simple .NET locking (not distributed), Redis INCR, Redis acquire lock or RedLock, to more complex solutions that require third-party services. I’ll explain all of these approaches, along with their pros and cons.

.NET C# locking options

.NET has the locking options described below, but note that they only work inside a single process, so they are suitable only for non-clustered environments.

  • Lock

Besides the ordinary lock, developers often use double-checked locking.

    lock (thisLock)  
    {  
        if (amount > balance)  
        {  
            throw new Exception("Insufficient funds");  
        }  
        balance -= amount;  
    }  

MSDN Lock Statement.
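
Since double-checked locking is mentioned above but not shown, here is a minimal sketch of the pattern. The ConfigCache class and its members are hypothetical names used only for illustration. The idea is to check the field once without the lock, and check it again after taking the lock, so that only the first callers pay the locking cost.

```csharp
using System;

public sealed class ConfigCache
{
    // Hypothetical names; any private lock object works here.
    private static readonly object SyncRoot = new object();

    // volatile prevents a stale or partially published reference
    // from being observed by the unlocked first check.
    private static volatile ConfigCache instance;

    private ConfigCache() { }

    public static ConfigCache Instance
    {
        get
        {
            // First check: no lock taken on the common path.
            if (instance == null)
            {
                lock (SyncRoot)
                {
                    // Second check: another thread may have created
                    // the instance while we were waiting for the lock.
                    if (instance == null)
                    {
                        instance = new ConfigCache();
                    }
                }
            }
            return instance;
        }
    }
}
```

For plain lazy initialization, Lazy<T> gives the same guarantee with less code; the explicit form is shown only to make the two checks visible.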

  • Monitor

    var obj = new Object();
    Monitor.Enter(obj);
    try
    {
        // Code to execute one thread at a time.
    }
    finally
    {
        Monitor.Exit(obj);
    }

MSDN Monitor Class.

  • ReaderWriterLockSlim

MSDN ReaderWriterLockSlim Class.
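
As a rough illustration of ReaderWriterLockSlim, the sketch below guards a dictionary so that many readers can proceed in parallel while writers get exclusive access. The SettingsStore class is a hypothetical example, not something from the platform described in this article.

```csharp
using System.Collections.Generic;
using System.Threading;

public class SettingsStore
{
    private readonly ReaderWriterLockSlim rwLock = new ReaderWriterLockSlim();
    private readonly Dictionary<string, string> values = new Dictionary<string, string>();

    public string Read(string key)
    {
        // Multiple threads may hold the read lock at the same time.
        rwLock.EnterReadLock();
        try
        {
            string value;
            return values.TryGetValue(key, out value) ? value : null;
        }
        finally
        {
            rwLock.ExitReadLock();
        }
    }

    public void Write(string key, string value)
    {
        // The write lock is exclusive: it waits for readers to leave
        // and blocks new readers and writers until released.
        rwLock.EnterWriteLock();
        try
        {
            values[key] = value;
        }
        finally
        {
            rwLock.ExitWriteLock();
        }
    }
}
```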

  • SemaphoreSlim

It’s good to know that SemaphoreSlim works with async/await in .NET.

    // initialCount 1 lets the first caller in; maxCount 1 makes the semaphore act as a mutex
    private static SemaphoreSlim cacheLock = new SemaphoreSlim(1, 1);
    ...

    public async Task<string> WriteAsync(int key, object value)
    {
        await cacheLock.WaitAsync();
        try
        {
            ...
        }
        finally
        {
            cacheLock.Release();
        }
    }

MSDN SemaphoreSlim Class.

All of the above locking mechanisms are useful, some work with async/await, and some don’t, but their locking “semaphores” are not shared across multiple processes, and therefore they aren’t shared across machines in the cluster. That said, it’s obvious that we can’t use any of the above options for distributed locking and we need to look for other solutions.

Simple increment approach

For this simple task, we can use a centralized increment mechanism. The Redis INCR command provides just the thing we need. Remember that Redis executes commands on a single thread, so INCR is atomic: no two clients in the cluster can ever receive the same value.

INCR keyname

To execute the same command in .NET, we can use StackExchange.Redis, a very popular library by Marc Gravell. It’s very easy to use, and it supports almost all Redis commands, including Lua scripts.

string key = ...
long number = db.StringIncrement(key); // returns the value after the increment
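
For context, a slightly fuller sketch of the simple task could look like the following. The connection string, the key name and the SequenceGenerator class are assumptions made for illustration; ConnectionMultiplexer.Connect, GetDatabase and StringIncrement are StackExchange.Redis calls.

```csharp
using StackExchange.Redis;

public static class SequenceGenerator
{
    // Assumption: a Redis endpoint reachable at localhost:6379.
    // The multiplexer is designed to be created once and shared.
    private static readonly ConnectionMultiplexer Redis =
        ConnectionMultiplexer.Connect("localhost:6379");

    // Hypothetical key name for the cluster-wide sequence.
    private const string SequenceKey = "invoice:sequence";

    public static long Next()
    {
        IDatabase db = Redis.GetDatabase();

        // INCR is atomic on the server, so every caller on every
        // node receives a different number.
        return db.StringIncrement(SequenceKey);
    }
}
```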

The INCR approach is a good solution for the simple task, where a single atomic Redis operation is all we need. If we want to acquire a distributed lock in the .NET WebAPI and perform several steps under it, such as re-initializing a value, we need another solution. Let’s move on to the complex task to explain some of the alternative options.

Redis Acquire Lock

One of the options is to acquire a lock with the Redis SET command and its NX and PX options: SET resource_name my_random_value NX PX 30000. NX sets the key only if it does not already exist, and PX 30000 makes it expire after 30,000 milliseconds. The value (my_random_value) must be unique across all Redis clients and all lock requests. When using this approach, it’s very important to release the lock in a safe way, as described in the following Lua script example:

if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end

The lock can be released only if the key exists and holds the value that is known only to the client that created the lock. In .NET, StackExchange.Redis exposes this pattern through LockTake and LockRelease:

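// the token must be unique per lock request, so a machine name alone is not enough
RedisValue token = Guid.NewGuid().ToString();
if (db.LockTake(key, token, duration)) {
    try {
        // you have the lock; do the work
    } finally {
        // releases the lock only if it still holds our token
        db.LockRelease(key, token);
    }
}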

Note: In the case of long-running code (a loop, for example), LockExtend should be called periodically, before the lock duration runs out.
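
A minimal sketch of extending the lock inside a loop is shown below. The BatchWorker class, the key name and the 30 second duration are assumptions for illustration; LockTake, LockExtend and LockRelease are the StackExchange.Redis calls. The sketch assumes each item takes far less time than the lock duration.

```csharp
using System;
using System.Collections.Generic;
using StackExchange.Redis;

public static class BatchWorker
{
    public static void ProcessAll(IDatabase db, IEnumerable<int> items)
    {
        RedisKey key = "batch-lock";                     // hypothetical key name
        RedisValue token = Guid.NewGuid().ToString();    // unique per lock request
        TimeSpan duration = TimeSpan.FromSeconds(30);    // assumed lock duration

        if (!db.LockTake(key, token, duration))
        {
            return; // someone else holds the lock
        }

        try
        {
            foreach (int item in items)
            {
                // ... process one item here ...

                // Push the expiry forward. A false result means the lock
                // no longer holds our token, so it is not safe to continue.
                if (!db.LockExtend(key, token, duration))
                {
                    throw new InvalidOperationException("Lock was lost during processing.");
                }
            }
        }
        finally
        {
            // Has no effect if the lock is no longer ours.
            db.LockRelease(key, token);
        }
    }
}
```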

This approach has its drawbacks, as described in the Redis documentation about distributed locking. In an environment like ours (Master-Slave with Sentinel), there is a single point of failure in the architecture that we shouldn’t neglect. Redis replication is asynchronous, and because of synchronization delays, there is an obvious race condition:

  1. Client A acquires the lock in the master.
  2. The master crashes before the write to the key is transmitted to the slave.
  3. The slave gets promoted to master.
  4. Client B acquires the lock to the same resource A already holds a lock for. SAFETY VIOLATION!

RedLock

To overcome the drawbacks of the single-instance lock, we can consider the RedLock algorithm as the solution to our complex task. RedLock assumes that it runs against several independent Redis masters (so there is no replication with slaves involved), that the acquire and release mechanism from the approach above is used on every master, and that the lock counts as acquired only if a majority of the masters grant it before the lock’s validity time runs out. Features of the algorithm include retry on acquire failure, automatic release through expiry, freedom from deadlocks, good performance (although I have experienced slight performance impacts using the .NET implementation of RedLock) and ease of use, at least in the .NET implementation. The most important aspect, in my opinion, is that the algorithm is strongly tied to time: it does not require perfectly synchronized clocks, but it does assume that the clocks on all machines advance at roughly the same rate, so clock drift needs to be kept under control. Let’s get back to solving our complex task using RedLock.NET.
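
To make the time dependency concrete, here is a small sketch of the acceptance check as the algorithm is described in the Redis documentation: the lock is considered held only if a majority of masters granted it and some validity time is left after subtracting the time spent acquiring it and an allowance for clock drift. The RedLockMath class is a hypothetical helper, and the drift allowance (1% of the expiry plus 2 ms) is an assumption taken from common reference implementations, not from RedLock.NET itself.

```csharp
using System;

public static class RedLockMath
{
    // Majority of N independent masters.
    public static int Quorum(int masterCount)
    {
        return masterCount / 2 + 1;
    }

    public static bool IsLockValid(int acquiredCount, int masterCount,
                                   TimeSpan expiry, TimeSpan elapsed)
    {
        // Assumed drift allowance: 1% of the expiry plus 2 ms.
        TimeSpan drift = TimeSpan.FromMilliseconds(expiry.TotalMilliseconds * 0.01 + 2);

        // Time the lock can still be trusted after acquisition.
        TimeSpan validity = expiry - elapsed - drift;

        return acquiredCount >= Quorum(masterCount) && validity > TimeSpan.Zero;
    }
}
```

With five masters the quorum is three, so the lock survives the loss of two masters, but a slow acquisition that uses up the whole expiry time is rejected even if every master granted it.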

Configuration

To set up RedLock, we need to provide a list of Redis master servers. Keep in mind that we should set up a multi-master Redis environment in order to fully comply with the RedLock algorithm. You can use a Master-Slave environment, but in that case you will still have the same single point of failure as above.

List<RedisLockEndPoint> result = new List<RedisLockEndPoint>();
foreach (var host in configuration.Hosts)
{
    result.Add(new RedisLockEndPoint
    {
        EndPoints =
        {
            new DnsEndPoint(host, 6379)
        },
        ConnectionTimeout = configuration.ConnectionTimeout,
        RedisDatabase = configuration.Database,
        Ssl = configuration.UseSsl,
        Password = configuration.Password
    });
}

We should also define the lock expiry time span, the total time to keep trying to acquire the lock (wait time span), and the interval between retries (retry time span).

    private readonly TimeSpan expirySpan = TimeSpan.FromSeconds(45);
    private readonly TimeSpan retrySpan = TimeSpan.FromSeconds(1);
    private readonly TimeSpan waitSpan = TimeSpan.FromSeconds(15);

Instantiation

Creating a RedLock instance is straightforward: create a RedisLockFactory by passing the Redis master end-points to the constructor, and then await the CreateAsync method with the time span configuration defined above. The important thing to remember is that the distributed lock key should be the same across the whole cluster.

IRedisLockFactory factory = new RedisLockFactory(result);
// CreateAsync returns a task, so it has to be awaited to get the lock object
IRedisLock redisLock = await factory.CreateAsync("distributed-lock-key", expirySpan, waitSpan, retrySpan);

The task

We defined the complex task at the beginning of this article - it was to increment the number in the database to be unique on the cluster level. To accomplish this task, we need to acquire a lock on cluster level, then update the database number. With .NET RedLock implementation this is a pretty simple task:

// in order to release the lock, Dispose should be called on the RedisLock instance
using (IRedisLock redisLock = await factory.CreateAsync("distributed-lock-key", expirySpan, waitSpan, retrySpan))
{
    // make sure we got the lock
    if (redisLock.IsAcquired)
    {
        //get the current number from the database
        long? number = (await this.Repository.GetValueAsync<long?>(recordId));
        //increment the number
        if (number.HasValue)
        {
            number++;
        }
        else
        {
            number = 1;
        }
        //update the database record 
        await this.Repository.SetValueAsync(recordId, number);
    }
}

We can see that using RedLock in .NET is easy, but be aware that a lot of distributed locking work happens when your code hits the using (IRedisLock redisLock ... line. At that point RedLock tries to acquire the lock, retrying at the retry interval until it succeeds or the wait time runs out. The call can therefore return without the lock, so always check if (redisLock.IsAcquired) before touching the protected resource; IsAcquired only reports the outcome and does not wait by itself. If the process holding the lock crashes or loses its connection to Redis, the lock is released automatically once the expiry time is reached, so other nodes are not blocked forever.

Under the hood, RedLock will try to acquire the lock on the Redis master servers defined by the configuration in order to overcome the single point of failure. The exact steps of the RedLock algorithm are described in the Redis documentation about distributed locking.

Note: Martin Kleppmann has published a good analysis that raises a few concerns about the algorithm, in particular the fact that it relies on timing assumptions.

Lua Scripts

Many developers use Lua scripts as their distributed locking solution. Lua is becoming more and more popular as the Redis “server-side” scripting language. Because Redis executes commands on a single thread and Lua scripts run on the Redis server, not the client, they can be used very effectively for simple distributed locking tasks. Note that a Lua script can’t be applied to our complex task, as it can only change Redis values, not database records.

Let us see a simple Lua script that can be used to track, for example, the number of requests to your API. We will show a .NET-based example that uses Lua scripting with the StackExchange.Redis library.

private const string script = @"
    local value = redis.call('INCRBY', @key, @count);
    local ttl = redis.call('TTL', @key);
    if ttl == -1 then
        redis.call('EXPIRE', @key, @seconds);
    end
    return value;
";

private const string ClusterTotalCacheKey = "Cluster-Total";

...

//Increment current value by count and set expire time to 5 seconds
var newValue = await Cache.ScriptEvaluateAsync<long>(script,
    new
    {
        key = ClusterTotalCacheKey,
        count = count,
        seconds = 5
    });

The script is executed on the Redis server, and because Redis runs commands on a single thread, only one script runs at a time. The script increments the key by the specified count and then checks the key’s TTL: a TTL of -1 means that the key has no expiry, in which case the script sets one. This way the key expires 5 seconds after it is first created, and counting starts again. We can see that Lua scripts can be very handy and easy to use.
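
The Cache object in the example above is an application-level wrapper. For readers calling StackExchange.Redis directly, a comparable sketch using the library’s LuaScript.Prepare and named parameters might look like this. The RequestCounter class is a hypothetical name, and the key is passed as a RedisKey so that the library treats it as a key rather than as a plain argument.

```csharp
using System.Threading.Tasks;
using StackExchange.Redis;

public static class RequestCounter
{
    private const string ClusterTotalCacheKey = "Cluster-Total";

    // Prepared once; @key, @count and @seconds are named parameters
    // that the library rewrites into KEYS[] and ARGV[] entries.
    private static readonly LuaScript Script = LuaScript.Prepare(@"
        local value = redis.call('INCRBY', @key, @count)
        if redis.call('TTL', @key) == -1 then
            redis.call('EXPIRE', @key, @seconds)
        end
        return value");

    public static async Task<long> AddAsync(IDatabase db, long count)
    {
        RedisResult result = await db.ScriptEvaluateAsync(Script, new
        {
            key = (RedisKey)ClusterTotalCacheKey,
            count = count,
            seconds = 5
        });

        // The script returns the counter value after the increment.
        return (long)result;
    }
}
```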

Other solutions

We have seen that none of the above solutions is a silver bullet, so you may try some of the more complex ones. I won’t go into details about these, as they are complex and probably involve a lot of configuration and setting the environment that will suit your needs, but I’ll point you to some of them so that you can pick a favorite one.

  • .NET & SQL distributed lock solution
  • ZooKeeper - a centralized service for providing distributed synchronization and similar services
  • NCache - locking in a distributed cache for data consistency
  • Helix - A cluster management framework for partitioned and replicated distributed resources

Summary

If you are trying to find a pragmatic, one-size-fits-all solution, you’re out of luck: there is no such thing as pragmatic distributed locking. It’s always pretty hard to implement locking, so I strongly recommend a good review of project requirements before you choose any of the solutions above. Good luck!
