Speed up Regex performance with .NET 5
.NET 5 Preview 1 was recently released, and one of the improvements has been in Regex performance. Although Microsoft have said they will do a deep dive on Regex performance soon (edit: Microsoft’s deep dive), I thought doing some preliminary benchmarks might be a nice way to test out Preview 1.
Disclaimer: these are microbenchmarks done outside of a lab, on a small dev machine, so take them with a grain of salt.
Installing .NET 5 Preview 1
You can download .NET 5 Preview 1 from the .NET Downloads page. As I’m on a Mac, I downloaded and ran the x64 installer.
Upon installing, it was nice to see that the branding has been updated to remove “Core”.
Let’s do a quick check to see if it installed correctly:
$ dotnet --info
.NET Core SDK (reflecting any global.json):
Version: 5.0.100-preview.1.20155.7
Commit: 1c44853056
Runtime Environment:
OS Name: Mac OS X
OS Version: 10.14
OS Platform: Darwin
RID: osx.10.14-x64
Base Path: /usr/local/share/dotnet/sdk/5.0.100-preview.1.20155.7/
Host (useful for support):
Version: 5.0.0-preview.1.20120.5
Commit: 3c523a6a7a
.NET Core SDKs installed:
3.1.102 [/usr/local/share/dotnet/sdk]
5.0.100-preview.1.20155.7 [/usr/local/share/dotnet/sdk]
.NET Core runtimes installed:
Microsoft.AspNetCore.App 3.1.2 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 5.0.0-preview.1.20124.5 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 3.1.2 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 5.0.0-preview.1.20120.5 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
To install additional .NET Core runtimes or SDKs:
https://aka.ms/dotnet-download
Looks like the CLI still thinks it’s “Core”. I guess there’s a little way to go yet to update all the branding. But it installed correctly, which is a good start!
Anyway, let’s get on with the benchmarking. The idea here is to come up with some decent Regex patterns, run them against .NET Core 3.1 and .NET 5.0, and compare the performance (time, allocations, etc.). So let’s find some complex Regex patterns!
Choosing Regex patterns to test
Searching on Google and GitHub, I quickly came across the repo mariomka/regex-benchmark which tests three Regex patterns against multiple languages.
The benchmarks they test are finding the following information in some text:
- Email:
[\w\.+-]+@[\w\.-]+\.[\w\.-]+
- URI:
[\w]+://[^/\s?#]+[^\s?#]+(?:\?[^\s#]*)?(?:#[^\s]*)?
- IP:
(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9])
Yes: these may not be the most efficient patterns for their respective tasks, but this is about having something to benchmark so it’s ok.
The input data to apply the Regex patterns to can be found here.
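Before benchmarking, it can be worth sanity-checking that the patterns behave as expected. The following is a minimal sketch and not part of the original benchmark: the PatternCheck class, the CountMatches helper and the sample sentence are all made up for illustration, and only the three patterns come from the list above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // The three patterns from the list above, copied verbatim.
    public const string EmailPattern = @"[\w\.+-]+@[\w\.-]+\.[\w\.-]+";
    public const string UriPattern = @"[\w]+://[^/\s?#]+[^\s?#]+(?:\?[^\s#]*)?(?:#[^\s]*)?";
    public const string IpPattern = @"(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9])";

    // Hypothetical helper: counts how many times the pattern matches the input.
    public static int CountMatches(string input, string pattern) =>
        Regex.Matches(input, pattern).Count;

    public static void Main()
    {
        // Hypothetical sample text, not taken from the benchmark input file.
        const string sample =
            "Mail jane.doe@example.com, browse https://example.com/docs?page=2#top or ping 192.168.10.25.";

        Console.WriteLine("Email: " + CountMatches(sample, EmailPattern));
        Console.WriteLine("URI:   " + CountMatches(sample, UriPattern));
        Console.WriteLine("IP:    " + CountMatches(sample, IpPattern));
    }
}
```

Each pattern should find exactly one match in this sample. Note that the IP pattern requires at least two digits per octet, so an address such as 192.168.1.1 would not match. That fits the caveat above: these patterns are benchmark material rather than production validators.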
Great. Now that we have our Regex patterns to test, let’s create the benchmark runner.
Creating the benchmark runner
We could use System.Diagnostics.Stopwatch in a console app to run the benchmarks (intuitively what most of us might do, and what the repo above does), but the .NET Team uses BenchmarkDotNet for its own benchmarks. From the project’s description:
BenchmarkDotNet helps you to transform methods into benchmarks, track their performance, and share reproducible measurement experiments. It’s no harder than writing unit tests! Under the hood, it performs a lot of magic that guarantees reliable and precise results. BenchmarkDotNet protects you from popular benchmarking mistakes and warns you if something is wrong with your benchmark design or obtained measurements. The results are presented in a user-friendly form that highlights all the important facts about your experiment. The library is adopted by 3500+ projects including .NET Core and supported by the .NET Foundation.
So using BenchmarkDotNet, we can get more consistent and repeatable results, with nice reports as well - sounds good to me!
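For contrast, here is roughly what the hand-rolled Stopwatch approach might look like. This is only a sketch under my own assumptions: the NaiveTiming class, the TimeOnceMs helper and the generated input are hypothetical and are not taken from the repo mentioned above.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class NaiveTiming
{
    // Runs the pattern once over the input and returns the elapsed
    // wall-clock time in milliseconds.
    public static double TimeOnceMs(string input, string pattern)
    {
        var stopwatch = Stopwatch.StartNew();
        int count = Regex.Matches(input, pattern).Count; // reading Count runs the matching
        stopwatch.Stop();
        return count >= 0 ? stopwatch.Elapsed.TotalMilliseconds : 0;
    }

    public static void Main()
    {
        // Hypothetical input; the real benchmark reads input-text.txt instead.
        string input = string.Join(" ", Enumerable.Repeat("contact jane.doe@example.com today", 10000));
        string emailPattern = @"[\w\.+-]+@[\w\.-]+\.[\w\.-]+";

        // The first run typically includes one-off costs such as JIT compilation,
        // so repeated runs can differ noticeably from each other.
        for (int run = 1; run <= 3; run++)
        {
            double elapsedMs = TimeOnceMs(input, emailPattern);
            Console.WriteLine("Run " + run + ": " + elapsedMs.ToString("F3") + " ms");
        }
    }
}
```

A loop like this leaves warm-up, iteration counts and statistics up to you, which is the kind of bookkeeping BenchmarkDotNet takes care of.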
Create a console app
Before creating the benchmarks, we need something for them to run in. Typically this is a console app, so let’s create one:
$ dotnet new console -n DotNet5RegexBenchmark
This will create a blank console app named “DotNet5RegexBenchmark” with an empty main method:
using System;
namespace DotNet5RegexBenchmark
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Hello World!");
        }
    }
}
Now we can start to add the benchmark code.
Add BenchmarkDotNet package
Let’s add BenchmarkDotNet to the project. It is supplied as either a NuGet package or a .NET Tool. I chose to use the NuGet package in this example:
$ cd DotNet5RegexBenchmark
$ dotnet add package BenchmarkDotNet
We also need to ensure the input file is copied to the output directory so we can find it at runtime.
The project file DotNet5RegexBenchmark.csproj should now look like this. Note that the TargetFramework version is netcoreapp3.1, BenchmarkDotNet has been added as a package, and we are copying the input file to the output directory:
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp3.1</TargetFramework>
  </PropertyGroup>
  <ItemGroup>
    <PackageReference Include="BenchmarkDotNet" Version="0.12.0" />
  </ItemGroup>
  <ItemGroup>
    <None Update="input-text.txt">
      <CopyToOutputDirectory>PreserveNewest</CopyToOutputDirectory>
    </None>
  </ItemGroup>
</Project>
Ok, now that our console app is created, let’s add the benchmarks.
Add the benchmarks
The canonical benchmark example looks like this:
[SimpleJob(RuntimeMoniker.Net472, baseline: true)]
[SimpleJob(RuntimeMoniker.NetCoreApp30)]
[SimpleJob(RuntimeMoniker.CoreRt30)]
[SimpleJob(RuntimeMoniker.Mono)]
[RPlotExporter]
public class Md5VsSha256
{
    private SHA256 sha256 = SHA256.Create();
    private MD5 md5 = MD5.Create();
    private byte[] data;
    [Params(1000, 10000)]
    public int N;
    [GlobalSetup]
    public void Setup()
    {
        data = new byte[N];
        new Random(42).NextBytes(data);
    }
    [Benchmark]
    public byte[] Sha256() => sha256.ComputeHash(data);
    [Benchmark]
    public byte[] Md5() => md5.ComputeHash(data);
}
Without reading the full BenchmarkDotNet docs (which I would advise for some bedtime reading), we can see from the example that there are a few key parts to setting up a benchmark.
Class definition
Firstly, we should encapsulate the benchmarks in a class. Let’s create ours:
namespace DotNet5RegexBenchmark
{
    public class RegexBenchmarks
    {
    }
}
Right, not much going on yet so let’s add the setup.
Setup
The Setup method is where you do anything that’s required for the benchmarks to run, like new-ing up any data. For these benchmarks, we need to pull in the text file that we’re applying the Regex patterns to, so this is a great place for doing that:
using System.IO;
using BenchmarkDotNet.Attributes;
namespace DotNet5RegexBenchmark
{
    public class RegexBenchmarks
    {
        private string _data;
        [GlobalSetup]
        public void Setup()
        {
            _data = File.ReadAllText("input-text.txt");
        }
    }
}
Also note that you need to add the [GlobalSetup] attribute to your setup method to tell BenchmarkDotNet that this is your setup method, so that it is called before the benchmarks run.
Now that the setup is done, let’s add the benchmarks.
Benchmarks
The best way to manage benchmarks is to create a method for each one and apply the [Benchmark] attribute to tell BenchmarkDotNet that the method is a benchmark. So let’s add the three benchmarks:
using System.IO;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
namespace DotNet5RegexBenchmark
{
    public class RegexBenchmarks
    {
        private string _data;
        [GlobalSetup]
        public void Setup()
        {
            _data = File.ReadAllText("input-text.txt");
        }
        [Benchmark]
        public int Email() => Regex.Matches(_data, @"[\w\.+-]+@[\w\.-]+\.[\w\.-]+", RegexOptions.Compiled).Count;
        [Benchmark]
        public int URI() => Regex.Matches(_data, @"[\w]+://[^/\s?#]+[^\s?#]+(?:\?[^\s#]*)?(?:#[^\s]*)?", RegexOptions.Compiled).Count;
        [Benchmark]
        public int IP() => Regex.Matches(_data, @"(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9])", RegexOptions.Compiled).Count;
    }
}
Edit: As pointed out by Stephen Toub, the Regex matching is lazy, so you need to actually access the matches to run the matching code. My initial benchmarks were not doing this, which is why I was not seeing the 3-6x speedup. I’ve now included a .Count operation to actually run all of the matches.
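To make the laziness concrete, here is a small sketch of two ways to force the matching to run. The LazyMatching class, its method names and the sample input are hypothetical; the only behaviour relied on is that Regex.Matches populates its MatchCollection lazily, and that reading Count or enumerating the collection triggers the work.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyMatching
{
    // Forces evaluation by reading Count, which populates the whole collection.
    public static int CountViaProperty(string input, string pattern)
    {
        MatchCollection matches = Regex.Matches(input, pattern); // populated lazily
        return matches.Count;
    }

    // Forces evaluation by enumerating the collection one match at a time.
    public static int CountViaEnumeration(string input, string pattern)
    {
        int count = 0;
        foreach (Match match in Regex.Matches(input, pattern))
        {
            count++;
        }
        return count;
    }

    public static void Main()
    {
        // Hypothetical sample input with three runs of digits.
        const string input = "a1 b22 c333";
        Console.WriteLine(CountViaProperty(input, @"\d+"));    // prints 3
        Console.WriteLine(CountViaEnumeration(input, @"\d+")); // prints 3
    }
}
```

A benchmark that only calls Regex.Matches and discards the result does neither of these, so it mostly measures the cost of creating the collection rather than the matching itself.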
Ok, looks good, almost there. Now let’s add the runtimes to target.
Runtimes
The plan is to compare .NET Core 3.1 and .NET 5.0, so let’s do that. This can be easily done with some more attributes on the class. Let’s also add the [MemoryDiagnoser] attribute to report memory allocations, as this is not turned on by default:
using System.IO;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Jobs;
namespace DotNet5RegexBenchmark
{
    [SimpleJob(RuntimeMoniker.NetCoreApp31, baseline: true)]
    [SimpleJob(RuntimeMoniker.NetCoreApp50)]
    [MemoryDiagnoser]
    public class RegexBenchmarks
    {
        private string _data;
        [GlobalSetup]
        public void Setup()
        {
            _data = File.ReadAllText("input-text.txt");
        }
        [Benchmark]
        public int Email() => Regex.Matches(_data, @"[\w\.+-]+@[\w\.-]+\.[\w\.-]+", RegexOptions.Compiled).Count;
        [Benchmark]
        public int URI() => Regex.Matches(_data, @"[\w]+://[^/\s?#]+[^\s?#]+(?:\?[^\s#]*)?(?:#[^\s]*)?", RegexOptions.Compiled).Count;
        [Benchmark]
        public int IP() => Regex.Matches(_data, @"(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9])\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9])", RegexOptions.Compiled).Count;
    }
}
Ok, I think the class is done. Let’s now hook it up to the Program.
Program
We just need to call the benchmark runner on our class from the main program, and we should be good to go:
using BenchmarkDotNet.Running;
namespace DotNet5RegexBenchmark
{
    class Program
    {
        static void Main(string[] args)
        {
            BenchmarkRunner.Run<RegexBenchmarks>();
        }
    }
}
Let’s run this sucker!
Benchmark results
I’m running this in Release configuration with the following command (elevated privileges are required):
$ sudo dotnet run -c Release
and here are the results:
Method | Runtime | Mean | Error | StdDev | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
Email | .NET Core 3.1 | 1,357.990 ms | 28.9406 ms | 33.3280 ms | 1.00 | - | - | - | 20.99 KB |
Email | .NET Core 5.0 | 263.536 ms | 1.5500 ms | 1.2943 ms | 0.19 | - | - | - | 23.28 KB |
URI | .NET Core 3.1 | 1,138.896 ms | 5.4737 ms | 4.5708 ms | 1.00 | - | - | - | 1205.21 KB |
URI | .NET Core 5.0 | 228.324 ms | 1.3113 ms | 1.2266 ms | 0.20 | - | - | - | 1205.1 KB |
IP | .NET Core 3.1 | 94.695 ms | 1.2128 ms | 1.1344 ms | 1.00 | - | - | - | 1.36 KB |
IP | .NET Core 5.0 | 9.411 ms | 0.0423 ms | 0.0395 ms | 0.10 | - | - | - | 1.24 KB |
So the .NET Team weren’t lying; .NET 5 Regex is definitely faster: roughly 5-10x for these benchmarks, with allocations staying about the same. It will be nice to see when the .NET Team come out with their own benchmarks as they’ll likely be a lot more detailed and scientific than mine, but the future is looking fast!
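For anyone who wants to check the arithmetic, the speedup is just the .NET Core 3.1 mean divided by the .NET 5.0 mean. A minimal sketch, where the SpeedupCalc class and Speedup helper are hypothetical names and the numbers are the Mean column from the results table:

```csharp
using System;
using System.Globalization;

public static class SpeedupCalc
{
    // Speedup factor: how many times faster the candidate is than the baseline.
    public static double Speedup(double baselineMs, double candidateMs) =>
        baselineMs / candidateMs;

    public static void Main()
    {
        // Mean times in milliseconds, copied from the results table.
        var rows = new (string Method, double Core31Ms, double Net5Ms)[]
        {
            ("Email", 1357.990, 263.536),
            ("URI", 1138.896, 228.324),
            ("IP", 94.695, 9.411),
        };

        foreach (var row in rows)
        {
            string factor = Speedup(row.Core31Ms, row.Net5Ms)
                .ToString("F1", CultureInfo.InvariantCulture);
            Console.WriteLine(row.Method + ": " + factor + "x faster");
        }
    }
}
```

This works out to roughly 5.2x for Email, 5.0x for URI and 10.1x for IP, which lines up with the Ratio column (0.19, 0.20 and 0.10).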
Summary
The first preview of .NET 5 was recently released and Regex performance was improved. I tested this with BenchmarkDotNet against a few benchmarks and this is definitely the case - between 5-10x faster. The .NET Team will likely release their own statistics on Regex performance as the release firms up. This is only the first preview of .NET 5 so expect more performance improvements in the next previews.