Wednesday, January 14, 2009

Fast byte array comparison in C#

I got into a discussion with a colleague the other day about string comparison in .Net and whether to use

variable.Equals("mystring")

or

"string" == "string"

both in terms of speed (though it wouldn’t matter in most cases) and in terms of readability. As for speed .Equals is faster as you save one method call. == is implemented as an operator which again calls Equals. Our good friend Reflector is always there when you need him.


The interesting part came when reflecting this and I stumbled upon EqualsHelper and CompareOrdinalHelper. Here .Net casts the strings to pointer arrays and compares an int at a time. This lead me to creating a byte[] comparison function after the same code .Net used internally and benchmarking it.


For an equal array with 11 elements the unsafe is 3 times as fast. For unequal arrays the managed implementation is quicker if the first or second byte differs. From the third on and out the unsafe gains speed. Below is some sample code you can experiment with yourself. The longer the array, the more you gain on the unsafe version. Be sure to test the code compile in release mode.


Microsoft don’t recommend you using unsafe unless it’s performance critical, but since they use it internally we can as well ;) (But you should have a good reason due to complexity imo) Why they compare 10 bytes at a time is beyond me and I haven’t tested if this is some magic number which yields good results for general cases.


class Program
{
static void Main(string[] args)
{
byte[] a = new byte[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };
byte[] b = new byte[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 };

Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 30000000; i++)
{
SafeEquals(a, b);
}
sw.Stop();
Console.WriteLine(sw.Elapsed);

sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 30000000; i++)
{
UnSafeEquals(a, b);
}
sw.Stop();
Console.WriteLine(sw.Elapsed);
}

private static bool SafeEquals(byte[] strA, byte[] strB)
{
int length = strA.Length;
if (length != strB.Length)
{
return false;
}
for (int i = 0; i < length; i++)
{
if( strA[i] != strB[i] ) return false;
}
return true;
}

[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail)]
private static unsafe bool UnSafeEquals(byte[] strA, byte[] strB)
{
int length = strA.Length;
if (length != strB.Length)
{
return false;
}
fixed (byte* str = strA)
{
byte* chPtr = str;
fixed (byte* str2 = strB)
{
byte* chPtr2 = str2;
byte* chPtr3 = chPtr;
byte* chPtr4 = chPtr2;
while (length >= 10)
{
if ((((*(((int*)chPtr3)) != *(((int*)chPtr4))) || (*(((int*)(chPtr3 + 2))) != *(((int*)(chPtr4 + 2))))) || ((*(((int*)(chPtr3 + 4))) != *(((int*)(chPtr4 + 4)))) || (*(((int*)(chPtr3 + 6))) != *(((int*)(chPtr4 + 6)))))) || (*(((int*)(chPtr3 + 8))) != *(((int*)(chPtr4 + 8)))))
{
break;
}
chPtr3 += 10;
chPtr4 += 10;
length -= 10;
}
while (length > 0)
{
if (*(((int*)chPtr3)) != *(((int*)chPtr4)))
{
break;
}
chPtr3 += 2;
chPtr4 += 2;
length -= 2;
}
return (length <= 0);
}
}
}
}

5 comments:

  1. hi,

    First of all. Thanks very much for your useful post.

    I just came across your blog and wanted to drop you a note telling you how impressed I was with the information you have posted here.

    Please let me introduce you some info related to this post and I hope that it is useful for .Net community.

    There is a good C# resource site, Have alook

    http://www.csharptalk.com/2009/09/c-array.html
    http://www.csharptalk.com/2009/10/creating-arrays.html

    simi

    ReplyDelete
  2. Seems to be a few issues here
    (1) An int in c# is 4 bytes not 2, so this appears to do at least twice as many comparisons as neccessary
    (2) What if the length is odd, the last comparison will then be comparing potentially uninitialzed values

    ReplyDelete
    Replies
    1. Hi,
      Seems you are right. I actually copied the code from String.EqualsHelper in the framework. Didn't really reflect on the code. I'll look more closely at it when I have the time :)

      Copy/paste might not always be the best approach :)

      Delete
  3. There do exist bugs in the code you provide. Hope the following fixes help.

    public static unsafe bool SequenceEqual(this byte[] thisArray, byte[] array)
    {
    int length = thisArray.Length;
    if (length != array.Length) return false;
    fixed (byte* str = thisArray)
    {
    byte* chPtr = str;
    fixed (byte* str2 = array)
    {
    byte* chPtr2 = str2;
    while (length >= 20)
    {
    if ((((*(((int*)chPtr)) != *(((int*)chPtr2))) ||
    (*(((int*)(chPtr + 4))) != *(((int*)(chPtr2 + 4))))) ||
    ((*(((int*)(chPtr + 8))) != *(((int*)(chPtr2 + 8)))) ||
    (*(((int*)(chPtr + 12))) != *(((int*)(chPtr2 + 12)))))) ||
    (*(((int*)(chPtr + 16))) != *(((int*)(chPtr2 + 16))))) break;

    chPtr += 20;
    chPtr2 += 20;
    length -= 20;
    }

    while (length >= 4)
    {
    if (*(((int*)chPtr)) != *(((int*)chPtr2))) break;
    chPtr += 4;
    chPtr2 += 4;

    length -= 4;
    }

    while (length > 0)
    {
    if(*chPtr != * chPtr2) break;
    chPtr++;
    chPtr2++;
    length--;
    }

    return (length <= 0);
    }
    }
    }

    ReplyDelete
    Replies
    1. Hi,
      While I clearly see the error in the code I ripped from the framework, it's actually not wrong. Meaning it will not produce wrong results.

      But it's clearly not optimal either. The first loop compares the first 12 bytes and not 10 like it should (or 20 as you point out), and does a lot of overlapping comparison due to +=2.

      Same goes for the last loop. It works, but does a lot of overlapping checking.

      The non-optimal code is still there in .NET4 :) Thank you so much for pointing this out. You just made my day and weekend as I love this stuff :D

      Delete