Tuesday, January 15, 2013

Ignore casing and diacritics on LINQ queries

With LINQ to Objects, you can ignore casing and diacritics (accents) in a query simply by calling ordinary .NET methods on the strings being compared. The whole query runs in memory on .NET, so any method you can call from C# can be used inside it, and everything just works.

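First of all, we'll need the following extension method. It uses the System.Globalization and System.Text namespaces and, like every extension method, has to live in a static class:
using System.Globalization;
using System.Text;

// The class name is arbitrary; any static class will do.
public static class StringExtensions
{
    public static string RemoveDiacritics(this string s)
    {
        // Form D splits each accented character into its base character
        // followed by one or more combining marks.
        string normalizedString = s.Normalize(NormalizationForm.FormD);
        StringBuilder stringBuilder = new StringBuilder();

        foreach (char c in normalizedString)
        {
            // Keep everything except the combining marks (the accents).
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                stringBuilder.Append(c);
        }

        return stringBuilder.ToString();
    }
}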
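To make the decomposition step more concrete, here is a small sketch of what Form D does to a single accented character. The class name DecompositionDemo is made up for this illustration, and the extension method is repeated only so the sample compiles on its own.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class StringExtensions
{
    // Same technique as the method above, repeated so this sample is self-contained.
    public static string RemoveDiacritics(this string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s.Normalize(NormalizationForm.FormD))
        {
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                sb.Append(c);
        }
        return sb.ToString();
    }
}

// Hypothetical demo class, not part of the original post.
public static class DecompositionDemo
{
    public static void Main()
    {
        string precomposed = "\u00E9"; // "e with acute" as a single code point
        string decomposed = precomposed.Normalize(NormalizationForm.FormD);

        Console.WriteLine(precomposed.Length); // 1
        Console.WriteLine(decomposed.Length);  // 2: 'e' followed by U+0301
        Console.WriteLine(CharUnicodeInfo.GetUnicodeCategory(decomposed[1])); // NonSpacingMark

        Console.WriteLine("Jo\u00E3o".RemoveDiacritics()); // Joao

        // "o with stroke" (U+00F8) has no canonical decomposition, so it is left untouched.
        Console.WriteLine("\u00F8".RemoveDiacritics() == "\u00F8"); // True
    }
}
```

The last line shows a limitation worth knowing: letters that Unicode does not decompose into a base character plus a combining mark pass through unchanged.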
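Now, if you're working with LINQ to Objects, you can simply write queries like this. Note that the filter gets the same treatment as the name, otherwise a filter typed with accents would never match:
var result = from p in People
             where p.Name.ToUpper().RemoveDiacritics().Contains(filter.ToUpper().RemoveDiacritics())
             select p;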
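As an alternative for the in-memory case, the framework's own culture-aware comparison can ignore case and accents without building new strings. The following is only a sketch: the Person class, the sample names, and the ContainsLoosely helper are assumptions made for illustration, and it assumes culture-sensitive comparison is available in the runtime (it is not in globalization-invariant mode).

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical entity, standing in for the People collection used in the post.
public class Person
{
    public string Name { get; set; }
}

public static class CompareInfoDemo
{
    // Hypothetical helper: true when 'value' occurs in 'source',
    // ignoring both casing and non-spacing marks (accents).
    public static bool ContainsLoosely(string source, string value)
    {
        CompareInfo compareInfo = CultureInfo.InvariantCulture.CompareInfo;
        const CompareOptions options = CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace;
        return compareInfo.IndexOf(source, value, options) >= 0;
    }

    public static void Main()
    {
        var people = new List<Person>
        {
            new Person { Name = "Jos\u00E9 Silva" }, // "Jose" with an accented e
            new Person { Name = "Maria Souza" }
        };
        string filter = "jose";

        var result = from p in people
                     where ContainsLoosely(p.Name, filter)
                     select p;

        foreach (Person p in result)
            Console.WriteLine(p.Name);
    }
}
```

Like RemoveDiacritics, this only works with LINQ to Objects, because the helper is plain .NET code.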
But when you get to play with LINQ to SQL, things get a little harder. LINQ to SQL translates your query into SQL statements, executes them on the database (where they benefit from SQL optimizations), and then maps the result back into objects that your code can work with. A custom .NET method such as RemoveDiacritics has no SQL translation, so it cannot be used inside such a query.

So with LINQ to SQL you either need a stored procedure declared on your database that the generated SQL can call, or you can "adjust" the database to your needs. I'd go with the second option and change the collation of the column (or, more broadly, the database default) to one that ignores diacritics. For the above example, this could be done with the following statement:
ALTER TABLE People ALTER COLUMN Name [varchar](100) COLLATE SQL_Latin1_General_CP1_CI_AI
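This should save you the trouble of developing a stored procedure. The CI and AI parts of the collation name stand for case-insensitive and accent-insensitive, so the database itself now ignores both whenever it compares Name. Then we can just use our filter directly in our queries: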
var result = from p in People
             where p.Name == filter
             select p;
A question that I asked on StackOverflow motivated me to blog about this. It can be found here.
