Wednesday, July 31, 2013

So, I ran into an interesting problem the other day.

I've been part of building HR Auditor for Microsoft Lync these last couple of months. Now, when you turn on archiving, Lync starts dumping all the conversations from its platform into a SQL database. HR Auditor is a front end platform that lets users search for that conversation by keyword, user, and time, or some combination thereof. And that's good, because on a medium to large company, you can end up with forty thousand distinct conversations in a few days -partly because every time you close a window, Lync counts it as you closing your side of a conversation, which writes it out to the logs.

So after I hammered out the installation for Windows Server 2008 and WS 2012 (Using the VS 2010 installer project. Not...not my favorite thing ever), I switched out onto SQL side of things. Now, most of my experience has been with raw SQL, so I was/am a LINQ to SQL newbie. All the extant SQL behind the searching was in LINQ to SQL, so I caught up and tried to find how to implement case insensitive search.

As I found out, by itself LINQ isn't concerned with case. That is to say, it doesn't have a lot of tools for dealing with it, and can't be set to ignore it on a high level. What LINQ does is follow the collation set by SQL database. And Lync (as separate from LINQ, and links...yeah, my office doesn't get confusing at all) sets up LCSLOG, where it stores its conversation logging data, with Latin1_General_BIN as the collation. Latin1_General_BIN is --you guessed it-- case sensitive.

Okay, okay, not a problem, just say (s=>s.UPPER(Body)) and you'll be fine, right? Normally yes, and that was my first impulse. Except I was foiled by Lync again, because all body fields are encoded as NTEXT, which will not accept casting like that.

The person who had been working on the problem before me had tried to solve the problem by dynamically altering the table collation, which worked...and also built up enough logging records to crash our SQL server.
I decided not to do that.

There's a translate method to be had, though. What that does is take an IQueryable (hang on; I'll define it below) and build an IEnumerable out of it, which is what I get back from LINQ in the first place, so I want that back from my modified method, too.

Now, first we have to get the SQL command that LINQ generates, and modify it, which is pretty easy, since it comes back in a string like this:

String dbCommandText = yourDataContext.GetCommand(db.LINQDataModel.Where(Your LINQ ));

To use translate, you've got to have an IQueryable, and an open connection. So open a connection with:

if(System.Data.ConnectionState.Closed == db.Connection.State){ db.Connection.Open();}

So now you've got this string and an open connection. What do you do with it? Well, my need (and probably yours) was to edit the WHERE clause. I ended up doing mine with a regular expression, which was as fun as it sounds. My pattern was pretty simple, it looked like this:

var pattern = Regex.Escape("[") + "t.*" + Regex.Escape("]") + Regex.Escape(".") + Regex.Escape("[") + "Body" + Regex.Escape("]");

That's "find [t(any number of any character)].[Body]". I replaced all the matches with

UPPER(CONVERT(NVARCHAR(MAX), [t(original value)].[Body]))

Which is exactly what I want to have. I've already converted my keyword to upper case (did that in C# before I even passed down the parameter), so now any instance of "cAmEl CaSe" will match "CAMEL CASE" or "camel case" or any mix thereof.

At this point, I've got exactly the SQL command I want to execute, so I've got to get my IQueryable up and running, which we do

Now that you're getting your results back, and they're going into your customer data structure, what you do with your modified text is execute it and then pass the results right back to your handler, just as if you'd called it with straight LINQ. It works like this:

                using (var myReader = mySessions.ExecuteReader())
                    var results = db.Translate(myReader);
                    myCustomDataStructureInstance = results.ToList();     

And there are two parts to that magic. The first is the reader. That's a lot like using System.IO.File.ReadAllText. Only where ReadAllText gets all the text in a file, ExecuteReader gets all the results from the SQL query and packs it into your custom data structure.
"results" is going to be whatever enumerable type you've declared (usually something like a dictionary, for me), and it's actually getting populated when you make the call to Translate, which is what passes the LINQ in.
For my program, there was a little more transformation required before I passed things home, but it worked out.

Let me know how it works for you!

Michael McPherson