Tuesday, 15 May 2012

user defined functions - SQL Server: How can we call the same UDF with different collations -



user defined functions - SQL Server: How can we call the same UDF with different collations -

i'm writing user defined function (udf) in sql server 2008 r2. i'd have udf utilize different collations on different calls. possible? possible dynamically alter collation based upon user input?

from intellisense, little can find online, seems collation must specified in actual t-sql code , cannot applied variable or input parameter. there ways around this, ways can work within udf?

edit: requested, here udf i'm working with. intent phone call both , without "xxxx sensitive" settings sorting query's results "relevance".

class="lang-sql prettyprint-override">/* t-sql implementation of levenshtein edit distance. */ create function [dbo].[levenshteindistancesql] ( @texta varchar(max) , @textb varchar(max) ) returns int begin declare @alength int , @blength int , @prevsize int , @currentsize int , @previousdistances varbinary(max) , @currentdistances varbinary(max) , @result int set @alength = len(@texta) set @blength = len(@textb) set @result = null -- degenerate cases -- want calling routine specify collation of comparing if (@texta = @textb) set @result = 0; if (@alength = 0) set @result = @blength; if (@blength = 0) set @result = @alength; if (@result null) begin -- set sizes of tables length of largest input text, plus 1 set @prevsize = case when @alength > @blength @alength else @blength end + 1; set @currentsize = case when @alength > @blength @alength else @blength end + 1; -- initialize previous distances declare @i int , @j int , @cost int , @curdistlastcost int , @prevdistcurcost int , @prevdistlastcost int , @chara char(1) , @charb char(1) set @i = 0 set @previousdistances = 0x -- empty varbinary while (@i < @prevsize) begin set @previousdistances = @previousdistances + convert(binary(4), @i) set @i = @i + 1; end -- process @texta set @i = 0 while (@i < @alength) begin set @currentdistances = convert(binary(4), @i + 1); -- process @textb set @j = 0 while (@j < @blength) begin set @chara = substring(@texta, @i + 1, 1) set @charb = substring(@textb, @j + 1, 1) -- want calling routine specify collation of comparing set @cost = case when @chara = @charb 0 else 1 end; set @curdistlastcost = convert(int, substring(@currentdistances, @j * 4 + 1, 4)) set @prevdistcurcost = convert(int, substring(@previousdistances, (@j + 1) * 4 + 1, 4)) set @prevdistlastcost = convert(int, substring(@previousdistances, @j * 4 + 1, 4)) set @currentdistances = @currentdistances + convert(binary(4), case when @curdistlastcost < @prevdistcurcost , @curdistlastcost < @prevdistlastcost @curdistlastcost + 1 when @prevdistcurcost < @curdistlastcost , @prevdistcurcost < @prevdistlastcost @prevdistcurcost + 1 else @prevdistlastcost + @cost end) set @j = @j + 1 end -- re-create current distances previous distances next iteration set @previousdistances = @currentdistances set @i = @i + 1; end set @result = convert(int, substring(@currentdistances, @blength * 4 + 1, 4)) end homecoming @result end

called this:

class="lang-sql prettyprint-override">declare @text varchar(50) set @text = 'c' select * skills s s.name '%' + @text + '%' order dbo.levenshteindistance(@text, s.name) -- collate latin1_general_100_ci_ai , dbo.levenshteindistance(@text, s.name) -- collate latin1_general_100_cs_as_ks_ws , s.name

being able specify collation allow me sort shortest edit-distance (insensitive) top of results; , within insensitive edit distance case, accent, etc. matches @ top.

there 2 ways accomplish this, , both require sqlclr. have seen, t-sql not allow specifying collation dynamically, in .net can :). , in fact, there 2 additional benefits using sqlclr purpose:

you can more granular command on options of doing comparison. @ compareoptions enum. example, has alternative ignoresymbols:

indicates string comparing must ignore symbols, such white-space characters, punctuation, currency symbols, percent sign, mathematical symbols, ampersand, , on.

sqlclr udfs can allow parallel execution plan, whereas t-sql udfs disallow parallel plans. in order allow parallel plan, sqlclr udf needs deterministic and marked isdeterministic=true not info access (no need mark explicitly default).

that said, here additional notes:

all implementations of levenshtein distance algorithm have seen assume ordinal comparing (i.e. of *_bin collations). suppose trying accomplish prove interesting :)

given trying accomplish, when taking sqlclr approach, don't need pass in collation info. reason each particular language (i.e. collation) controls both comparing options equality sorting. sorting not involved in levenshtein distance algorithm. reduces need 2 variations:

fully insensitive (at to the lowest degree ignorecase , ignorenonspace, , optionally ignorekanatype, ignoresymbols, , ignorewidth)

fully sensitive (i.e. ordinal)

so, 2 options are:

do in .net:

implement algorithm in c# (or vb.net) , utilize compareinfo.compare (string, string, compareoptions) method. in scenario need either 2 functions, 1 "sensitive" , 1 "insensitive", or single function both takes input parameter (sqlboolean, perhaps) @sensitivecomparison. particular language of strings compared irrelevant technically back upwards languages on day 1 :).

keep algorithm in t-sql. in scenario execute sqlcommand using connection string of "context connection = true;". function take in sqlstring of collation name , utilize concatenate algorithm in 2 spots comparison. [i don't see much benefit alternative on c# option, , in fact should little slower .net code should faster type of thing.]

in either case, should noted can accomplished in assembly marked permission_set = safe. meaning, there absolutely no security risk in implementing functionality (as described here). and, still under (false) impression sqlclr / enabling clr integration inherently security risk, wrote article (part of series on sqlclr) covers topic of security various angles , plenty of examples test yourself: stairway sqlclr level 3: security (general , safe assemblies) (free registration required).

sql-server-2008-r2 user-defined-functions collation

No comments:

Post a Comment