Collate in SQL Server

Collate specifies the set of rules that are used during various comparisons, sorting etc. For example it specifies whether the comparisons should be case sensitive or not etc.


Let's take a look at how different types of collate works, for that let the data in our table be:-
FirstnameLastname
JillSmith
EveJackson
jillWilliam
JillJackson
eveSmith
johnDoe

  1. Case Sensitive :-

    Let us consider the below query :-
    .
    SELECT * FROM T1 ORDER BY Firstname  COLLATE SQL_Latin1_General_CP1_CS_AS
    
    The result of this query is
    FirstnameLastname
    eveSmith
    EveJackson
    jillWilliam
    JillJackson
    JillSmith
    johnDoe

    In the above query the CS in "SQL_Latin1_General_CP1_CS_AS" stands for case sensitivity, however note that the case matters only if there is a conflict between two or more elements otherwise it sorts in normal alphabetical order. The real difference is noticed in case of comparison for example notice the result of below query :-
    .
    SELECT * FROM T1 where Firstname = 'jill'  COLLATE SQL_Latin1_General_CP1_CS_AS
    

    FirstnameLastname
    jillWilliam

    You can notice that the above comparison is case sensitive, since originally we had three records whose Firstname is jill (or Jill) but as we ran the query with case sensitive collate therefore only one record with Firstname 'jill' is selected.

    To get the sorting in actual case sensitive manner you need to use binary collate, for example consider the query :-
    .
    SELECT * FROM T1 order by Firstname  COLLATE Latin1_General_bin
    

    The result of above query is
    FirstnameLastname
    EveJackson
    JillJackson
    JillSmith
    eveSmith
    jillWilliam
    johnDoe

    In binary collate upper case letters and lower case letters are treated as separate characters therefore all the upper case letters comes before lower case letters when sorted in ascending order.

  2. Case Insensitive :-

    Consider the query :-
    .
    SELECT * FROM T1 ORDER BY Firstname COLLATE SQL_Latin1_General_CP1_CI_AS
    
    The result of above query is
    FirstnameLastname
    EveJackson
    eveSmith
    jillWilliam
    JillJackson
    JillSmith
    johnDoe

    As you can see the sorting is irrespective of case. Also to notice case insensitive comparison checkout the result of below query:-
    .
    SELECT * FROM T1 where Firstname = 'jill'  COLLATE SQL_Latin1_General_CP1_CI_AS
     

    FirstnameLastname
    JillSmith
    jillWilliam
    JillJackson

    In the above query the collation is case insensitive that is why "jill" and "Jill" are treated as same and three records are selected instead of just one.
  • Different sections in collate "SQL_Latin1_General_CP1_CI_AS" are explained as below:-
    1. latin1 makes the server treat strings using charset latin 1, basically ascii
    2. CI case insensitive comparisons so 'ABC' would equal 'abc'
    3. AS accent sensitive, so 'ΓΌ' does not equal 'u'
  • Get collation of SQL Server : To get the collation that is set on SQL server use the query :-
    .
    select serverproperty('collation')

  • To get collation of a specific database on SQL server use the query :-
    .
    select databasepropertyex('MY_DB','collation')
    



No comments:

Post a Comment