Return-Path: Delivered-To: apmail-db-derby-dev-archive@www.apache.org Received: (qmail 25345 invoked from network); 23 Mar 2007 16:53:58 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 23 Mar 2007 16:53:58 -0000 Received: (qmail 31224 invoked by uid 500); 23 Mar 2007 16:54:05 -0000 Delivered-To: apmail-db-derby-dev-archive@db.apache.org Received: (qmail 31187 invoked by uid 500); 23 Mar 2007 16:54:05 -0000 Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: Delivered-To: mailing list derby-dev@db.apache.org Received: (qmail 31177 invoked by uid 99); 23 Mar 2007 16:54:05 -0000 Received: from herse.apache.org (HELO herse.apache.org) (140.211.11.133) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 23 Mar 2007 09:54:05 -0700 X-ASF-Spam-Status: No, hits=2.0 required=10.0 tests=HTML_MESSAGE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (herse.apache.org: domain of msatoor@gmail.com designates 209.85.132.251 as permitted sender) Received: from [209.85.132.251] (HELO an-out-0708.google.com) (209.85.132.251) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 23 Mar 2007 09:53:55 -0700 Received: by an-out-0708.google.com with SMTP id c25so1255927ana for ; Fri, 23 Mar 2007 09:53:34 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:references; b=j/pvBpLOHz7/55YYbwSAa09MB8VGPdpnWb4kY0xzzKWWn3uyEWXBe0nGVb90ix1MB5pRcmBjEwA6c5p7AzUk12bYC7Ed+bF2i7F/XPBs1I1EIfZ5Z3dW/td2/NqoWSuYvcqkJ7WwEfzN9vu40BcMUgpLEvKOhdrL/21AHjoX9uw= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:to:subject:in-reply-to:mime-version:content-type:references; b=CdrAGcICHKcDECN14kFLfc69yP6SCn+adSTtL44UZp/whXg5uGAIoVQE33Je+4qDYE8iW/aIMbBi5UizrirP8+iZuMRs5mVCH21ufdbljEdOqtIV23eyVDsuoxV8pq1JBonxlQ99l7zmt8TWsK+wQX9bsYKqnzAOcJ9isMq3Kb8= Received: by 10.100.43.9 with SMTP id q9mr2831351anq.1174668814009; Fri, 23 Mar 2007 09:53:34 -0700 (PDT) Received: by 10.100.136.4 with HTTP; Fri, 23 Mar 2007 09:53:33 -0700 (PDT) Message-ID: Date: Fri, 23 Mar 2007 09:53:33 -0700 From: "Mamta Satoor" To: derby-dev@db.apache.org Subject: Re: Collation feature discussion In-Reply-To: <4603DAD0.5010208@sun.com> MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_Part_204008_1135909.1174668813953" References: <45FB6AA9.1030507@amberpoint.com> <45FD5444.4020708@apache.org> <45FEEBBB.501@apache.org> <46035E1E.9020701@apache.org> <4603DAD0.5010208@sun.com> X-Virus-Checked: Checked by ClamAV on apache.org ------=_Part_204008_1135909.1174668813953 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Content-Disposition: inline Thanks a bunch for the pointers, Rick. What is intriguing is the default collation of the string data type. For a string literal, if it is used in a comparison operation with SYS schema character column, then logically, the string literal should have collation of UCS_BASIC. But if the string literal is used in comparison with user schema character column, then it's collation should be whatever is defined for that user schema. So, it sounds like the default collation of character set will be different depending on the schema. SQL spec does say that there is a collation descriptor associated with schema descriptor. And may be that is how character set's default collation will be determined. I need to spend more time on SQL spec to understand this completely. Mamta On 3/23/07, Rick Hillegas wrote: > > Hi Mamta, > > This is my understanding of what these words mean, based on a quick > googling of industry practices. For instance, see > > http://www.nocomsoftware.se/p5745/files/whatsnew-sb-10.0.0.htm > http://msdn2.microsoft.com/en-us/library/ms179886.aspx > > Explicit - This means that a COLLATE clause in the statement forces the > server to use a particular collation. > > Implicit - This means that your statement mentions a column without > using a COLLATE clause. The column itself has a collation which was > determined when the table was created. > > None - This case arises when you use SQL operators to combine two > columns which have different collations. For example "select > frenchColumn || englishColumn from ...". In this case the server cannot > figure out which collation to use. > > There is also a concept of a default collation for a datatype. As I read > the SQL standard, part 2, section 4.2.2, I see the following: > > 1) A string datatype has a default character set associated with it. > > 2) That character set, in turn, has a distinguished collation associated > with it. > > 3) That collation is the default collation of the string datatype. That > is, if you create a column of that datatype and you don't include a > COLLATE clause, then the column has that collation. Similarly, if you > declare a function that returns a string datatype and you don't include > a COLLATE clause, then the function returns a string having that default > collation. > > Hope this helps... > > Regards, > -Rick > > > > Mamta Satoor wrote: > > I am looking at the SQL spec to see how it deals with the problem of > > different collation types, which they call as explicit, implicit and > > none. Hopefully, that will make it easier to come up with a logic for > > deducting correct collation type for non-trivial cases like COLLATE, > > TRIM, string literal, etc. > > > > Mamta > > > > > > On 3/22/07, *Daniel John Debrunner* > > wrote: > > > > Mamta Satoor wrote: > > > Before talking about functions, I think it will be better to > > first talk > > > about string literals and their collation determination. > > > > > > SQL spec section 5.3 , Syntax Rule 15) says "The > > declared type > > > collation of a is the character set > > > collation, and the collation derivation is implicit." > > > > > > Based on this, when a string literal (collation type UNKNOWN) is > > getting > > > used in a collation method with another operand as UCS_BASIC > > collation, > > > then the collation type of string literal will be UCS_BASIC. > > Similar > > > rule for operand with TERRITORY_BASED. In a case where, > > collation types > > > of all the operands is UNKNOWN, at collation time, it can be > > assumed to > > > be whatever is defined for user defined character columns. This > > will be > > > similar to the example given by Rick for implicit collation type > > when > > > talking about CAST ie > > > CREATE TABLE t1 (c11 char(1) default 'a') In this example, the > > collation > > > type of DTD associated with 'a' will be implicitly whatever is > > defined > > > at the database level for COLLATION. > > > > > > Hope this answers the question about string literals. > > > > Kind of, I looked up the definition of "collation derivation is > > implicit" in section 4.2.2 of the standard and at first reading it > > wasn't obvious to me what it meant. > > > > I know I suggested the 'collation type UNKNOWN' but I hadn't > > looked into > > the SQL standard in detail, and now I'm wondering if the UNKNOWN > > concept is a good idea. Since the SQL standard already defines a > model > > for how collations are defined it might be wise to follow the > required > > model and naming. Not sure what that would mean exactly, but it > seems > > like each character expression can have a derivation of explicit, > > implicit or none. These may be better ways to carry state rather > than > > unknown. Unless of course there's a clear mapping between unknown > and > > the sql standard definition. > > > > Dan. > > > > > > ------=_Part_204008_1135909.1174668813953 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit Content-Disposition: inline
Thanks a bunch for the pointers, Rick.
 
What is intriguing is the default collation of the string data type. For a string literal, if it is used in a comparison operation with SYS schema character column, then logically, the string literal should have collation of UCS_BASIC. But if the string literal is used in comparison with user schema character column, then it's collation should be whatever is defined for that user schema. So, it sounds like the default collation of character set will be different depending on the schema. SQL spec does say that there is a collation descriptor associated with schema descriptor. And may be that is how character set's default collation will be determined. I need to spend more time on SQL spec to understand this completely.
 
Mamta

 
On 3/23/07, Rick Hillegas <Richard.Hillegas@sun.com > wrote:
Hi Mamta,

This is my understanding of what these words mean, based on a quick
googling of industry practices. For instance, see

http://www.nocomsoftware.se/p5745/files/whatsnew-sb-10.0.0.htm
http://msdn2.microsoft.com/en-us/library/ms179886.aspx

Explicit - This means that a COLLATE clause in the statement forces the
server to use a particular collation.

Implicit - This means that your statement mentions a column without
using a COLLATE clause. The column itself has a collation which was
determined when the table was created.

None - This case arises when you use SQL operators to combine two
columns which have different collations. For example "select
frenchColumn || englishColumn from ...". In this case the server cannot
figure out which collation to use.

There is also a concept of a default collation for a datatype. As I read
the SQL standard, part 2, section 4.2.2, I see the following:

1) A string datatype has a default character set associated with it.

2) That character set, in turn, has a distinguished collation associated
with it.

3) That collation is the default collation of the string datatype. That
is, if you create a column of that datatype and you don't include a
COLLATE clause, then the column has that collation. Similarly, if you
declare a function that returns a string datatype and you don't include
a COLLATE clause, then the function returns a string having that default
collation.

Hope this helps...

Regards,
-Rick



Mamta Satoor wrote:
> I am looking at the SQL spec to see how it deals with the problem of
> different collation types, which they call as explicit, implicit and
> none. Hopefully, that will make it easier to come up with a logic for
> deducting correct collation type for non-trivial cases like COLLATE,
> TRIM, string literal, etc.
>
> Mamta
>
>
> On 3/22/07, *Daniel John Debrunner* <djd@apache.org
> <mailto: djd@apache.org>> wrote:
>
>     Mamta Satoor wrote:
>     > Before talking about functions, I think it will be better to
>     first talk
>     > about string literals and their collation determination.
>     >
>     > SQL spec section 5.3 <literal>, Syntax Rule 15) says "The
>     declared type
>     > collation of a <character string literal> is the character set
>     > collation, and the collation derivation is implicit."
>     >
>     > Based on this, when a string literal (collation type UNKNOWN) is
>     getting
>     > used in a collation method with another operand as UCS_BASIC
>     collation,
>     > then the collation type of string literal will be UCS_BASIC.
>     Similar
>     > rule for operand with TERRITORY_BASED. In a case where,
>     collation types
>     > of all the operands is UNKNOWN, at collation time, it can be
>     assumed to
>     > be whatever is defined for user defined character columns. This
>     will be
>     > similar to the example given by Rick for implicit collation type
>     when
>     > talking about CAST ie
>     > CREATE TABLE t1 (c11 char(1) default 'a') In this example, the
>     collation
>     > type of DTD associated with 'a' will be implicitly whatever is
>     defined
>     > at the database level for COLLATION.
>     >
>     > Hope this answers the question about string literals.
>
>     Kind of, I looked up the definition of "collation derivation is
>     implicit" in section 4.2.2 of the standard and at first reading it
>     wasn't obvious to me what it meant.
>
>     I know I suggested the 'collation type UNKNOWN' but I hadn't
>     looked into
>     the SQL standard in detail, and now I'm wondering if the UNKNOWN
>     concept is a good idea. Since the SQL standard already defines a model
>     for how collations are defined it might be wise to follow the required
>     model and naming. Not sure what that would mean exactly, but it seems
>     like each character expression can have a derivation of explicit,
>     implicit or none. These may be better ways to carry state rather than
>     unknown. Unless of course there's a clear mapping between unknown and
>     the sql standard definition.
>
>     Dan.
>
>


------=_Part_204008_1135909.1174668813953--