Mailing-List: contact derby-dev-help@db.apache.org; run by ezmlm
Precedence: bulk
Reply-To: <derby-dev@db.apache.org>
Message-ID: <d9619e4a0703230953m532095d3u7fb9f6587cada21f@mail.gmail.com>
Date: Fri, 23 Mar 2007 09:53:33 -0700
From: "Mamta Satoor" <msatoor@gmail.com>
To: derby-dev@db.apache.org
Subject: Re: Collation feature discussion
In-Reply-To: <4603DAD0.5010208@sun.com>
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_Part_204008_1135909.1174668813953"
References: <45FB6AA9.1030507@amberpoint.com> <45FD5444.4020708@apache.org>
	 <d9619e4a0703191219n50884933n8ce1fdc9c3b7f1bd@mail.gmail.com>
	 <45FEEBBB.501@apache.org>
	 <d9619e4a0703191432n2c4fd174q452e610ad30eca40@mail.gmail.com>
	 <d9619e4a0703220949j5d860041k4261448bdfc87f3b@mail.gmail.com>
	 <d9619e4a0703221149p51dd98e6q144e18bd7a2a372c@mail.gmail.com>
	 <46035E1E.9020701@apache.org>
	 <d9619e4a0703230131n7b30cd7dub14f1c75cb8266e3@mail.gmail.com>
	 <4603DAD0.5010208@sun.com>

------=_Part_204008_1135909.1174668813953
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Thanks a bunch for the pointers, Rick.

What is intriguing is the default collation of the string data type. If a
string literal is used in a comparison operation with a SYS schema
character column, then logically the string literal should have UCS_BASIC
collation. But if the string literal is compared with a user schema
character column, then its collation should be whatever is defined for that
user schema. So it sounds like the default collation of the character set
will differ depending on the schema. The SQL spec does say that there is a
collation descriptor associated with the schema descriptor, and maybe that
is how the character set's default collation will be determined. I need to
spend more time on the SQL spec to understand this completely.
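
A minimal Java sketch of that idea follows. It makes several assumptions.
The CollationType enum and the literalCollation helper are hypothetical
names, not Derby's actual API. Only the SYS schema is treated as a system
schema. Every other schema uses the database-level collation rather than a
per-schema collation descriptor.

```java
// Hypothetical collation types, named after the ones used in this thread.
enum CollationType { UCS_BASIC, TERRITORY_BASED }

final class LiteralCollation {
    private LiteralCollation() {}

    // Collation a string literal takes when it is compared with a column
    // owned by columnSchema. Assumes schema names are already normalized to
    // upper case and that SYS is the only system schema (a simplification).
    static CollationType literalCollation(String columnSchema,
                                          CollationType databaseCollation) {
        if ("SYS".equals(columnSchema)) {
            // System catalog columns always compare with UCS_BASIC.
            return CollationType.UCS_BASIC;
        }
        // User schemas fall back to the collation chosen for the database.
        return databaseCollation;
    }

    public static void main(String[] args) {
        System.out.println(literalCollation("SYS", CollationType.TERRITORY_BASED));
        System.out.println(literalCollation("APP", CollationType.TERRITORY_BASED));
    }
}
```

Running it prints UCS_BASIC for the SYS case and TERRITORY_BASED for the
APP case.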

Mamta


On 3/23/07, Rick Hillegas <Richard.Hillegas@sun.com> wrote:
>
> Hi Mamta,
>
> This is my understanding of what these words mean, based on a quick
> googling of industry practices. For instance, see
>
> http://www.nocomsoftware.se/p5745/files/whatsnew-sb-10.0.0.htm
> http://msdn2.microsoft.com/en-us/library/ms179886.aspx
>
> Explicit - This means that a COLLATE clause in the statement forces the
> server to use a particular collation.
>
> Implicit - This means that your statement mentions a column without
> using a COLLATE clause. The column itself has a collation which was
> determined when the table was created.
>
> None - This case arises when you use SQL operators to combine two
> columns which have different collations. For example "select
> frenchColumn || englishColumn from ...". In this case the server cannot
> figure out which collation to use.
>
> There is also a concept of a default collation for a datatype. As I read
> the SQL standard, part 2, section 4.2.2, I see the following:
>
> 1) A string datatype has a default character set associated with it.
>
> 2) That character set, in turn, has a distinguished collation associated
> with it.
>
> 3) That collation is the default collation of the string datatype. That
> is, if you create a column of that datatype and you don't include a
> COLLATE clause, then the column has that collation. Similarly, if you
> declare a function that returns a string datatype and you don't include
> a COLLATE clause, then the function returns a string having that default
> collation.
>
> Hope this helps...
>
> Regards,
> -Rick
>
>
>
> Mamta Satoor wrote:
> > I am looking at the SQL spec to see how it deals with the problem of
> > different collation types, which it calls explicit, implicit and
> > none. Hopefully that will make it easier to come up with logic for
> > deducing the correct collation type for non-trivial cases like COLLATE,
> > TRIM, string literals, etc.
> >
> > Mamta
> >
> >
> > On 3/22/07, Daniel John Debrunner <djd@apache.org> wrote:
> >
> >     Mamta Satoor wrote:
> >     > Before talking about functions, I think it will be better to first
> >     > talk about string literals and their collation determination.
> >     >
> >     > SQL spec section 5.3 <literal>, Syntax Rule 15) says "The declared
> >     > type collation of a <character string literal> is the character set
> >     > collation, and the collation derivation is implicit."
> >     >
> >     > Based on this, when a string literal (collation type UNKNOWN) is
> >     > used in a collation method with another operand that has UCS_BASIC
> >     > collation, the collation type of the string literal will be
> >     > UCS_BASIC. The same rule applies to an operand with TERRITORY_BASED
> >     > collation. Where the collation types of all the operands are
> >     > UNKNOWN, at collation time they can be assumed to be whatever is
> >     > defined for user-defined character columns. This is similar to the
> >     > example Rick gave for implicit collation type when talking about
> >     > CAST, i.e.
> >     > CREATE TABLE t1 (c11 char(1) default 'a')
> >     > In this example, the collation type of the DTD associated with 'a'
> >     > will implicitly be whatever is defined at the database level for
> >     > COLLATION.
> >     >
> >     > Hope this answers the question about string literals.
> >
> >     Kind of. I looked up the definition of "collation derivation is
> >     implicit" in section 4.2.2 of the standard, and at first reading it
> >     wasn't obvious to me what it meant.
> >
> >     I know I suggested the 'collation type UNKNOWN', but I hadn't looked
> >     into the SQL standard in detail, and now I'm wondering if the UNKNOWN
> >     concept is a good idea. Since the SQL standard already defines a model
> >     for how collations are defined, it might be wise to follow the
> >     required model and naming. Not sure what that would mean exactly, but
> >     it seems like each character expression can have a derivation of
> >     explicit, implicit or none. These may be better ways to carry state
> >     than unknown, unless of course there's a clear mapping between
> >     unknown and the SQL standard definition.
> >
> >     Dan.
> >
> >
>
>
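
The Java sketch below makes the explicit/implicit/none discussion above
concrete. It shows one way derivations might combine for the operands of a
string operator such as ||. The names (Derivation, CollationInfo, combine)
and the collation labels are hypothetical. The rules are a simplified
reading of the standard, not a quotation of it:

  - An explicit COLLATE wins, and conflicting explicit collations are an
    error.
  - Matching implicit collations stay implicit.
  - Anything else yields NONE.

A string literal could enter as IMPLICIT with the character set's default
collation. That is one possible mapping for the UNKNOWN state. The sketch
assumes Java 16 or later (it uses a record).

```java
import java.util.List;

// Collation derivations as described in this thread (SQL standard, part 2, 4.2.2).
enum Derivation { EXPLICIT, IMPLICIT, NONE }

// A character expression's collation plus how it was derived.
// The collation is null when the derivation is NONE.
record CollationInfo(String collation, Derivation derivation) {}

final class CollationDerivation {
    private CollationDerivation() {}

    // Simplified combination of operand collations for one string operator.
    static CollationInfo combine(List<CollationInfo> operands) {
        if (operands.isEmpty()) {
            throw new IllegalArgumentException("need at least one operand");
        }
        // 1. An explicit COLLATE clause wins; conflicting explicit ones are an error.
        String explicit = null;
        for (CollationInfo op : operands) {
            if (op.derivation() == Derivation.EXPLICIT) {
                if (explicit != null && !explicit.equals(op.collation())) {
                    throw new IllegalArgumentException("conflicting explicit collations: "
                            + explicit + " vs " + op.collation());
                }
                explicit = op.collation();
            }
        }
        if (explicit != null) {
            return new CollationInfo(explicit, Derivation.EXPLICIT);
        }
        // 2. With no explicit operand, agreeing implicit collations stay implicit.
        String implicit = null;
        for (CollationInfo op : operands) {
            // Simplifying assumption: an operand already derived as NONE makes the result NONE.
            if (op.derivation() == Derivation.NONE) {
                return new CollationInfo(null, Derivation.NONE);
            }
            // 3. Disagreeing implicit collations: the server cannot pick one, so NONE.
            if (implicit != null && !implicit.equals(op.collation())) {
                return new CollationInfo(null, Derivation.NONE);
            }
            implicit = op.collation();
        }
        return new CollationInfo(implicit, Derivation.IMPLICIT);
    }

    public static void main(String[] args) {
        CollationInfo french = new CollationInfo("french", Derivation.IMPLICIT);
        CollationInfo english = new CollationInfo("english", Derivation.IMPLICIT);
        // Like "select frenchColumn || englishColumn from ...": derivation NONE.
        System.out.println(combine(List.of(french, english)));
        // Adding an explicit COLLATE on one operand resolves the conflict.
        System.out.println(combine(List.of(french,
                new CollationInfo("UCS_BASIC", Derivation.EXPLICIT))));
    }
}
```

Under these assumptions, Rick's frenchColumn || englishColumn example gives
derivation NONE. An explicit COLLATE on either operand produces an EXPLICIT
result instead.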

------=_Part_204008_1135909.1174668813953--