db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mamta Satoor" <msat...@gmail.com>
Subject Re: Collation feature discussion
Date Mon, 26 Mar 2007 21:42:11 GMT
Rick, thanks for your time one more time. It's always great to have as many
eyes as possible on SQL-spec interpretation because one can end up feeling
very tangled up in that spec.

In this mail, I want to answer following question
Other than this, I think Derby's collation type of UNKNOWN has a clear
> mapping with "none" value of collation derivation in SQL spec.
I'm a little lost here. Could you explain what is meant by the UNKNOWN
collation? I don't see this collation described in either the SQL
standard or in the spec attached to DERBY-1478.

In the beginning stages of the 2nd proposal for Language based ordering, we
were talking of collation type of character string types in terms of
UCS_BASIC/TERRITORY_BASED/UNKNOWN. UNKNOWN was to take care of character
string types whose collation can't be determined. Because Derby 10.3 isn't
planning to support COLLATE clause, it was enough for Derby 10.3 to talk
about collation type in terms of specific collation type
(UCS_BASIC/TERRITORY_BASED) or as unknown collation type (UNKNOWN). In other
words, a collation type of UCS_BASIC/TERRITORY_BASED in Derby 10.3 would
have translated to SQL spec (Section 4.2.2 Comparison of character strings,
3rd paragraph) terminology as declared type
collation=UCS_BASIC/TERRITORY_BASED and collation derivation=implicit. The
value UNKNOWN in Derby 10.3 would have translated to collation
derivation=none and basically ignore declared type collation.

But I think it will be best for my proposal to talk in terms of SQL spec
terminology rather than introducing in-house one. So, what I am proposing is
character type in Derby will have 2 additional attributes associated with
them, namely collation type and collation derivation. As per SQL spec,
collation derivation can have 3 possible values (explicit, implicit, none).
Derby 10.3 in the absence of SQL's COLLATE clause  will have only 2 possible
values for collation derivation - namely implicit and none. If collation
derivation is implicit, then collation type will be
UCS_BASIC/TERRITORY_BASED. If collation derivatiation is none, then
collation type should be ignored or N/A because we can't determine the
collation of such a character string type.

I hope this clears up the confusion about UNKNOWN, which won't be used
anymore. Instead, we will use 'none' or 'implicit' in Derby 10.3

On 3/26/07, Rick Hillegas <Richard.Hillegas@sun.com> wrote:
> Hi Mamta,
> Thanks for this extensive write-up. This helps me puzzle through the
> issues although I'm afraid I'm still muddled. Some comments follow inline.
> Mamta Satoor wrote:
> > lots of good stuff ...
> >
> > SQL spec also talks in various sections about default collation
> > associated with character repertoire, character set, SQL schema, SQL
> > session. These defaults are used to determine the collation
> > requirement for character string types in ambiguous places. (In a SQL
> > implementation of SQL spec's collation clause support, if the rules
> > for the collation determination is not what the application wants,
> > then the user can use the COLLATE clause to override the SQL spec
> > behavior.) For string literals, which has been the focus of recent
> > collation discussion on the Derby list, SQL spec specifies(Section 5.3
> > <literal> Syntax Rule 15) that it's collation will be the collation of
> > the character set (which is the default collation of the character
> > set. Every character set defined in the SQL implementation is required
> > to have a default collation).
> >
> > In the absence of the COLLATE clause support in Derby 10.3, I do not
> > think we can follow the SQL spec for string literals' collation type.
> > If we decide to make user defined collation(through the JDBC url
> > attribute COLLATION) as Derby's default collation for the character
> > set, then that would mean that a 10.3 db with COLLATION attribute as
> > TERRITORY_BASED will always have string literals with TERRITORY_BASED
> > collation. This will break our metadata queries which does comparison
> > of SYS character column against string literals. The SYS character
> > column will have UCS_BASIC collation and string literals will have
> > TERRITORY_BASED collation and during the comparison, Derby will end up
> > throwing exception because character strings with different collation
> > can't be compared. If Derby 10.3 had support for COLLATE clause, then
> > we could implement SQL spec behavior for string literal and let
> > metadata queries use the COLLATE clause and users could use the
> > COLLATE clause in their queries against system tables when using
> > string literals.
> I wonder if we could just implement the COLLATE clause for string
> literals. Would that be a lot of work? The metadata queries could be
> adjusted to use the COLLATE clause. I think we should avoid violating
> the SQL standard if we can.
> >
> > So, in the absence of the COLLATE clause in Derby 10.3, what I am
> > proposing is string literals have a collation type of UNKNOWN. When
> > they get used in a collation operation, these UNKNOWN collation types
> > will get their collation from the other operands involved in the
> > operation (requirement here would be that all the operands whose
> > collation type is not UNKNOWN, will have the same collation type
> > associated with them. This requirement comes from Section 9.13
> > Collation determination, Syntax Rule 3e). If all of the operands in
> > collation operation have collation type as UNKNOWN, then the collation
> > chosen would be the one specified through the COLLATION attribute on
> > the JDBC url. This last line makes Derby's collation behavior not
> > match SQL spec Section 9.13 Collation determination, Rule 2 which says
> > "At least one operand shall have a declared type collation." But
> > again, I think in the absence of SQL's COLLATE clause, we have to
> > break some rules to make Derby more flexible. If we go with my
> > proposal for string literal, then a comparison like 'aa' < 'ba' will
> > have the 2 operands with UNKNOWN collation, and Derby's collation
> > algorithim at collation time will choose the value specified for
> > COLLATION attribute in the JDBC url. For function that returns a
> > string datatype, we will have the collation type for that string
> > datatype as UNKNOWN (same as string literals).
> >
> > Other than this, I think Derby's collation type of UNKNOWN has a clear
> > mapping with "none" value of collation derivation in SQL spec.
> I'm a little lost here. Could you explain what is meant by the UNKNOWN
> collation? I don't see this collation described in either the SQL
> standard or in the spec attached to DERBY-1478.
> >
> > more good stuff ...
> > Syntax Rule 4 says that their collation type is the collation
> > of character set SQL_IDENTIFIER. SQL spec Section 4.2.6 Collations
> > talks about SQL_IDENTIFIER's collation being implementation defined.
> > Based on this, we can decide the collation for these USER functions
> > to be UNKNOWN. Again, the argument here is same as used for string
> > literals.
> It seems to me that SQL_IDENTIFIERs have collation UCS_BASIC. This is
> how they behave when it comes to preventing two schema objects from
> having the same name. And this is what forces a collation of UCS_BASIC
> on the string columns in the system tables. I'm not sure I understand
> bullet (7). I hope it is not saying that SQL_IDENTIFIERs have UNKNOWN
> collation. I think we will get into a lot of trouble if we try to force
> SQL_IDENTIFIERs to have some collation other than UCS_BASIC.
> Thanks,
> -Rick

View raw message