db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rick Hillegas <Richard.Hille...@Sun.COM>
Subject Re: [jira] Commented: (DERBY-533) Re-enable national character datatypes
Date Wed, 24 Aug 2005 23:03:06 GMT
Hi Roy,

Thanks for your helpful analysis. We should probably pay closer 
attention to character sets and collations, particularly since MySQL has 
invested so much effort here.

Cheers,
-Rick

Roy Lyseng wrote:

> Hi Rick,
>
> I have only studied the SQL 1992 standard concerning character sets, 
> hope my understanding is still valid (if it ever was).
>
> Both the CHAR and the NCHAR data types are actually the same data type 
> CHAR (or CHARACTER), but made up of characters from different 
> character sets. Each database has in effect two default character 
> sets, the one used for CHAR and the one used for NCHAR. But you may 
> also specify an explicit character set for a column as in NAME 
> CHARACTER(100) CHARACTER SET UTF8. The character set used for CHAR can 
> also be overridden per schema.
>
> Thus, when you create a database, you should be able to specify that 
> the default character set for CHAR columns be ASCII, and the character 
> set used for NCHAR be UTF8.
>
> Note also that according to the SQL standard, values of type CHAR but 
> with different character sets are not generally comparable.
>
> Each character set will also have a default collation. In a database 
> with full SQL support for character sets and collations, you might use 
> this to say that both CHAR and NCHAR store UTF16 characters, but that 
> CHAR has a binary collation and NCHAR has a French collation.
>
> SQL will also allow you to override a collation specification e.g. on 
> an ORDER BY statement, and though not specified by the SQL standard, 
> you might be able to create an index using a national ordering sequence.
>
> Cheers,
> Roy
>
> Rick Hillegas (JIRA) wrote:
>
>>     [ 
>> http://issues.apache.org/jira/browse/DERBY-533?page=comments#action_12319919 
>> ]
>> Rick Hillegas commented on DERBY-533:
>> -------------------------------------
>>
>> 1) There are some interesting issues here. Let's say that we 
>> re-enable these datatypes in 10.2. What happens when a client 
>> application selects from an NCHAR column under the following 
>> combinations? What should the ResultSetMetaData say the column is? Is 
>> the following reasonable?
>>
>>
>> | NETWORK CLIENT | CLIENT PLATFORM | RESULT TYPE |
>> |-----------------------------|-----------------------------|----------------------|

>>
>> | Derby 10.2                 |  jdk1.4                        |   
>> NCHAR           |
>> |-----------------------------|-----------------------------|----------------------|

>>
>> | Derby 10.2                 |  jdk1.6                        |   
>> NCHAR           |
>> |-----------------------------|-----------------------------|----------------------|

>>
>> | Derby 10.1                 |  jdk1.4                        |   
>> CHAR              |
>> |-----------------------------|-----------------------------|----------------------|

>>
>> | Derby 10.1                 |  jdk1.6                        |   
>> CHAR              |
>> |-----------------------------|-----------------------------|----------------------|

>>
>> | db2jcc                        |  jdk1.4                        |   
>> CHAR              |
>> |-----------------------------|-----------------------------|----------------------|

>>
>> | db2jcc                        |  jdk1.6                        |   
>> CHAR              |
>> |-----------------------------|-----------------------------|----------------------|

>>
>>
>> Since all of our string datatypes are represented as unicode, I think 
>> it is ok, as necessary, to implicitly cast CHAR to NCHAR going from 
>> client to server.
>>
>> I also think it is reasonable to raise an exception if someone runs a 
>> 10.1 server against a 10.2 database.
>>
>> 2) I don't see where the SQL standard addresses coercion between 
>> national strings and other types. Part 2 section 4.2.1 says that 
>> NATIONAL CHARACTER is "implementation defined". Part 2 section 6.12 
>> lists legal and forbidden CASTS but says nothing about national 
>> string types. As always, I welcome being educated about what else 
>> might be relevant in the spec.
>>
>> Oracle supports the following coercions but not the inverse coercions 
>> and Oracle documentation does not address localization issues:
>>
>>    Datetime/Interval -> NCHAR/NVARCHAR2
>>    Number -> NCHAR/NVARCHAR2
>>
>> MySQL does not advertise any ability to cast to/from national strings.
>>
>> DB2 and Postgres do not support national strings.
>>
>> In summary, I do not see much guidance here. Derby's previous 
>> behavior seems reasonable to me: applying localization when coercing 
>> between national strings and other types.
>>
>>
>>
>>> Re-enable national character datatypes
>>> --------------------------------------
>>>
>>>         Key: DERBY-533
>>>         URL: http://issues.apache.org/jira/browse/DERBY-533
>>>     Project: Derby
>>>        Type: New Feature
>>>  Components: SQL
>>>    Versions: 10.1.1.0
>>>    Reporter: Rick Hillegas
>>
>>
>>
>>> SQL 2003 coyly defines national character types as "implementation 
>>> defined". Accordingly, there is considerable variability in how 
>>> these datatypes behave. Oracle and MySQL use these datatypes to 
>>> store unicode strings. This would not distinguish national from 
>>> non-national character types in Derby since Derby stores all strings 
>>> as unicode sequences.
>>> The national character datatypes (NCHAR, NVARCHAR, NCLOB and their 
>>> synonymns) used to exist in Cloudscape but were disabled in Derby. 
>>> The disabling comment in the grammar says "need to re-enable 
>>> according to SQL standard". Does this mean that the types were 
>>> removed because they chafed against SQL 2003? If so, what are their 
>>> defects?
>>> ------------------------------------------------------------------
>>> Cloudscape 3.5 provided the following support for national character 
>>> types:
>>> - NCHAR and NVARCHAR were legal datatypes.
>>> - Ordering operations on these datatypes was determined by the 
>>> collating sequence associated with the locale of the database.
>>> - The locale was a DATABASE-wide property which could not be altered.
>>> - Ordering on non-national character datatypes was lexicographic, 
>>> that is, character by character.
>>> ------------------------------------------------------------------
>>> Oracle 9i provides the following support for national character types:
>>> - NCHAR, NVARCHAR2, and NCLOB datatypes are used to store unicode 
>>> strings.
>>> - Sort order can be overridden per SESSION or even per QUERY, which 
>>> means that these overridden sort orders are not supported by indexes.
>>> ------------------------------------------------------------------
>>> DB2 does not appear to support national character types. Nor does 
>>> its DRDA data interchange protocol.
>>> ------------------------------------------------------------------
>>> MySQL provides the following support for national character types:
>>> - National Char and National Varchar datatypes are used to hold 
>>> unicode strings. I cannot find a national CLOB type.
>>> - The character set and sort order can be changed at SERVER-wide, 
>>> TABLE-wide, or COLUMN-specific levels.
>>> ------------------------------------------------------------------
>>> If we removed the disabling logic in Derby, I believe that the 
>>> following would happen:
>>> - We would get NCHAR, NVARCHAR, and NCLOB datatypes.
>>> - These would sort according to the locale that was bound to the 
>>> database when it was created.
>>> - We would have to build DRDA transport support for these types.
>>> The difference between national and non-national datatypes would be 
>>> their sort order.
>>> I am keenly interested in understanding what defects (other than 
>>> DRDA support) should be addressed in the disabled implementation.
>>
>>
>>
>
>


Mime
View raw message