db-derby-dev mailing list archives

From "Mamta Satoor" <msat...@gmail.com>
Subject Re: Another collation question - Derby-1478 and Derby-2377
Date Wed, 16 May 2007 21:11:37 GMT
Wow, Mike has done such a great job of covering the questions that I don't
have much to add. I'll just answer one of Laura's questions:
> Is there a complete listing of the territories that are supported...
> maybe in a Java spec?
As Mike says, this feature (DERBY-1478) does not change the existing support
for territories in any way. The Derby 10.2 Reference Manual, under "Setting
attributes for the database connection URL", has a subsection called
"territory=ll_CC" that explains ll and CC and where the valid values
for them can be found.
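
Since Mike notes that Derby basically supports what Java supports, one way
to see candidate ll_CC values on a given JVM is to ask Java itself. The
sketch below is only an illustration: it assumes the relevant set is the one
reported by java.text.Collator, and the class and method names are
hypothetical, not part of Derby.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class ListCollatorLocales {
    // Returns the names (for example "es_MX") of the locales for which the
    // running JVM can supply a java.text.Collator, sorted alphabetically.
    // Assumption: this is the set that matters for territory-based collation.
    public static String[] collatorLocaleNames() {
        return Arrays.stream(Collator.getAvailableLocales())
                .map(Locale::toString)
                .filter(name -> !name.isEmpty()) // skip the root locale
                .sorted()
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        for (String name : collatorLocaleNames()) {
            System.out.println(name);
        }
    }
}
```

The list depends on the JVM and its installed locale data, so it can differ
from one system to another.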

Laura, thanks for working on the documentation for DERBY-1478. Let us know
if you have any further questions.
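
For the docs, the effect of territory-based ordering can be illustrated
outside Derby with plain Java. This is a minimal sketch under stated
assumptions: String.compareTo stands in for the ordering used when collation
is not territory based, and java.text.Collator stands in for the
territory-based ordering. The class name, method names, and the database
name demoDB in the comment are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class CollationDemo {
    // Orders strings by their binary Unicode (UTF-16 code unit) values.
    // Assumption: this approximates ordering without territory-based collation.
    public static List<String> sortByCodeUnit(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(String::compareTo);
        return copy;
    }

    // Orders strings with the Collator that Java supplies for the territory.
    // Assumption: this approximates what a database created with, say,
    // jdbc:derby:demoDB;create=true;territory=en_US;collation=TERRITORY_BASED
    // would do for CHAR, VARCHAR, LONG VARCHAR and CLOB comparisons.
    public static List<String> sortByTerritory(List<String> input, Locale territory) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(Collator.getInstance(territory));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Banana", "apple");
        System.out.println(sortByCodeUnit(words));             // [Banana, apple, cherry]
        System.out.println(sortByTerritory(words, Locale.US)); // [apple, Banana, cherry]
    }
}
```

The uppercase "B" sorts before the lowercase "a" in the binary ordering
because its Unicode value is smaller, while the en_US collator orders the
words the way an English dictionary would.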


On 5/16/07, Mike Matrigali <mikem_app@sbcglobal.net> wrote:
> Laura Stewart wrote:
> > As part of adding the new attribute collation=TERRITORY_BASED, I think
> > that we need to describe how Derby handles collation.
> >
> > I am trying to get my head around the best way to describe collation
> > in Derby... for 10.3
> >
> > In general terms, a collating sequence is a defined ordering for
> > character data that determines whether a particular character sorts
> > higher, lower, or the same as another character.  Each character set
> > will also have a default collation.
> I would also not use character set.  I would approach documenting it
> based on the behavior of datatypes rather than talk about character
> sets.  So CHAR, VARCHAR, LONG VARCHAR and CLOB comparison/ordering/like
> processing is affected.
> >
> > In Derby, it is my understanding that all of our string data types are
> > represented as Unicode sequences.  Is that correct?
> I believe the documentation should only speak to the datatypes rather
> than the underlying storage structure.  To understand the current
> implementation: all operations on character types use either String or
> Java char in memory.  JDBC defines how one inputs data into the
> datatypes and retrieves data from the datatypes.
> >
> > We should have a complete list of the data types that are impacted by
> > collation.
> > CHAR
> > CLOB ?
> I believe it is
> >
> > Does Derby support the national character datatypes such as
> No.
> >
> > FYI - there is a feeling among some in the Internet community that the
> > term "character set" is not appropriate.  They tout character code,
> > character encoding, or character repertoire.
> >
> > Does Derby support specifying codes?  Is that what the attribute
> > territory=ll_CC (for example, territory=es_MX) does?
> >
> > Is there a complete listing of the territories that are supported...
> > maybe in a Java spec?
> Hopefully Mamta can expand here.  I hope that we can define our support
> in terms of the standard interfaces we are using from Java to perform
> the ordering if a database has been defined to order based on its
> territory.
> I don't believe 10.3 will change the territories supported; it is the
> same set as 10.2 (basically we support what Java supports).  10.3 just
> allows collation to be based on territory; all other territory support
> is unchanged.
> >
> > When you create a database, can you specify that the
> > default character set for CHAR columns be ASCII, and the character set
> > used for NCHAR be UTF8?
> No, there is no such thing.  We are not specifying a character set.  You
> specify a territory; this is existing functionality in 10.2.  In 10.3 you
> specify at database creation time whether you want collation of all user
> character data to be determined by the territory or not.  In the current
> implementation this does not change the storage format, but I don't think
> that should be part of the documentation.
> Do not get confused by what other databases may have to include in such
> a change.  Derby has always used Java String/char support, which is
> Unicode based, so no difference is needed to operate on non-ASCII
> character data.  How Derby chooses to read/write those characters to
> disk is even less important for user interface documentation and could
> be changed in the future.  We happen to currently use a modified UTF-8
> scheme (modified to support very long strings), but that is never
> exposed to a user.
> >
> > The Derby documentation mentions code sets, but only with relationship
> > to import/export topics or ij sessions...
> Right.  The 10.3 functionality does not change any of this; it only
> affects the ordering within the server.  Different operating systems and
> environments may operate on different codesets outside of Derby, but
> once the data has gotten in (through an import, ij, or JDBC) the data
> is treated the same on all systems.  On exit (export, ij, JDBC) the data
> may then get transformed to a native codeset.  None of this is affected
> by the 10.3 collation changes.
> >
> > Any insight that you can provide on this would be appreciated.
> >
