Return-Path: Delivered-To: apmail-db-derby-commits-archive@www.apache.org Received: (qmail 70807 invoked from network); 30 Jun 2006 15:38:55 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur.apache.org with SMTP; 30 Jun 2006 15:38:55 -0000 Received: (qmail 16736 invoked by uid 500); 30 Jun 2006 15:38:55 -0000 Delivered-To: apmail-db-derby-commits-archive@db.apache.org Received: (qmail 16706 invoked by uid 500); 30 Jun 2006 15:38:54 -0000 Mailing-List: contact derby-commits-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: Reply-To: "Derby Development" List-Id: Delivered-To: mailing list derby-commits@db.apache.org Received: (qmail 16695 invoked by uid 99); 30 Jun 2006 15:38:54 -0000 Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 30 Jun 2006 08:38:54 -0700 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_HELO_PASS X-Spam-Check-By: apache.org Received: from [192.87.106.226] (HELO ajax.apache.org) (192.87.106.226) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 30 Jun 2006 08:38:54 -0700 Received: from ajax.apache.org (localhost [127.0.0.1]) by ajax.apache.org (Postfix) with ESMTP id B346D6ACA9 for ; Fri, 30 Jun 2006 16:38:32 +0100 (BST) Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit From: Apache Wiki To: derby-commits@db.apache.org Date: Fri, 30 Jun 2006 15:38:32 -0000 Message-ID: <20060630153832.27231.60857@ajax.apache.org> Subject: [Db-derby Wiki] Update of "LanguageBasedOrdering" by KatheyMarsden X-Virus-Checked: Checked by ClamAV on apache.org X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N Dear Wiki user, You have subscribed to a wiki page or wiki category on "Db-derby Wiki" for change notification. The following page has been changed by KatheyMarsden: http://wiki.apache.org/db-derby/LanguageBasedOrdering New page: This page describes how to implement a function to perform language based ordering. It is largely copied from the email suggestion by Daniel Debrunner [http://www.nabble.com/Language-based-ordering-is-possible-in-Derby-tf482505.html#a1313448 Language based ordering is possible in Derby] but incorporates suggestions from Oyvind to add language and country parameters and adds a function to sort by the JVM default locale. == Why do we need a Language based ordering function? == By default Derby performs Unicode character comparison for ordering. This is not suitable for many languages. A Derby function can be used to specify the correct Language for ordering. == The Start of a Solution == The example is not a complete solution but a place to start. Please fix up the example here on the Wiki as you make it better. The trick is to write an user defined function that takes a string and returns a value that is correct for ordering on. Say we have a function LOCALE_ORDER that takes language, country and value, then you can write: {{{ SELECT ID, NAME FROM CUSTOMER ORDER BY LOCALE_ORDER('pl','PL',name) }}} Java provides all the functionality required, so it's really just a few lines of Java code. The code uses java.text.Collator to obtain a byte array that represents the collation key of a string and then Derby sorts on that byte array, as a VARCHAR FOR BIT DATA. It needs some more investigation to ensure Derby orders the returned byte arrays correctly, particularly if one byte array is larger than the other, when up to that point they have the same contents. Also more checking is needed to see if the Derby ordering matches the expected language collation. Another issue is the length of the returned byte array. We would need to see how long it can be and what would happen it it returned a value longer than the declared type of the SQL function. An alternative would be to change the Java method to return a String representation of the byte array and sort on that. One general issue with this approach is that indexes won't be used. [http://issues.apache.org/jira/browse/DERBY-455 DERBY-455] would help here. == Example SQL Functions == There are two example functions for lanuguage based ordering. "LOCALE_ORDER" takes a language, a country and a value parameter and returns a collation key. The locale can be specified in the query. Syntax: LOCALE_ORDER(LanguageCode, CountryCode, value) LanguageCode - lowercase two-letter ISO-639 code CountryCode - uppercase two-letter ISO-3166 code value - value to order on, typically a column name Example: {{{ SELECT ID, NAME FROM CUSTOMER ORDER BY LOCALE_ORDER('pl','PL',name) }}} JVM_DEFAULT_LOCALE_ORDER takes only a value and will sort based on the JVM default locale. Syntax: JVM_DEFAULT_LOCALE_ORDER(value) value - value to order on, typically a column name Example: {{{ SELECT ID, NAME FROM CUSTOMER ORDER BY JVM_DEFAULT_LOCALE_ORDER(name) }}} To create the functions run the script below in ij {{{ CONNECT 'jdbc:derby:testdb;create=true'; CREATE FUNCTION LOCALE_ORDER( LANGUAGE_CODE VARCHAR(2), COUNTRY_CODE VARCHAR(2), VALUE VARCHAR(128)) RETURNS VARCHAR(255) FOR BIT DATA LANGUAGE JAVA PARAMETER STYLE JAVA NO SQL RETURNS NULL ON NULL INPUT EXTERNAL NAME 'derbytest.LocaleOrder.localeOrder'; CREATE FUNCTION JVM_DEFAULT_LOCALE_ORDER( VALUE VARCHAR(128)) RETURNS VARCHAR(255) FOR BIT DATA LANGUAGE JAVA PARAMETER STYLE JAVA NO SQL RETURNS NULL ON NULL INPUT EXTERNAL NAME 'derbytest.LocaleOrder.defaultLocaleOrder'; }}} == Java code for function implementation == {{{ package derbytest; import java.text.CollationKey; import java.text.Collator; import java.util.Locale; public class LocaleOrder { /** * Function that can be used by Derby for language based ordering. * Uses java.text.Collator to obtain a byte array that represents * the collation key of a string and then Derby sorts on that byte array. * * RESOLVE: These Comments from Dan who submitted the original code/idea * to the Derby list * - What will happen if one byte array is larger than the other, * when up to that point they have the same * - Test that Derby ordering matches the expected language collation. * - An alternative is to return a String representation of the byte * array and sort on that. * * @param language lowercase two-letter ISO-639 code. * @param country uppercase two-letter ISO-3166 code. * @param value Value for which collationKey is needed for ordering * @return collationKey for value * */ public static byte[] localeOrder(String language, String country, String value) { Locale locale = new Locale(language, country); Collator collator = Collator.getInstance(locale); CollationKey key = collator.getCollationKey(value); byte[] rawData = key.toByteArray(); return rawData; } /** * Function used for order by the JVM default Locale * * @param value Value for which collation key is needed for ordering * @return collation key for value based on default Locale * @see #localeOrder(String language, String country, String value) */ public static byte[] defaultLocaleOrder(String value) { Locale locale = Locale.getDefault(); return localeOrder(locale.getLanguage(), locale.getCountry(), value); } } }}} == Test Code == {{{ package derbytest; import java.sql.*; import java.util.Locale; import org.apache.derby.jdbc.EmbeddedDriver; public class Test { private static final String[] NAMES = { // Just Smith, Zebra, Acorn with alternate A,S and Z "Smith", "Zebra", "\u0104corn", "\u017Bebra", "Acorn", "\u015Amith", "aacorn", }; public static void main(String[] args) throws SQLException { new EmbeddedDriver(); Connection conn = DriverManager.getConnection("jdbc:derby:testdb"); Statement s = conn.createStatement(); try { s.execute("CREATE TABLE CUSTOMER(ID INT, NAME VARCHAR(40))"); conn.setAutoCommit(false); PreparedStatement ps = conn.prepareStatement("INSERT INTO CUSTOMER VALUES(?,?)"); for (int i = 0; i < NAMES.length; i++) { ps.setInt(1, i); ps.setString(2, NAMES[i]); ps.executeUpdate(); } conn.commit(); ps.close(); } catch (SQLException sqle) { } System.out.println("DEFAULT Unicode character set ordering"); query(s, "SELECT ID, NAME FROM CUSTOMER ORDER BY NAME"); System.out.println("Polish ordering"); query(s, "SELECT ID, NAME FROM CUSTOMER ORDER BY LOCALE_ORDER('pl','PL',NAME)"); System.out.println("JVM Default Locale ordering for wherever you are:" + Locale.getDefault()); query(s, "SELECT ID, NAME FROM CUSTOMER ORDER BY JVM_DEFAULT_LOCALE_ORDER(NAME)"); System.out.println("Norwegian ordering by forcing the Norwegian default locale"); Locale.setDefault(new Locale("no","NO")); query(s, "SELECT ID, NAME FROM CUSTOMER ORDER BY JVM_DEFAULT_LOCALE_ORDER(NAME)"); } private static void query(Statement s, String sql) throws SQLException { System.out.println(sql); ResultSet rs = s.executeQuery(sql); while (rs.next()) { System.out.print(" "); System.out.print(rs.getString(1)); System.out.print(" , "); System.out.println(rs.getString(2)); } rs.close(); } } }}} == Test.java Output == The test output shows the various language based orderings. For example aacorn sorts to the bottom for Norwegian. {{{ $ java derbytest.Test DEFAULT Unicode character set ordering SELECT ID, NAME FROM CUSTOMER ORDER BY NAME 4 , Acorn 0 , Smith 1 , Zebra 6 , aacorn 2 , ?corn 5 , ?mith 3 , ?ebra Polish ordering SELECT ID, NAME FROM CUSTOMER ORDER BY LOCALE_ORDER('pl','PL',NAME) 6 , aacorn 4 , Acorn 2 , ?corn 0 , Smith 5 , ?mith 1 , Zebra 3 , ?ebra JVM Default Locale ordering for wherever you are:en_US SELECT ID, NAME FROM CUSTOMER ORDER BY JVM_DEFAULT_LOCALE_ORDER(NAME) 6 , aacorn 4 , Acorn 2 , ?corn 0 , Smith 5 , ?mith 1 , Zebra 3 , ?ebra Norwegian ordering by forcing the Norwegian default locale SELECT ID, NAME FROM CUSTOMER ORDER BY JVM_DEFAULT_LOCALE_ORDER(NAME) 4 , Acorn 2 , ?corn 0 , Smith 5 , ?mith 1 , Zebra 3 , ?ebra 6 , aacorn }}} == Related Pages == ["DerbySQLroutines"]