Return-Path: Delivered-To: apmail-db-derby-user-archive@www.apache.org Received: (qmail 37729 invoked from network); 2 Nov 2005 23:28:31 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (209.237.227.199) by minotaur.apache.org with SMTP; 2 Nov 2005 23:28:31 -0000 Received: (qmail 2841 invoked by uid 500); 2 Nov 2005 23:28:30 -0000 Delivered-To: apmail-db-derby-user-archive@db.apache.org Received: (qmail 2820 invoked by uid 500); 2 Nov 2005 23:28:29 -0000 Mailing-List: contact derby-user-help@db.apache.org; run by ezmlm Precedence: bulk list-help: list-unsubscribe: List-Post: List-Id: Reply-To: "Derby Discussion" Delivered-To: mailing list derby-user@db.apache.org Received: (qmail 2809 invoked by uid 99); 2 Nov 2005 23:28:29 -0000 Received: from asf.osuosl.org (HELO asf.osuosl.org) (140.211.166.49) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 02 Nov 2005 15:28:29 -0800 X-ASF-Spam-Status: No, hits=0.0 required=10.0 tests= X-Spam-Check-By: apache.org Received-SPF: neutral (asf.osuosl.org: local policy) Received: from [32.97.182.144] (HELO e4.ny.us.ibm.com) (32.97.182.144) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 02 Nov 2005 15:28:24 -0800 Received: from d01relay04.pok.ibm.com (d01relay04.pok.ibm.com [9.56.227.236]) by e4.ny.us.ibm.com (8.12.11/8.12.11) with ESMTP id jA2NS5lS018788 for ; Wed, 2 Nov 2005 18:28:05 -0500 Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216]) by d01relay04.pok.ibm.com (8.12.10/NCO/VERS6.7) with ESMTP id jA2NS53X090146 for ; Wed, 2 Nov 2005 18:28:05 -0500 Received: from d01av02.pok.ibm.com (loopback [127.0.0.1]) by d01av02.pok.ibm.com (8.12.11/8.13.3) with ESMTP id jA2NS5W3012278 for ; Wed, 2 Nov 2005 18:28:05 -0500 Received: from [127.0.0.1] (DMCSDJDT41P.usca.ibm.com [9.72.133.129]) by d01av02.pok.ibm.com (8.12.11/8.12.11) with ESMTP id jA2NS3kn012154 for ; Wed, 2 Nov 2005 18:28:04 -0500 Message-ID: <43694B82.3000809@debrunners.com> Date: Wed, 02 Nov 2005 15:28:02 -0800 From: Daniel John Debrunner User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.3) Gecko/20040910 X-Accept-Language: en-us, en MIME-Version: 1.0 To: Derby Discussion Subject: Language based ordering is possible in Derby X-Enigmail-Version: 0.90.0.0 X-Enigmail-Supports: pgp-inline, pgp-mime Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org X-Spam-Rating: minotaur.apache.org 1.6.2 0/1000/N A post on the Cloudscape forum asked how to do ordering based upon the Polish alphabet. I think there have been similar posts here. http://www-128.ibm.com/developerworks/forums/dw_thread.jsp?forum=370&thread=96898&message=13759450&cat=19 I realised today there is a way with Derby, thanks to the new functionality in 10.1, order by expressions (Thanks, Tomohito!). I experimented with my idea and it basically works, though probably could do with some refinements. The trick is to write an user defined function that takes a string and returns a value that is correct for ordering on. Say we have a specific function POLISH_ORDER, then you can write. SELECT ID, NAME FROM CUSTOMER ORDER BY POLISH_ORDER(name) Now, Java provides all the functionality required, so it's really just a few lines of Java code. The code uses java.text.Collator to obtain a byte array that represents to collation key of a string and then Derby sorts on that byte array, as a VARCHAR FOR BIT DATA. It needs some more investigation to ensure Derby orders the returned byte arrays correctly. I'm not sure what will happen if one byte array is larger than the other, when up to that point they have the same contents. Would need to check to see if the Derby ordering matches the expected language collation. Another issue is the length of the returned byte array, would need to see how long it can be and what would happen it it returned a value longer than the declared type of the SQL function. An alternative would be to change the Java method getBinaryCollationKey to return a String representation of the byte array and sort on that. Thought this might be a useful starting point for folks. Here's the output of the test program below. DEFAULT Unicode character set ordering SELECT ID, NAME FROM CUSTOMER ORDER BY NAME 4 , Acorn 0 , Smith 1 , Zebra 2 , ?corn 5 , ?mith 3 , ?ebra Polish ordering SELECT ID, NAME FROM CUSTOMER ORDER BY POLISH_ORDER(NAME) 4 , Acorn 2 , ?corn 0 , Smith 5 , ?mith 1 , Zebra 3 , ?ebra Dan. ------------------------------------------------------------------------ SQL Functions CREATE FUNCTION POLISH_ORDER( VALUE VARCHAR(128)) RETURNS VARCHAR(255) FOR BIT DATA LANGUAGE JAVA PARAMETER STYLE JAVA NO SQL RETURNS NULL ON NULL INPUT EXTERNAL NAME 'derbytest.LocaleOrder.polish'; CREATE FUNCTION FRENCH_ORDER( VALUE VARCHAR(128)) RETURNS VARCHAR(255) FOR BIT DATA LANGUAGE JAVA PARAMETER STYLE JAVA NO SQL RETURNS NULL ON NULL INPUT EXTERNAL NAME 'derbytest.LocaleOrder.french'; ------------------------------------------------------------------------- Java locale ordering code package derbytest; import java.text.CollationKey; import java.text.Collator; import java.util.Locale; public class LocaleOrder { private static Locale POLISH = new Locale("pl", "PL"); public static byte[] polish(String value) { return getBinaryCollationKey(value, POLISH); } public static byte[] french(String value) { return getBinaryCollationKey(value, Locale.FRENCH); } private static byte[] getBinaryCollationKey(String value, Locale locale) { Collator collator = Collator.getInstance(locale); CollationKey key = collator.getCollationKey(value); byte[] rawData = key.toByteArray(); return rawData; } } ------------------------------------------------------------------------- Java test code package derbytest; import java.sql.*; import org.apache.derby.jdbc.EmbeddedDriver; public class Test { private static final String[] NAMES = { // Just Smith, Zebra, Acorn with alternate A,S and Z "Smith", "Zebra", "\u0104corn", "\u017Bebra", "Acorn", "\u015Amith" }; public static void main(String[] args) throws SQLException { new EmbeddedDriver(); Connection conn = DriverManager.getConnection("jdbc:derby:cs"); Statement s = conn.createStatement(); try { s.execute("CREATE TABLE CUSTOMER(ID INT, NAME VARCHAR(40))"); conn.setAutoCommit(false); PreparedStatement ps = conn.prepareStatement("INSERT INTO CUSTOMER VALUES(?,?)"); for (int i = 0; i < NAMES.length; i++) { ps.setInt(1, i); ps.setString(2, NAMES[i]); ps.executeUpdate(); } conn.commit(); ps.close(); } catch (SQLException sqle) { } System.out.println("DEFAULT Unicode character set ordering"); query(s, "SELECT ID, NAME FROM CUSTOMER ORDER BY NAME"); System.out.println("Polish ordering"); query(s, "SELECT ID, NAME FROM CUSTOMER ORDER BY POLISH_ORDER(NAME)"); } private static void query(Statement s, String sql) throws SQLException { System.out.println(sql); ResultSet rs = s.executeQuery(sql); while (rs.next()) { System.out.print(" "); System.out.print(rs.getString(1)); System.out.print(" , "); System.out.println(rs.getString(2)); } rs.close(); } } --------------------------------------------------------------------------