db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Daniel John Debrunner <...@debrunners.com>
Subject Language based ordering is possible in Derby
Date Wed, 02 Nov 2005 23:28:02 GMT
A post on the Cloudscape forum asked how to do ordering based upon the
Polish alphabet. I think there have been similar posts here.

http://www-128.ibm.com/developerworks/forums/dw_thread.jsp?forum=370&thread=96898&message=13759450&cat=19

I realised today there is a way with Derby, thanks to the new
functionality in 10.1, order by expressions (Thanks, Tomohito!).

I experimented with my idea and it basically works, though probably
could do with some refinements.

The trick is to write an user defined function that takes a string and
returns a value that is correct for ordering on. Say we have a specific
function POLISH_ORDER, then you can write.

SELECT ID, NAME FROM CUSTOMER ORDER BY POLISH_ORDER(name)

Now, Java provides all the functionality required, so it's really just a
few lines of Java code.

The code uses java.text.Collator to obtain a byte array that represents
to collation key of a string and then Derby sorts on that byte array, as
a VARCHAR FOR BIT DATA.

It needs some more investigation to ensure Derby orders the returned
byte arrays correctly. I'm not sure what will happen if one byte array
is larger than the other, when up to that point they have the same
contents. Would need to check to see if the Derby ordering matches the
expected language collation.

Another issue is the length of the returned byte array, would need to
see how long it can be and what would happen it it returned a value
longer than the declared type of the SQL function.

An alternative would be to change the Java method getBinaryCollationKey
to return a String representation of the byte array and sort on that.

Thought this might be a useful starting point for folks.

Here's the output of the test program below.

DEFAULT Unicode character set ordering
SELECT ID, NAME FROM CUSTOMER ORDER BY NAME
  4 , Acorn
  0 , Smith
  1 , Zebra
  2 , ?corn
  5 , ?mith
  3 , ?ebra
Polish ordering
SELECT ID, NAME FROM CUSTOMER ORDER BY POLISH_ORDER(NAME)
  4 , Acorn
  2 , ?corn
  0 , Smith
  5 , ?mith
  1 , Zebra
  3 , ?ebra

Dan.

------------------------------------------------------------------------
SQL Functions

CREATE FUNCTION POLISH_ORDER(
VALUE VARCHAR(128))
RETURNS VARCHAR(255) FOR BIT DATA
LANGUAGE JAVA
PARAMETER STYLE JAVA
NO SQL
RETURNS NULL ON NULL INPUT
EXTERNAL NAME 'derbytest.LocaleOrder.polish';

CREATE FUNCTION FRENCH_ORDER(
VALUE VARCHAR(128))
RETURNS VARCHAR(255) FOR BIT DATA
LANGUAGE JAVA
PARAMETER STYLE JAVA
NO SQL
RETURNS NULL ON NULL INPUT
EXTERNAL NAME 'derbytest.LocaleOrder.french';

-------------------------------------------------------------------------
Java locale ordering code

package derbytest;

import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

public class LocaleOrder {
	
	private static Locale POLISH = new Locale("pl", "PL");
	
	public static byte[] polish(String value)
	{
		return getBinaryCollationKey(value, POLISH);
	}	
	
	public static byte[] french(String value)
	{
		return getBinaryCollationKey(value, Locale.FRENCH);
	}
	
	private static byte[] getBinaryCollationKey(String value, Locale locale)
	{		
		Collator collator = Collator.getInstance(locale);
		
		CollationKey key = collator.getCollationKey(value);
		
		byte[] rawData = key.toByteArray();
		
		return rawData;
	}
}
-------------------------------------------------------------------------
Java test code

package derbytest;

import java.sql.*;

import org.apache.derby.jdbc.EmbeddedDriver;

public class Test {
	
	private static final String[] NAMES =
	{
		// Just Smith, Zebra, Acorn with alternate A,S and Z
		"Smith",
		"Zebra",
		"\u0104corn",
		"\u017Bebra",
		"Acorn",
		"\u015Amith"	
	};
	
	public static void main(String[] args) throws SQLException
	{
		new EmbeddedDriver();
		Connection conn = DriverManager.getConnection("jdbc:derby:cs");
		
		Statement s = conn.createStatement();
		
		try {
		s.execute("CREATE TABLE CUSTOMER(ID INT, NAME VARCHAR(40))");
		
		conn.setAutoCommit(false);
		PreparedStatement ps = conn.prepareStatement("INSERT INTO CUSTOMER
VALUES(?,?)");
		
		for (int i = 0; i < NAMES.length; i++)
		{
			ps.setInt(1, i);
			ps.setString(2, NAMES[i]);
			ps.executeUpdate();
		}
		conn.commit();
		ps.close();
		} catch (SQLException sqle)
		{
			
		}
		
		System.out.println("DEFAULT Unicode character set ordering");
		query(s, "SELECT ID, NAME FROM CUSTOMER ORDER BY NAME");

		System.out.println("Polish ordering");
		query(s, "SELECT ID, NAME FROM CUSTOMER ORDER BY POLISH_ORDER(NAME)");
}
	
	private static void query(Statement s, String sql) throws SQLException
	{
		System.out.println(sql);
		ResultSet rs = s.executeQuery(sql);
		while (rs.next())
		{
			System.out.print("  ");
			System.out.print(rs.getString(1));
			System.out.print(" , ");
			System.out.println(rs.getString(2));
		}
		rs.close();
	}
}


--------------------------------------------------------------------------


Mime
View raw message