db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Rick Hillegas <Richard.Hille...@Sun.COM>
Subject Re: Language based matching
Date Wed, 12 Jul 2006 16:29:05 GMT
Hi Kathey,

My gut feeling is that you are headed off into the tall weeds here. That 
said, let me walk with you part way into the swamp. Another feature you 
might want would be DERBY-481, computed columns. This would help you get 
better performance. So, for instance, you could declare your data like this:

create table foo
(
    name  varchar(20)
    nameKey varchar(60) for bit data generated always as ( locale_order( 
'pl', 'PL', name )
);
create index foo_idx on foo( nameKey );

Lacking DERBY-481, you might get away with

create table foo
(
    name  varchar(20)
    nameKey varchar(60) for bit data
);
create index foo_idx on foo( nameKey );

and then use triggers or procedures to populate the nameKey column based 
on the value in name.

Then you could get decent performance if you did ORDER BY and GROUP BY 
on foo.nameKey. Other operations might look like this:

select * from foo
where nameKey in ( locale_order( 'pl', 'PL', 'dsfaf' ), ... );

select * from foo
where nameKey between locale_order( 'pl', 'PL', 'lkjh' ) and 
locale_order( 'pl', 'PL', 'mnbv' );

select * from foo
where nameKey < locale_order( 'pl', 'PL', 'asdfgf'' )

LIKE is going to be a pile of work. I think your LOCALE_MATCHES function 
will have to duplicate a lot of the code in Derby. At the end of the 
day, you will replace LIKE with LOCALE_MATCHES and so lose the 
performance-enhancing query pre-processing which DERBY does for %. Here 
the weeds have become too thick for me.



Kathey Marsden wrote:

> Rick Hillegas wrote:
>
>>
>> 3) The locale-sensitive meaning of <, =, and > affected the operation 
>> of all orderings of national strings, including sorts, indexes, 
>> unions, group-by's, like's, between's, and in's.
>>
>> At one point I was keen on re-enabling the national string types. Now 
>> I am leaning toward implementing the ANSI collation language. I think 
>> this is more powerful. In particular, it lets you support more than 
>> one language-sensitive ordering in the same database.
>>
>> You and your customer face a hard problem trying to migrate national 
>> strings from Cloudscape 5.1.60 into Derby 10.1.3 or 10.2. I'm at a 
>> loss how to do this in a way that preserves Cloudscape's performance.
>>
>
> Thank you so much Rick for helping me understand this stuff. For now 
> lets just assume this is just a small dataset and  set  performance 
> aside I am interested to know
> 1)   When might Locale specific matching  be different  in the 
> context  WHERE value LIKE '%<   >%'  (or whatever language we use)  
> besides the deprecated Norwegian 'aa' and when might this be useful?  
> Is it somehow related to bidirectional data like Hebrew and Arabic?

I'm afraid I don't understand the question. I think you are going to 
have to duplicate the LIKE processing code, splicing special characters 
into subkeys created by LOCALE_ORDER. I don't understand the issues with 
Semitic languages but I suspect that Arabic orthography creates some 
interesting cases.

>  2)  Is there some easy java code that can be used to accomplish 
> writing a LOCALE_MATCHES(pattern,value) function?

I'm afraid I can't point you at anything easier than Derby's code.

> For the other functionality  I have these equivalent functions to 
> offer as a workaround (see 
> http://wiki.apache.org/db-derby/LanguageBasedOrdering)
>
> ORDER BY -  Use ORDER BY expression with  LOCALE_ORDER function 
> implemented with Collator.getCollationKey() 
> <, =,  > , BETWEEN  -  Use LOCALE_COMPARE function implemented with 
> Collator.compare()
> IN - Since this is an exact match, would the non-locale specific 
> matching work ok here?

I'm not sure I understand the question. I don't think you can get around 
wrapping local_order around the left and right expressions:

select * from bar where locale_order( 'pl', 'PL', name ) in ( select 
locale_order( 'pl', 'PL', name ) from wibble );

> GROUP-BY - No solution yet but GROUP BY expression in progress will 
> allow LOCALE_ORDER to be used.
> *LIKE - ????????  * Is there some easy Java regular expression 
> matching function  like String.matches(Collator collator, String 
> pattern, String value)?     I can't find it.    The code in 
> org.apache.derby.iapi.types.Like looks pretty involved, but perhaps 
> that is what is needed.   I  just want to confirm before I go down 
> that path and try to figure it out.f

I agree that this looks pretty involved.

>
> Thanks
>
> Kathey
>
> P.S.  I once came very close to getting a cash register meant to 
> interface to a gas pump working in a Deli  with a scale  until Mother 
> Nature stepped in and raised the Russian River to the point that it 
> swallowed the whole thing up, so I have been known to try too hard for 
> a workaround.  If  trying to workaround Locale specific processing 
> with Derby with FUNCTIONS is   a doomed enterprise,  I welcome that 
> perspective as historically I sometimes don't know when to give up.
>


Mime
View raw message