db-derby-dev mailing list archives

From "Daniel John Debrunner (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-2699) performance of like in territory based collation databases may be improved by changing way collation elements are calculated.
Date Thu, 06 Sep 2007 18:40:32 GMT

    [ https://issues.apache.org/jira/browse/DERBY-2699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12525494 ]

Daniel John Debrunner commented on DERBY-2699:
----------------------------------------------

I think the approach of getting collation elements as needed would have a large effect on
all string comparisons.

I created a scale 4 order entry database twice, once with UCS_BASIC collation and once with
TERRITORY_BASED collation. Looking just at the load, collation only affects 'index.sql', which
creates an index that includes the customer's last name. With UCS_BASIC collation the index was
created in about 2.5 seconds; with TERRITORY_BASED collation it took over 11 seconds.
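To see where such a gap can come from outside Derby, here is a minimal JDK-only sketch, not
Derby's index-build code. It sorts synthetic last names once with String's natural order
(UTF-16 code units, assumed here to approximate UCS_BASIC) and once through a java.text.Collator
for Locale.ENGLISH (assumed to approximate a TERRITORY_BASED comparison). The class name, the
random-name generator and the 200,000-row size are illustrative assumptions, and a single unwarmed
run gives only indicative timings.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Random;

// Hypothetical sketch: compares code-unit sorting with Collator-based sorting.
public class CollationSortCost {

    // UCS_BASIC-style ordering (assumption): plain String.compareTo on UTF-16 code units.
    static List<String> sortCodePoint(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Territory-style ordering (assumption): every comparison goes through a locale Collator.
    static List<String> sortCollated(List<String> names, Collator collator) {
        List<String> copy = new ArrayList<>(names);
        copy.sort(collator); // Collator implements Comparator<Object>
        return copy;
    }

    // Synthetic "last names": an uppercase letter followed by 4-13 lowercase letters.
    static List<String> randomNames(int n, long seed) {
        Random rnd = new Random(seed);
        List<String> names = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            int len = 5 + rnd.nextInt(10);
            StringBuilder sb = new StringBuilder(len);
            for (int j = 0; j < len; j++) {
                char base = (j == 0) ? 'A' : 'a';
                sb.append((char) (base + rnd.nextInt(26)));
            }
            names.add(sb.toString());
        }
        return names;
    }

    public static void main(String[] args) {
        List<String> names = randomNames(200_000, 42L);
        Collator collator = Collator.getInstance(Locale.ENGLISH);

        long t0 = System.nanoTime();
        sortCodePoint(names);
        long t1 = System.nanoTime();
        sortCollated(names, collator);
        long t2 = System.nanoTime();

        System.out.printf("code-unit sort: %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("collated sort:  %d ms%n", (t2 - t1) / 1_000_000);
    }
}
```

A trustworthy measurement would need warm-up and repeated runs (for example with a harness such
as JMH). The JDK's CollationKey class exists for the case where the same strings are compared many
times, since it lets the collation work be done once per string rather than once per comparison.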

I don't think the collation overhead should be that high. I would expect maybe 10-20%
overhead, not an index build taking around 450% of the UCS_BASIC time.

> performance of like in territory based collation databases may be improved by changing way collation elements are calculated.
> -----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-2699
>                 URL: https://issues.apache.org/jira/browse/DERBY-2699
>             Project: Derby
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 10.3.1.4
>            Reporter: Mike Matrigali
>
> WorkHorseForCollatorDatatypes.java has a method getCollationElementsForString() which currently
> gets called when processing LIKE clauses in databases that have been created with territory-based
> collation. This is not an issue in pre-10.3 databases or in post-10.3 databases that use the
> default collation.
> getCollationElementsForString() gets the collation elements for the entire value of the String
> held by the datatype that uses the class.
> Take the pattern 'A%' and a datatype value of 'BXXXXXXXXXXXXXXXXXXXXXXX': it would have been
> better to get collation elements one character of the String value at a time, since the first
> character already rules out a match, and so avoid getting collation elements for the entire
> string when we don't really need them.
> One could imagine this having a huge performance impact when running LIKE against a long CLOB
> where the LIKE pattern has a leading fixed-length pattern to match.
> Comments on this from Dan and Dag can be found in DERBY-2416.
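The eager-versus-lazy difference the issue describes can be sketched with the JDK's java.text API
alone. This is a minimal illustration, not Derby's WorkHorseForCollatorDatatypes code: the class
and method names (LazyPrefixMatch, allElements, lazyStartsWith) are hypothetical, the pattern is
assumed to be a fixed prefix followed by '%', and elements are compared as raw ints, so collator
strength, ignorable elements and contractions spanning the prefix boundary are not handled.
allElements mirrors getting elements for the whole string; lazyStartsWith pulls one element at a
time and stops at the first mismatch.

```java
import java.text.CollationElementIterator;
import java.text.Collator;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch: raw collation-element comparison; strength, ignorables
// and contractions across the prefix boundary are deliberately ignored.
public class LazyPrefixMatch {

    // Whether the prefix matched, plus how many of the value's elements were read.
    static final class PrefixResult {
        final boolean matches;
        final int elementsRead;
        PrefixResult(boolean matches, int elementsRead) {
            this.matches = matches;
            this.elementsRead = elementsRead;
        }
    }

    static RuleBasedCollator collatorFor(Locale locale) {
        Collator c = Collator.getInstance(locale);
        if (!(c instanceof RuleBasedCollator)) {
            throw new IllegalStateException("expected a RuleBasedCollator for " + locale);
        }
        return (RuleBasedCollator) c;
    }

    // Eager: materialize every collation element of the string up front.
    static List<Integer> allElements(RuleBasedCollator c, String s) {
        List<Integer> out = new ArrayList<>();
        CollationElementIterator it = c.getCollationElementIterator(s);
        for (int e = it.next(); e != CollationElementIterator.NULLORDER; e = it.next()) {
            out.add(e);
        }
        return out;
    }

    // Lazy: compare the value's elements to the (short) prefix's elements one at a time.
    static PrefixResult lazyStartsWith(RuleBasedCollator c, String value, String prefix) {
        List<Integer> wanted = allElements(c, prefix);
        CollationElementIterator it = c.getCollationElementIterator(value);
        int read = 0;
        for (int w : wanted) {
            int e = it.next();
            if (e == CollationElementIterator.NULLORDER) {
                return new PrefixResult(false, read); // value ran out before the prefix did
            }
            read++;
            if (e != w) {
                return new PrefixResult(false, read); // first mismatch ends the scan
            }
        }
        return new PrefixResult(true, read);
    }

    public static void main(String[] args) {
        RuleBasedCollator c = collatorFor(Locale.ENGLISH);
        String value = "BXXXXXXXXXXXXXXXXXXXXXXX";
        System.out.println("eager elements computed: " + allElements(c, value).size());
        PrefixResult r = lazyStartsWith(c, value, "A");
        System.out.println("lazy match=" + r.matches + ", elements read=" + r.elementsRead);
    }
}
```

For the 'A%' example the eager path computes elements for the whole value, while the lazy path
reads a single element before rejecting it; the gap grows with the value's length, which is the
long-CLOB case raised in the issue.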

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

