lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1719) Add javadoc notes about ICUCollationKeyFilter's advantages over CollationKeyFilter
Date Tue, 30 Jun 2009 14:39:47 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12725648#action_12725648
] 

Simon Willnauer commented on LUCENE-1719:
-----------------------------------------

Steven, patch looks good to me. I will look at it again in a day or two.

simon

> Add javadoc notes about ICUCollationKeyFilter's advantages over CollationKeyFilter
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-1719
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1719
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/*
>    Affects Versions: 2.4.1
>            Reporter: Steven Rowe
>            Assignee: Simon Willnauer
>            Priority: Trivial
>             Fix For: 2.9
>
>         Attachments: LUCENE-1719.patch, LUCENE-1719.patch
>
>
> contrib/collation's ICUCollationKeyFilter, which uses ICU4J collation, is faster than
CollationKeyFilter, the JVM-provided java.text.Collator implementation in the same package.
 The javadocs of these classes should be modified to add a note to this effect.
> My curiosity was piqued by [Robert Muir's comment|https://issues.apache.org/jira/browse/LUCENE-1581?focusedCommentId=12720300#action_12720300]
on LUCENE-1581, in which he states that ICUCollationKeyFilter is up to 30x faster than CollationKeyFilter.
> I timed the operation of these two classes, with Sun JVM versions 1.4.2/32-bit, 1.5.0/32-
and 64-bit, and 1.6.0/64-bit, using 90k word lists of 4 languages (taken from the corresponding
Debian wordlist packages and truncated to the first 90k words after a fixed random shuffling),
using Collators at the default strength, on a Windows Vista 64-bit machine.  I used an analysis
pipeline consisting of WhitespaceTokenizer chained to the collation key filter, so to isolate
the time taken by the collation key filters, I also timed WhitespaceTokenizer operating alone
for each combination.  The rightmost column represents the performance advantage of the ICU4J
implemtation (ICU) over the java.text.Collator implementation (JVM), after discounting the
WhitespaceTokenizer time (WST): (JVM-ICU) / (ICU-WST). The best times out of 5 runs for each
combination, in milliseconds, are as follows:
> ||Sun JVM||Language||java.text||ICU4J||WhitespaceTokenizer||ICU4J Improvement||
> |1.4.2_17 (32 bit)|English|522|212|13|156%|
> |1.4.2_17 (32 bit)|French|716|243|14|207%|
> |1.4.2_17 (32 bit)|German|669|264|16|163%|
> |1.4.2_17 (32 bit)|Ukranian|931|474|25|102%|
> |1.5.0_15 (32 bit)|English|604|176|16|268%|
> |1.5.0_15 (32 bit)|French|817|209|17|317%|
> |1.5.0_15 (32 bit)|German|799|225|20|280%|
> |1.5.0_15 (32 bit)|Ukranian|1029|436|26|145%|
> |1.5.0_15 (64 bit)|English|431|89|10|433%|
> |1.5.0_15 (64 bit)|French|562|112|11|446%|
> |1.5.0_15 (64 bit)|German|567|116|13|438%|
> |1.5.0_15 (64 bit)|Ukranian|734|281|21|174%|
> |1.6.0_13 (64 bit)|English|162|81|9|113%|
> |1.6.0_13 (64 bit)|French|192|92|10|122%|
> |1.6.0_13 (64 bit)|German|204|99|14|124%|
> |1.6.0_13 (64 bit)|Ukranian|273|202|21|39%|

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message