lucene-dev mailing list archives

From "Dawid Weiss (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-871) ISOLatin1AccentFilter a bit slow
Date Tue, 21 Aug 2007 08:51:31 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12521361 ]

Dawid Weiss commented on LUCENE-871:
------------------------------------

I was a bit curious about it, so I decided to write a table-lookup version. It does come out
faster, but only by a small margin (especially on "server" HotSpot JVMs).
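
For readers who have not seen the attachment, here is a minimal sketch of what a table-lookup
accent remover can look like. It is not the attached patch: the class name, the packing of two
replacement letters into one table entry, and the exact mapping (the usual ISO Latin 1 folding,
e.g. Æ to AE) are assumptions made for illustration. A table indexed by every UTF-16 code unit
takes 65536 * 2 bytes = 128 KiB, which is in line with the memory cost mentioned in this comment.

```java
/** Hypothetical sketch of a table-lookup accent remover; not the attached patch. */
final class AccentTableSketch {
    // One entry per UTF-16 code unit: 65536 * 2 bytes = 128 KiB.
    // 0 means "leave the character unchanged". Otherwise the low byte holds the
    // first replacement letter and the high byte an optional second one (AE, ss, ...).
    private static final char[] TABLE = new char[Character.MAX_VALUE + 1];

    private static void map(int from, int to, String replacement) {
        char packed = replacement.charAt(0);
        if (replacement.length() == 2) {
            packed |= replacement.charAt(1) << 8;
        }
        for (int c = from; c <= to; c++) {
            TABLE[c] = packed;
        }
    }

    static {
        // Assumed mapping; code points are written in hex to avoid source-encoding issues.
        map(0xC0, 0xC5, "A");  map(0xC6, 0xC6, "AE"); map(0xC7, 0xC7, "C");
        map(0xC8, 0xCB, "E");  map(0xCC, 0xCF, "I");  map(0xD0, 0xD0, "D");
        map(0xD1, 0xD1, "N");  map(0xD2, 0xD6, "O");  map(0xD8, 0xD8, "O");
        map(0xD9, 0xDC, "U");  map(0xDD, 0xDD, "Y");  map(0xDE, 0xDE, "TH");
        map(0xDF, 0xDF, "ss"); map(0xE0, 0xE5, "a");  map(0xE6, 0xE6, "ae");
        map(0xE7, 0xE7, "c");  map(0xE8, 0xEB, "e");  map(0xEC, 0xEF, "i");
        map(0xF0, 0xF0, "d");  map(0xF1, 0xF1, "n");  map(0xF2, 0xF6, "o");
        map(0xF8, 0xF8, "o");  map(0xF9, 0xFC, "u");  map(0xFD, 0xFD, "y");
        map(0xFE, 0xFE, "th"); map(0xFF, 0xFF, "y");
        map(0x152, 0x152, "OE"); map(0x153, 0x153, "oe"); map(0x178, 0x178, "Y");
    }

    /** Replaces each mapped character with its one- or two-letter replacement. */
    static String removeAccents(String input) {
        StringBuilder out = new StringBuilder(input.length() + 8);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            char packed = TABLE[c];
            if (packed == 0) {
                out.append(c);                       // unmapped: copy through
            } else {
                out.append((char) (packed & 0xFF));  // first replacement letter
                if ((packed >> 8) != 0) {
                    out.append((char) (packed >> 8)); // optional second letter
                }
            }
        }
        return out.toString();
    }
}
```

With this sketch, AccentTableSketch.removeAccents("Des mot clés À LA CHAÎNE") yields
"Des mot cles A LA CHAINE". The per-character work is a single array read with no branching on
character ranges, which is the whole point of the lookup approach.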

Timings are in milliseconds; each round consisted of 100000 repetitions of parsing the test
string "Des mot clés À LA CHAÎNE À Á Â Ã Ä Å Æ Ç È É Ê Ë Ì Í Î Ï Ð Ñ Ò Ó Ô Õ Ö Ø Œ Þ Ù Ú Û Ü Ý Ÿ
à á â ã ä å æ ç è é ê ë ì í î ï ð ñ ò ó ô õ ö ø œ ß þ ù ú û ü ý ÿ". Note that it is biased,
since most characters have accents, which will not be the case in real life, I guess, but still:

// SUN JVM build 1.6.0-b105, -server mode
Round (old): 1922
Round (old): 1688
Round (old): 1656
Round (old): 1687
Round (old): 1641
Round (old): 1703
Round (old): 1672
Round (old): 1672
Round (old): 1687
Round (old): 1719
Round (new): 1719
Round (new): 1609
Round (new): 1609
Round (new): 1594
Round (new): 1625
Round (new): 1578
Round (new): 1625
Round (new): 1594
Round (new): 1625
Round (new): 1656

// SUN JVM, 1.6.0, interpreted (-client)

Round (old): 2391
Round (old): 2453
Round (old): 2359
Round (old): 2375
Round (old): 2391
Round (old): 2359
Round (old): 2156
Round (old): 2532
Round (old): 2422
Round (old): 2359
Round (new): 1969
Round (new): 1906
Round (new): 1922
Round (new): 1937
Round (new): 1985
Round (new): 1922
Round (new): 1906
Round (new): 1937
Round (new): 1985
Round (new): 1922

// IBM JVM 1.5.0 (don't know why it's so sluggish, really).

Round (old): 7906
Round (old): 7188
Round (old): 7625
Round (old): 7312
Round (old): 7266
Round (old): 7141
Round (old): 7015
Round (old): 5641
Round (old): 5578
Round (old): 5672
Round (new): 4656
Round (new): 4406
Round (new): 4516
Round (new): 4516
Round (new): 4375
Round (new): 4375
Round (new): 4343
Round (new): 4297
Round (new): 4344
Round (new): 4266

// IBM 1.5.0, -server (note the speed improvement when the old version is hotspot-optimized).

Round (old): 5922
Round (old): 5078
Round (old): 5078
Round (old): 5062
Round (old): 4985
Round (old): 4875
Round (old): 4953
Round (old): 4641
Round (old): 3640
Round (old): 3735
Round (new): 3750
Round (new): 3781
Round (new): 3656
Round (new): 3516
Round (new): 3515
Round (new): 3594
Round (new): 3547
Round (new): 3562
Round (new): 3532
Round (new): 3531
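
Rounds like the ones above could be produced by a harness along the following lines. This is
only a sketch of the procedure described in this comment (a fixed number of repetitions per
round, wall-clock time per round in milliseconds); the class name, the method signature, and the
use of System.nanoTime are assumptions, not the code that generated these numbers.

```java
import java.util.function.UnaryOperator;

/** Hypothetical round timer; not the harness used for the timings in this message. */
final class RoundTimerSketch {
    // Written once per round so the JIT cannot discard the filter calls as dead code.
    static volatile int sink;

    /** Runs `rounds` rounds of `repetitions` calls each and returns the time per round in ms. */
    static long[] timeRounds(UnaryOperator<String> filter, String text,
                             int rounds, int repetitions) {
        long[] millis = new long[rounds];
        for (int r = 0; r < rounds; r++) {
            int checksum = 0;
            long start = System.nanoTime();
            for (int i = 0; i < repetitions; i++) {
                checksum += filter.apply(text).length();
            }
            millis[r] = (System.nanoTime() - start) / 1_000_000L;
            sink = checksum;
        }
        return millis;
    }
}
```

A call such as RoundTimerSketch.timeRounds(myFilter, testString, 10, 100000), where myFilter and
testString stand for the implementation under test and the test string, would give ten
per-round timings for that implementation.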

So... it does come out a bit faster. Whether it makes sense to spend 130 KB of memory on
this improvement, I don't know, really. I'll upload the table-lookup version for your reference.
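
To put a number on "a bit faster", the rounds can be averaged. The sketch below does this for
the Sun 1.6.0 -server timings listed in this message; the class and method names are
hypothetical, and the figures are copied from the listing. For that run the means come out at
1704.7 ms (old) and 1623.4 ms (new), i.e. about 4.8% less time per round.

```java
/** Hypothetical helper that averages the round timings quoted in this message. */
final class RoundStatsSketch {
    static double mean(int[] millis) {
        long sum = 0;
        for (int m : millis) {
            sum += m;
        }
        return (double) sum / millis.length;
    }

    /** Percentage of time saved by the new version relative to the old one. */
    static double percentLessTime(double oldMean, double newMean) {
        return 100.0 * (oldMean - newMean) / oldMean;
    }

    public static void main(String[] args) {
        // Sun JVM 1.6.0-b105, -server mode, copied from the timings in this message.
        int[] oldRounds = {1922, 1688, 1656, 1687, 1641, 1703, 1672, 1672, 1687, 1719};
        int[] newRounds = {1719, 1609, 1609, 1594, 1625, 1578, 1625, 1594, 1625, 1656};
        double oldMean = mean(oldRounds);
        double newMean = mean(newRounds);
        System.out.println("old mean (ms): " + oldMean);
        System.out.println("new mean (ms): " + newMean);
        System.out.println("percent less time: " + percentLessTime(oldMean, newMean));
    }
}
```

The same two arrays can be swapped for any of the other runs to compare them on equal terms.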

> ISOLatin1AccentFilter a bit slow
> --------------------------------
>
>                 Key: LUCENE-871
>                 URL: https://issues.apache.org/jira/browse/LUCENE-871
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis
>    Affects Versions: 1.9, 2.0.0, 2.0.1, 2.1, 2.2
>            Reporter: Ian Boston
>            Assignee: Michael McCandless
>             Fix For: 2.3
>
>         Attachments: fasterisoremove1.patch, fasterisoremove2.patch, ISOLatin1AccentFilter.java.patch, LUCENE-871.take4.patch
>
>
> The ISOLatin1AccentFilter is a bit slow, giving 300+ ms responses when used in a highlighter for output responses.
> Patch to follow

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

