lucene-dev mailing list archives

From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-2023) Improve performance of SmartChineseAnalyzer
Date Fri, 30 Oct 2009 19:33:59 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-2023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12772054#action_12772054 ]

Robert Muir commented on LUCENE-2023:
-------------------------------------

Hi DM,

I think the bounds checks are actually redundant.
In both cases, the bounds are calculated up front in the constructor.

bq. Is it an invariant that tokenPair.to will always be in bounds?

Yes, in this case.

The reason I added the checks to isToExist, etc. is that those methods are public... but this
stuff is pkg private anyway, so maybe I should delete the bounds checks altogether?
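
To illustrate the point, here is a minimal sketch of the pattern, not the code from the patch. The class PairTable, its fields, and addPair are hypothetical names made up for this example; only the method name isToExist comes from the discussion above. The constructor fixes the table size once, so internal callers that derive their indices from that size cannot go out of range, and only the public accessor might still want a defensive check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a pkg-private helper class; NOT the actual Lucene code.
final class PairTable {
  // One bucket of "from" indices per "to" index.
  private final List<List<Integer>> pairsByTo;

  // The bounds are decided here, once: valid "to" values are 0 .. toCount - 1.
  PairTable(int toCount) {
    pairsByTo = new ArrayList<List<Integer>>(toCount);
    for (int i = 0; i < toCount; i++) {
      pairsByTo.add(new ArrayList<Integer>());
    }
  }

  // Public accessor: an outside caller could pass anything, so it checks the range.
  public boolean isToExist(int to) {
    return to >= 0 && to < pairsByTo.size() && !pairsByTo.get(to).isEmpty();
  }

  // Pkg-private mutator: assumes callers only pass indices below toCount,
  // so no explicit check is made here.
  void addPair(int from, int to) {
    pairsByTo.get(to).add(from);
  }
}
```

If the whole class is pkg private, the check in isToExist only guards against callers inside the same package, which is why dropping it may be reasonable.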


> Improve performance of SmartChineseAnalyzer
> -------------------------------------------
>
>                 Key: LUCENE-2023
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2023
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: contrib/analyzers
>            Reporter: Robert Muir
>            Assignee: Robert Muir
>            Priority: Minor
>             Fix For: 3.0
>
>         Attachments: LUCENE-2023.patch
>
>
> I've noticed SmartChineseAnalyzer is a bit slow compared to, say, CJKAnalyzer on Chinese text.
> This patch improves the internal hhmm implementation.
> Time to index my Chinese corpus is 75% of the previous time.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

