lucene-dev mailing list archives

From "Simon Willnauer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-4556) FuzzyTermsEnum creates tons of objects
Date Tue, 13 Nov 2012 14:54:13 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-4556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Simon Willnauer updated LUCENE-4556:
------------------------------------

    Attachment: LUCENE-4556.patch

Here is a patch... scary™
                
> FuzzyTermsEnum creates tons of objects
> --------------------------------------
>
>                 Key: LUCENE-4556
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4556
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search, modules/spellchecker
>    Affects Versions: 4.0
>            Reporter: Simon Willnauer
>            Assignee: Simon Willnauer
>            Priority: Critical
>             Fix For: 4.1, 5.0
>
>         Attachments: LUCENE-4556.patch
>
>
> I ran into this problem in production using the DirectSpellchecker. The number of objects
> created by the spellchecker shot through the roof very quickly: we ran about 130 queries
> and ended up with more than 2M transitions/states. We spent 50% of the time in GC just
> because of these transitions. Other parts of the system behave just fine here.
> I talked quickly to Robert and gave a POC a shot: a LevenshteinAutomaton#toRunAutomaton(prefix, n)
> method that optimizes this case by building an array-based structure, converted into UTF-8
> directly, instead of going through the object-based APIs. This involved quite a few changes,
> but they are all package-private at this point. The patch still has a fair set of nocommits,
> but it shows that this is possible and, IMO, worth the trouble to make it really usable in
> production. All tests pass with the patch; it's a start...
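To illustrate why an array-based run automaton avoids the object churn described above, here is a minimal sketch, assuming a DFA over the byte alphabet (256 symbols) whose transitions live in one flat int[] table, so that stepping through a term's UTF-8 bytes allocates nothing. This is not the attached patch and not Lucene's API: the class FlatByteDfa and its methods (addTransition, setAccept, run, prefixDfa) are hypothetical, and for brevity it builds only a simple "starts with prefix" automaton rather than a Levenshtein automaton.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical sketch (not Lucene's API): a DFA stored as a single flat
 * int[] transition table over the byte alphabet, so running it over a
 * term's UTF-8 bytes creates no per-transition or per-state objects.
 */
public final class FlatByteDfa {
    private static final int ALPHABET = 256;
    private final int[] table;      // table[state * 256 + byte] = next state, or -1 (dead)
    private final boolean[] accept; // accept[state] is true if the state is final

    public FlatByteDfa(int numStates) {
        table = new int[numStates * ALPHABET];
        Arrays.fill(table, -1); // every transition starts out dead
        accept = new boolean[numStates];
    }

    /** Adds a transition from 'from' to 'to' for every byte value in [min, max] (0..255). */
    public void addTransition(int from, int min, int max, int to) {
        for (int b = min; b <= max; b++) {
            table[from * ALPHABET + b] = to;
        }
    }

    public void setAccept(int state) {
        accept[state] = true;
    }

    /** Runs the DFA from state 0 over the UTF-8 encoding of the input. */
    public boolean run(String input) {
        int state = 0;
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            state = table[state * ALPHABET + (b & 0xFF)]; // treat the byte as unsigned
            if (state == -1) {
                return false; // fell into the dead state
            }
        }
        return accept[state];
    }

    /**
     * Builds an automaton accepting every string whose UTF-8 bytes start with
     * those of 'prefix': one state per prefix byte, then a final state that
     * loops on any byte.
     */
    public static FlatByteDfa prefixDfa(String prefix) {
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        FlatByteDfa dfa = new FlatByteDfa(p.length + 1);
        for (int i = 0; i < p.length; i++) {
            int b = p[i] & 0xFF;
            dfa.addTransition(i, b, b, i + 1);
        }
        dfa.addTransition(p.length, 0, 255, p.length); // any trailing bytes
        dfa.setAccept(p.length);
        return dfa;
    }

    public static void main(String[] args) {
        FlatByteDfa dfa = prefixDfa("lu");
        System.out.println(dfa.run("lucene")); // true
        System.out.println(dfa.run("solr"));   // false
    }
}
```

The design trade-off is memory for speed: a dense 256-wide row per state wastes space when most transitions are dead, but lookups are a single array index and the whole automaton is a handful of arrays instead of millions of small objects for the garbage collector to trace. Encoding the prefix to UTF-8 up front means multi-byte characters (for example "é", two bytes) become two consecutive states, which matches terms stored as UTF-8 bytes without any per-term decoding.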


