lucene-dev mailing list archives

From "James Dyer (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage
Date Thu, 02 Jun 2011 16:54:47 GMT

     [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Dyer updated SOLR-2462:
-----------------------------

    Attachment: SOLR-2462.patch

This patch uses a PriorityQueue instead of a sorted List to store the RankedSpellPossibility
objects.  I also went with far simpler logic to safeguard performance:  this version
simply stops after 10,000 elements.  I did this because:

1. With a PriorityQueue, there is no simple way to get the 100th element and find its rank
to determine whether or not to add subsequent elements.
2. With the simpler logic, there is no need to keep calling "currentTimeMillis()" as a final
fallback (in itself a performance hog).
3. It is highly unlikely that a competitive spellcheck collation will ever be found past the
10,000th combination.

In all, this is a more elegant solution than the prior one.
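
For readers without the patch at hand, the idea can be sketched roughly as follows.  This
is not the code in SOLR-2462.patch.  The class Ranked is a hypothetical stand-in for
RankedSpellPossibility, the method name collect is made up, and ranking a combination by
the sum of its per-term suggestion positions is an assumption made only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class BoundedCollationSketch {
    // Hypothetical stand-in for RankedSpellPossibility; a lower rank is better.
    static final class Ranked implements Comparable<Ranked> {
        final int rank;
        final List<String> corrections;

        Ranked(int rank, List<String> corrections) {
            this.rank = rank;
            this.corrections = corrections;
        }

        @Override
        public int compareTo(Ranked other) {
            return Integer.compare(rank, other.rank);
        }
    }

    // The hard cap described in the message.
    static final int MAX_EVALUATIONS = 10_000;

    // perTerm holds, for each misspelled term, its suggestions ordered best first.
    static PriorityQueue<Ranked> collect(List<List<String>> perTerm, int maxEvaluations) {
        PriorityQueue<Ranked> queue = new PriorityQueue<>();
        if (perTerm.isEmpty()) {
            return queue;
        }
        for (List<String> suggestions : perTerm) {
            if (suggestions.isEmpty()) {
                return queue; // no combination can be formed
            }
        }
        int[] idx = new int[perTerm.size()]; // current suggestion index per term
        int evaluated = 0;
        while (evaluated < maxEvaluations) {
            List<String> combo = new ArrayList<>();
            int rank = 0; // assumption: rank = sum of suggestion positions
            for (int t = 0; t < idx.length; t++) {
                combo.add(perTerm.get(t).get(idx[t]));
                rank += idx[t];
            }
            queue.add(new Ranked(rank, combo));
            evaluated++;
            // Advance the indexes like an odometer, rightmost term first.
            int pos = idx.length - 1;
            while (pos >= 0 && ++idx[pos] == perTerm.get(pos).size()) {
                idx[pos] = 0;
                pos--;
            }
            if (pos < 0) {
                break; // every combination has been visited
            }
        }
        return queue;
    }
}
```

Calling collect(perTerm, MAX_EVALUATIONS) and then poll() on the result yields the
best-ranked combinations first.  The queue never holds more than the cap, so memory use
stays bounded no matter how many terms are misspelled, and no clock check is needed.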

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch
>
>
> When using "spellcheck.collate", class SpellPossibilityIterator creates a ranked list of *every* possible correction combination.  If several corrections are returned per term and several words are misspelled, the existing algorithm uses a huge amount of memory.
> This bug was introduced with SOLR-2010.  However, it is triggered any time "spellcheck.collate" is used; it is not necessary to use any of the features that were added with SOLR-2010.
> We were in Production with Solr for 1 1/2 days when this bug started taking our Solr servers down with "infinite" GC loops.  It was pretty easy for this to happen, as occasionally a user will accidentally paste the URL into the Search box on our app.  This URL results in a search with ~12 misspelled words.  We have "spellcheck.count" set to 15.
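
For a sense of scale, the figures quoted in the description (about 12 misspelled words,
"spellcheck.count" set to 15) can be plugged into a quick calculation.  This is only a
rough upper bound: it assumes every misspelled term really receives the full 15
suggestions, and the class and method names are hypothetical.

```java
import java.math.BigInteger;

class CollationCount {
    // Upper bound on combinations: one suggestion is chosen per misspelled term.
    static BigInteger combinations(int suggestionsPerTerm, int misspelledTerms) {
        return BigInteger.valueOf(suggestionsPerTerm).pow(misspelledTerms);
    }

    public static void main(String[] args) {
        // 15 suggestions for each of 12 terms
        System.out.println(combinations(15, 12)); // prints 129746337890625
    }
}
```

Holding even a small fraction of that many ranked objects in memory is not feasible, which
fits the GC behavior described above.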
