lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Robert Muir (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage
Date Thu, 02 Jun 2011 10:07:47 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13042675#comment-13042675
] 

Robert Muir commented on SOLR-2462:
-----------------------------------

Hi James, this sounds like an important issue to fix!

I'm sorry you have all these open improvements to the spellchecker, lets see if we can try
to get some of them resolved.

{quote}
This sets the maximum limit to 1000 possibilities. When this limit is reached, the list is
sorted by rank then reduced to the top 100. From then on, only collations with a rank equal
or better than the 100th are added. This process repeats until finished or until it has taken
50ms, at which time it quits.
{quote}

Maybe we want to use a priority queue instead? It sorta seems like this is what you are doing.


> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462_3_1.patch
>
>
> When using "spellcheck.collate", class SpellPossibilityIterator creates a ranked list
of *every* possible correction combination.  But if returning several corrections per term,
and if several words are misspelled, the existing algorithm uses a huge amount of memory.
> This bug was introduced with SOLR-2010.  However, it is triggered anytime "spellcheck.collate"
is used.  It is not necessary to use any features that were added with SOLR-2010.
> We were in Production with Solr for 1 1/2 days and this bug started taking our Solr servers
down with "infinite" GC loops.  It was pretty easy for this to happen as occasionally a user
will accidently paste the URL into the Search box on our app.  This URL results in a search
with ~12 misspelled words.  We have "spellcheck.count" set to 15. 

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message