lucene-dev mailing list archives

From "James Dyer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage
Date Fri, 03 Jun 2011 19:32:47 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13043991#comment-13043991 ]

James Dyer commented on SOLR-2462:
----------------------------------

Yeah, I agree the time limit is a bit of a hack.  On the other hand, the list of possibilities
it needs to evaluate can get really long really fast.  If you're returning 15 or 20 suggestions
per word and the user misspells 10 or so words, you get a pretty big list of combinations
(in our case users were pasting the URL in the search box generating a query with 12 "misspelled"
words...)  Then again, this latest version is much faster than what I had put out there originally...
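
A back-of-the-envelope sketch of how fast that list grows, assuming every combination of one
suggestion per misspelled word counts as one possibility.  The function name is illustrative
only and is not part of Solr:

```python
from math import prod


def collation_possibilities(suggestions_per_word):
    """Count the combinations formed by picking one suggestion per misspelled word."""
    return prod(suggestions_per_word)


# 12 "misspelled" words with 15 suggestions each, as in the pasted-URL case.
print(collation_possibilities([15] * 12))  # 129746337890625
```

Even at a few bytes per combination, a list of that size cannot be held in memory, so it has
to be generated lazily or cut off early.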

Maybe we can just put a hard limit on the number of possibilities it will evaluate?  It could
be really high like a million or something.  We could make it a configurable parameter, something
like "spellcheck.maxCollationPossibilitiesToEval", but then again that seems silly.  Who
would really change it if a million was the default?
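
A minimal sketch of what such a hard limit could look like, assuming candidates are generated
lazily and checked one at a time.  The names MAX_POSSIBILITIES_TO_EVAL, first_valid_collation
and is_valid are hypothetical, and itertools.product walks combinations in plain lexicographic
order, not in the ranked order SpellPossibilityIterator produces:

```python
from itertools import islice, product

MAX_POSSIBILITIES_TO_EVAL = 1_000_000  # hypothetical default, per the "million" idea above


def first_valid_collation(suggestions_per_word, is_valid,
                          max_evals=MAX_POSSIBILITIES_TO_EVAL):
    """Return the first candidate accepted by is_valid, or None once the cap is hit."""
    # product() yields combinations one at a time, so memory use stays flat
    # no matter how many combinations exist in total.
    candidates = product(*suggestions_per_word)
    for candidate in islice(candidates, max_evals):
        if is_valid(candidate):
            return candidate
    return None  # gave up: either no valid collation or the cap was reached


suggestions = [["solr", "solar"], ["search", "serch"]]
print(first_valid_collation(suggestions, lambda c: c == ("solar", "search")))
```

Because the cap bounds the number of candidates examined, the worst case is predictable
regardless of how many words are misspelled or how many suggestions each one returns.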

At the end of the day, I'd feel better where I am at if Solr had some kind of secondary fallback
here.  One thing that really made me nervous about our previous search engine is it wasn't
terribly hard to send a query over to it that would crash the thing or make it churn a long
time just to return nothing.  So far my experience is that Solr is less prone to this kind
of failure and I'd really like to keep it that way...

> Using spellcheck.collate can result in extremely high memory usage
> ------------------------------------------------------------------
>
>                 Key: SOLR-2462
>                 URL: https://issues.apache.org/jira/browse/SOLR-2462
>             Project: Solr
>          Issue Type: Bug
>          Components: spellchecker
>    Affects Versions: 3.1
>            Reporter: James Dyer
>            Priority: Critical
>             Fix For: 3.1.1, 4.0
>
>         Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch
>
>
> When using "spellcheck.collate", class SpellPossibilityIterator creates a ranked list
> of *every* possible correction combination.  But if returning several corrections per term,
> and if several words are misspelled, the existing algorithm uses a huge amount of memory.
> This bug was introduced with SOLR-2010.  However, it is triggered any time "spellcheck.collate"
> is used.  It is not necessary to use any features that were added with SOLR-2010.
> We were in Production with Solr for 1 1/2 days and this bug started taking our Solr servers
> down with "infinite" GC loops.  It was pretty easy for this to happen, as occasionally a user
> will accidentally paste the URL into the Search box on our app.  This URL results in a search
> with ~12 misspelled words.  We have "spellcheck.count" set to 15.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

