lucene-dev mailing list archives

From "James Dyer (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-3240) add spellcheck 'approximate collation count' mode
Date Tue, 13 Mar 2012 16:26:41 GMT

    [ https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228492#comment-13228492 ]

James Dyer commented on SOLR-3240:
----------------------------------

collation.hits is just metadata for the user, so I think what you want to do would be entirely
valid.  

The estimates would only be good if the hits are somewhat evenly distributed across the index,
right?  For instance, if you're indexing documents by topic and a bunch of new docs
get added on the same topic around the same time, you'd get a cluster of hits in one
place.
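
To make the distribution concern concrete, here is a minimal sketch of one way the
docid-space extrapolation from the issue description could work. It is not Solr or
Lucene code: the class name, the method name, and the scaling formula (hits collected,
divided by the fraction of docid space scanned so far) are all assumptions made for
illustration.

```java
/** Hypothetical sketch, not Solr/Lucene code: extrapolate a hit count from the first n hits. */
final class ApproxHitCount {

    /**
     * Walks matching docids in increasing order, stops after n hits, and
     * scales the count by the fraction of docid space scanned so far.
     *
     * @param matchingDocIds docids that match the query, sorted ascending, each in [0, maxDoc)
     * @param maxDoc         size of the docid space
     * @param n              number of hits to collect before terminating early (must be >= 1)
     */
    static long estimate(int[] matchingDocIds, int maxDoc, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be >= 1");
        }
        int collected = 0;
        int lastDoc = -1;
        for (int doc : matchingDocIds) {
            lastDoc = doc;
            collected++;
            if (collected >= n) {
                break; // early termination: stop scoring further docs
            }
        }
        if (collected < n) {
            return collected; // ran out of hits before n, so the count is exact
        }
        // Assumed formula: docids 0..lastDoc were scanned, i.e. (lastDoc + 1) of maxDoc.
        return Math.round((double) collected * maxDoc / (lastDoc + 1));
    }
}
```

With maxDoc = 1,000,000 and 1,000 true hits, evenly spaced hits give an estimate close
to 1,000 for n=10. If the same 1,000 hits all sit in the last 1,000 docids, the estimate
drops to about 10, and if they all sit in the first 1,000 docids it jumps to 1,000,000.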

Even so, like you say, many (most) people would rather improve performance than have an accurate
(any) hit count returned.

Beyond this, there are also some dead-simple optimizations we can make by simply removing
any sorting & boosting parameters from the query before testing the collation.
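
A rough sketch of that parameter stripping, using a plain Map rather than Solr's own
request classes. The class and method names are hypothetical, and the list of
parameters treated as scoring-only (sort, bq, bf, boost) is an assumption; which
parameters are actually safe to drop depends on the query parser in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch, not Solr code: drop parameters that only affect ordering or scoring. */
final class CollationTestParams {

    // Assumed list: these influence ranking, not which documents match.
    private static final Set<String> SCORING_ONLY = Set.of("sort", "bq", "bf", "boost");

    /** Returns a copy of the request parameters without the scoring-only entries. */
    static Map<String, String[]> stripScoringParams(Map<String, String[]> params) {
        Map<String, String[]> out = new LinkedHashMap<>(params); // copy, keep original intact
        out.keySet().removeAll(SCORING_ONLY);
        return out;
    }
}
```

Filters such as fq are left in place on purpose, since they change which documents
match and therefore whether the collation returns any hits.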
                
> add spellcheck 'approximate collation count' mode
> -------------------------------------------------
>
>                 Key: SOLR-3240
>                 URL: https://issues.apache.org/jira/browse/SOLR-3240
>             Project: Solr
>          Issue Type: Improvement
>          Components: spellchecker
>            Reporter: Robert Muir
>
> SpellCheck's Collation in Solr is a way to ensure spellcheck/suggestions
> will actually net results (taking into account context like filtering).
> In order to do this (from my understanding), it generates candidate queries,
> executes them, and saves the total hit count: collation.setHits(hits).
> For a large index it seems this might be doing too much work: in particular
> I'm interested in ensuring this feature can work fast enough/well for autosuggesters.
> So I think we should offer an 'approximate' mode that uses an early-terminating
> Collector, collect()ing only N docs (e.g. n=1), and we approximate this result
> count based on docid space. 
> I'm not sure what needs to happen on the solr side (possibly support for custom collectors?),
> but I think this could help and should possibly be the default.
