From: "James Dyer (Commented) (JIRA)"
To: dev@lucene.apache.org
Date: Tue, 13 Mar 2012 16:26:41 +0000 (UTC)
Subject: [jira] [Commented] (SOLR-3240) add spellcheck 'approximate collation count' mode
Message-ID: <1850750050.8340.1331656001165.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <2098668892.8123.1331652882202.JavaMail.tomcat@hel.zones.apache.org>

[
https://issues.apache.org/jira/browse/SOLR-3240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13228492#comment-13228492 ]

James Dyer commented on SOLR-3240:
----------------------------------

collation.hits is just metadata for the user, so I think what you want to do would be entirely valid. The estimates would only be good if the hits are somewhat evenly distributed across the index, right? For instance, if you're indexing something by topic and then a bunch of new docs get added on the same topic around the same time, you'd get a cluster of hits in one place. Even so, like you say, many (most) people would rather improve performance than have an accurate (any) hit count returned.

Beyond this, there are also some dead-simple optimizations we can make by simply removing any sorting & boosting parameters from the query before testing the collation.

> add spellcheck 'approximate collation count' mode
> -------------------------------------------------
>
>                 Key: SOLR-3240
>                 URL: https://issues.apache.org/jira/browse/SOLR-3240
>             Project: Solr
>          Issue Type: Improvement
>          Components: spellchecker
>            Reporter: Robert Muir
>
> SpellCheck's Collation in Solr is a way to ensure spellcheck/suggestions
> will actually net results (taking into account context like filtering).
> In order to do this (from my understanding), it generates candidate queries,
> executes them, and saves the total hit count: collation.setHits(hits).
> For a large index it seems this might be doing too much work: in particular
> I'm interested in ensuring this feature can work fast enough/well for autosuggesters.
> So I think we should offer an 'approximate' mode that uses an early-terminating
> Collector, collect()ing only N docs (e.g. n=1), and we approximate this result
> count based on docid space.
> I'm not sure what needs to happen on the solr side (possibly support for custom collectors?),
> but I think this could help and should possibly be the default.
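
The approximation proposed in the quoted issue, together with the caveat about clustered hits raised in the comment, can be sketched in a few lines. This is a minimal illustration in plain Python, not Lucene or Solr code: `estimate_hits` and its parameters are hypothetical names, doc IDs are assumed to arrive in increasing order (as a collector would see them within one index), and the extrapolation assumes hits are spread evenly over the docid space.

```python
from typing import Iterable


def estimate_hits(matching_doc_ids: Iterable[int], max_doc: int, n: int = 1) -> int:
    """Estimate the total hit count after collecting at most n matches.

    matching_doc_ids: matching doc IDs in increasing order (hypothetical input
        standing in for what an early-terminating collector would receive).
    max_doc: size of the docid space (doc IDs are 0 .. max_doc - 1).
    n: number of hits to collect before stopping.
    """
    collected = 0
    last_doc = -1
    for doc_id in matching_doc_ids:
        collected += 1
        last_doc = doc_id
        if collected >= n:
            break  # early termination: stop scanning once n hits are seen

    if collected < n:
        # The query ran out of matches before n were found, so the count is exact.
        return collected

    # n hits were found within the first (last_doc + 1) doc IDs. Assume the
    # rest of the docid space has the same hit density and scale up.
    scanned = last_doc + 1
    return max(collected, round(collected * max_doc / scanned))
```

With 100 hits spaced evenly through 1000 docs (`range(9, 1000, 10)`) and `n=5`, the sketch returns 100. If the same 100 hits are packed into doc IDs 0 to 99 it returns 1000, and if they sit in doc IDs 900 to 999 it returns 6, which is the clustering problem described in the comment.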