Mailing-List: contact solr-dev-help@lucene.apache.org; run by ezmlm
Reply-To: solr-dev@lucene.apache.org
Message-ID: <579848023.1241964885569.JavaMail.jira@brutus>
Date: Sun, 10 May 2009 07:14:45 -0700 (PDT)
From: "David Smiley (JIRA)"
To: solr-dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-1158) Scoring, "numDocs" should be number after applying filters, not entire index
In-Reply-To: <756496594.1241962605598.JavaMail.jira@brutus>

[
https://issues.apache.org/jira/browse/SOLR-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707793#action_12707793 ]

David Smiley commented on SOLR-1158:
------------------------------------

I just realized that not only would numDocs be affected, but so would docFreq. I suspect it may not be possible to enhance Solr to implement this suggestion because of performance constraints, but I haven't looked deeply enough to know yet. I'm curious what other Lucene/Solr experts think.

> Scoring, "numDocs" should be number after applying filters, not entire index
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-1158
>                 URL: https://issues.apache.org/jira/browse/SOLR-1158
>             Project: Solr
>          Issue Type: Improvement
>          Components: search
>    Affects Versions: 1.4
>            Reporter: David Smiley
>            Priority: Minor
>
> I'd like to put different types of things to search for in my Solr index. I use a "type" field to discriminate between these types of things, and my "id" primary key field incorporates the type (ex: "FooType:53") to ensure uniqueness. A problem I see with this approach is that the idf (inverse document frequency) component of the score is based on the entire index and not the type that I'm querying. In particular, "numDocs" given to the Similarity.java implementation is the total number of documents in the index. I think it would be more accurate for numDocs to be the filtered number of docs, that is, the number of docs after the filter queries are applied.
> The only issue I see with this, which may or may not be a problem, is that the scores (and thus potentially result ordering if sorting by score) would change depending on which filters are applied. That could be counter-intuitive in a faceting UI. Perhaps only a certain filter or filters could be marked as lowering numDocs for scoring. Such a configuration choice strikes me as belonging in the schema.
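
To make the numDocs/docFreq point concrete, here is a minimal sketch of the arithmetic, not Solr code. It assumes the idf formula used by Lucene's DefaultSimilarity of that era, log(numDocs / (docFreq + 1)) + 1. The function name and all four counts below are hypothetical, picked only to show how far the two values can drift apart for a term that is rare in the whole index but common within one type.

```python
import math


def idf(doc_freq: int, num_docs: int) -> float:
    """idf as in classic Lucene DefaultSimilarity: log(numDocs / (docFreq + 1)) + 1."""
    return math.log(num_docs / (doc_freq + 1)) + 1.0


# Hypothetical counts, for illustration only.
INDEX_NUM_DOCS = 1_000_000    # all documents in the index, every type
FILTERED_NUM_DOCS = 10_000    # documents left after a filter such as type:FooType
INDEX_DOC_FREQ = 5_000        # documents in the whole index containing the term
FILTERED_DOC_FREQ = 4_000     # documents within the filtered set containing the term

# Current behavior described in the issue: both inputs come from the whole index.
idf_whole_index = idf(INDEX_DOC_FREQ, INDEX_NUM_DOCS)

# Proposed behavior: both inputs are restricted to the filtered set.
idf_filtered = idf(FILTERED_DOC_FREQ, FILTERED_NUM_DOCS)

print(f"idf over whole index:  {idf_whole_index:.3f}")
print(f"idf over filtered set: {idf_filtered:.3f}")
```

With these made-up numbers the term looks rare against the whole index but common inside the filtered type, so the filtered idf comes out much lower. Note that computing the filtered docFreq requires intersecting each query term's postings with the filter, which is where the performance concern raised in the comment would come from.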
-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.