Return-Path:
X-Original-To: apmail-lucene-dev-archive@www.apache.org
Delivered-To: apmail-lucene-dev-archive@www.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id 1226B9EDB
    for ; Mon, 20 Feb 2012 12:10:03 +0000 (UTC)
Received: (qmail 78824 invoked by uid 500); 20 Feb 2012 12:10:01 -0000
Delivered-To: apmail-lucene-dev-archive@lucene.apache.org
Received: (qmail 78766 invoked by uid 500); 20 Feb 2012 12:10:01 -0000
Mailing-List: contact dev-help@lucene.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: dev@lucene.apache.org
Delivered-To: mailing list dev@lucene.apache.org
Received: (qmail 78759 invoked by uid 99); 20 Feb 2012 12:10:01 -0000
Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 20 Feb 2012 12:10:01 +0000
X-ASF-Spam-Status: No, hits=-2000.0 required=5.0
    tests=ALL_TRUSTED,T_RP_MATCHES_RCVD
X-Spam-Check-By: apache.org
Received: from [140.211.11.116] (HELO hel.zones.apache.org) (140.211.11.116)
    by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 20 Feb 2012 12:09:55 +0000
Received: from hel.zones.apache.org (hel.zones.apache.org [140.211.11.116])
    by hel.zones.apache.org (Postfix) with ESMTP id D15611C2BE0
    for ; Mon, 20 Feb 2012 12:09:34 +0000 (UTC)
Date: Mon, 20 Feb 2012 12:09:34 +0000 (UTC)
From: "Kaleem Ahmed (Closed) (JIRA)"
To: dev@lucene.apache.org
Message-ID: <1090423115.2224.1329739774858.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <610555248.48833.1323251680239.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Closed] (SOLR-2953) Introducing hit Count as an alternative to score
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394
X-Virus-Checked: Checked by ClamAV on apache.org

[
https://issues.apache.org/jira/browse/SOLR-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kaleem Ahmed closed SOLR-2953.
------------------------------

    Resolution: Not A Problem

Closing, as 4.0 already implements this feature through the similarity package classes.

> Introducing hit Count as an alternative to score
> -------------------------------------------------
>
>                 Key: SOLR-2953
>                 URL: https://issues.apache.org/jira/browse/SOLR-2953
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 4.0
>            Reporter: Kaleem Ahmed
>              Labels: features
>             Fix For: 4.0
>
>   Original Estimate: 1,008h
>  Remaining Estimate: 1,008h
>
> As of now, the score is the relevancy factor for a query against a document, and this score is relative to the number of documents in the index. In the same way, why not have another relevancy feature, say "hitCount", which is absolute for a given document and a given query? It shouldn't depend on the number of documents in the index. This would help a lot with frequently changing indexes, where the search rules are predefined along with the relevancy factor a document must reach to qualify for a query (search rule).
> Ex: consider a use case where a list of queries is formed with a threshold number for each query, and these queries are run against a frequently updated index to get the documents that score above the threshold, i.e. when a document's relevancy factor crosses the threshold for a query, the document is said to be qualified for that query.
> To satisfy this use case, the score shouldn't change every time the index is updated with new documents. So we introduce a new feature called "hitCount", which represents the relevancy of a document against a query and is absolute (it won't change with index size).
> This hitCount is a positive integer and is calculated as follows.
> Ex: Document with text "the quick fox jumped over the lazy dog, while the lazy dog was too lazy to care"
> 1. For the query "lazy AND dog", the hitCount will be (number of occurrences of "lazy" in the document) + (number of occurrences of "dog" in the document) => 3 + 2 => 5
> 2. For the phrase query "lazy dog", the hitCount will be (number of occurrences of the exact phrase "lazy dog" in the document) => 2
> This will be very useful as an alternative scoring mechanism.
> I have already implemented this in the Solr source code (that I downloaded), and we are using it. So far it's going well.
> It would be really great if this feature were added to trunk (original Solr), so that we don't have to reimplement the changes every time a new version is released, and so that others could benefit from it too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org
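
For reference, the two hitCount rules described in the quoted issue can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the reporter's Solr patch and not Solr or Lucene API: the function names `tokenize`, `hit_count_and`, and `hit_count_phrase` are hypothetical, and the tokenization (lowercasing, splitting on non-alphanumeric characters) is an assumption. Returning 0 when any AND term is missing is also an assumption, since the issue only gives an example where both terms occur.

```python
import re
from collections import Counter


def tokenize(text):
    """Assumed tokenizer: lowercase, keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def hit_count_and(terms, text):
    """Sum the occurrences of every term (the "lazy AND dog" rule).

    Assumption: if any term is absent, the document does not match
    the AND query, so the hit count is 0.
    """
    counts = Counter(tokenize(text))
    wanted = [t.lower() for t in terms]
    if any(counts[t] == 0 for t in wanted):
        return 0
    return sum(counts[t] for t in wanted)


def hit_count_phrase(phrase, text):
    """Count positions where the exact token sequence of the phrase occurs."""
    tokens = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    if n == 0:
        return 0
    return sum(
        1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == target
    )


doc = ("the quick fox jumped over the lazy dog, "
       "while the lazy dog was too lazy to care")
print(hit_count_and(["lazy", "dog"], doc))   # 3 + 2 = 5
print(hit_count_phrase("lazy dog", doc))     # 2
```

Both values depend only on the document and the query, which is the property the issue asks for: adding unrelated documents to the index would not change them.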