lucene-dev mailing list archives

From "Kaleem Ahmed (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (SOLR-2953) Introducing hit Count as an alternative to score
Date Wed, 07 Dec 2011 09:54:40 GMT
Introducing hit Count as an alternative to score 
-------------------------------------------------

                 Key: SOLR-2953
                 URL: https://issues.apache.org/jira/browse/SOLR-2953
             Project: Solr
          Issue Type: New Feature
          Components: search
    Affects Versions: 4.0
            Reporter: Kaleem Ahmed
             Fix For: 4.0


As of now, score is the relevancy factor for a query against a document, and this score depends
on index-wide statistics such as the number of documents in the index. Why not also have another
relevancy measure, say "hitCount", which is absolute for a given document and a given query, i.e.
one that does not depend on the number of documents in the index? This would help a lot with
frequently changing indexes where the search rules are predefined together with the relevancy
threshold a document must reach to qualify for that query (search rule).

Ex: consider a use case where a list of queries is defined, each with a threshold, and these
queries are run against a frequently updated index to find the documents that score above the
threshold; i.e. when a document's relevancy factor crosses the threshold for a query, the
document is said to qualify for that query.
For this use case to work, the score must not change every time new documents are added to the
index. So we introduce a new feature called "hitCount", which represents the relevancy of a
document against a query and is absolute (it does not change with index size).
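Here is a minimal sketch of the threshold check described above. It assumes the hitCounts for one document have already been computed against each predefined rule. The class and method names, the rules and the thresholds are all hypothetical. "Above the threshold" is read here as strictly greater than.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each search rule is a query string with a threshold.
// How the per-rule hitCounts are computed is out of scope; they are passed in.
class RuleQualifier {

    // Returns the rules for which the document's hitCount is strictly above the threshold.
    static List<String> qualifyingRules(Map<String, Integer> thresholds,
                                        Map<String, Integer> hitCounts) {
        List<String> qualified = new ArrayList<>();
        for (Map.Entry<String, Integer> rule : thresholds.entrySet()) {
            int hits = hitCounts.getOrDefault(rule.getKey(), 0); // no entry: no hits
            if (hits > rule.getValue()) {
                qualified.add(rule.getKey());
            }
        }
        return qualified;
    }

    public static void main(String[] args) {
        Map<String, Integer> thresholds = new LinkedHashMap<>();
        thresholds.put("lazy AND dog", 4);
        thresholds.put("\"lazy dog\"", 3);

        // hitCounts for one document, e.g. the example document discussed in this issue.
        Map<String, Integer> hitCounts = Map.of("lazy AND dog", 5, "\"lazy dog\"", 2);

        System.out.println(qualifyingRules(thresholds, hitCounts)); // [lazy AND dog]
    }
}
```

Because the hitCount does not depend on index size, the thresholds can be set once and reused as the index changes.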

This hitCount is a positive integer and is calculated as follows.
Ex: document with the text "the quick fox jumped over the lazy dog, while the lazy dog was too
lazy to care"
1. For the query "lazy AND dog", hitCount = (number of occurrences of "lazy" in the document) +
(number of occurrences of "dog" in the document) = 3 + 2 = 5

2. For the phrase query "lazy dog", hitCount = (number of occurrences of the exact phrase
"lazy dog" in the document) = 2
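The two rules above can be sketched in plain Java over a tokenized string. This is not the reporter's Solr patch, which is not included in the issue. It makes three assumptions:

* A simple tokenizer (lowercase, then split on non-alphanumerics) stands in for Solr's analysis chain.
* Query terms are passed in lowercase.
* A document that is missing any AND term gets a hitCount of 0, because it would not match the query at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Minimal sketch of the two hitCount rules on plain text (not Solr/Lucene code).
// Assumption: lowercase + split on non-letters/digits approximates the analyzer.
class HitCount {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t); // skip the empty string a leading delimiter produces
            }
        }
        return tokens;
    }

    // Rule 1: "a AND b AND ..." gives the sum of each term's occurrences.
    static int andHitCount(List<String> tokens, String... terms) {
        int total = 0;
        for (String term : terms) {
            int occurrences = 0;
            for (String tok : tokens) {
                if (tok.equals(term)) occurrences++;
            }
            if (occurrences == 0) return 0; // assumption: AND not satisfied, so no hits
            total += occurrences;
        }
        return total;
    }

    // Rule 2: an exact phrase gives the number of positions where the phrase starts.
    static int phraseHitCount(List<String> tokens, String... phrase) {
        if (phrase.length == 0) return 0;
        int count = 0;
        for (int i = 0; i + phrase.length <= tokens.size(); i++) {
            boolean match = true;
            for (int j = 0; j < phrase.length; j++) {
                if (!tokens.get(i + j).equals(phrase[j])) { match = false; break; }
            }
            if (match) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> doc = tokenize(
            "the quick fox jumped over the lazy dog, while the lazy dog was too lazy to care");
        System.out.println(andHitCount(doc, "lazy", "dog"));    // 5
        System.out.println(phraseHitCount(doc, "lazy", "dog")); // 2
    }
}
```

Running `main` prints 5 and 2, which matches the worked example. A sloppy phrase query or field-specific analysis would need more than this sketch covers.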

This would be very useful as an alternative scoring mechanism.

I have already implemented this in a local copy of the Solr source code and we are using it;
so far it is working well.
It would be great if this feature were added to trunk (upstream Solr) so that we don't have to
reapply the changes every time a new version is released, and so that others can benefit from
it too.







--
This message is automatically generated by JIRA.
