lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: simple (?) question about scoring
Date Fri, 03 Nov 2006 08:07:01 GMT
Michele,

On Friday 03 November 2006 07:07, Michele Amoretti wrote:
> I have a question: is the score for a document different if I have
> only that document in my index, or if I have N documents?
> If the answer is yes, I will put all N documents together, otherwise I
> will evaluate them one by one.
> 
> Btw, I will ask the ws develepoer about how queries are interpreted by
> the search engine.

To compute the score for only a subset of the lucene documents
one normally uses a Filter. Assuming you get the primary keys of
the docs to be scored, you can look them up in the lucene index
and use their internal lucene document numbers to create the Filter.
Then search your query with this filter.
Have a look at the source code of RangeFilter.bits() to see how to
get to the internal document numbers from a set of terms.

Btw. when your database uses the same query to obtain this set of
documents, you might consider to move this function into Lucene
completely, because this will allow you to avoid using a filter alltogether.

Regards,
Paul Elschot


> 
> Thanks
> 
> On 11/3/06, Chris Hostetter <hossman_lucene@fucit.org> wrote:
> >
> > : le list is not ordered (I do not know the details of the search
> > : angine, I only have its result for a query)
> > :
> > : then I have this list of documents, which represents a subset of the 
corpus
> > :
> > : I have to rank the documents of the list, using your scoring algorithm
> >
> > In other words, out of a large copus C, this webservice hase
> > told you that the documents comprising subset S is the top N matching
> > documents for your query Q (where N << sizeof(C))
> >
> > your goal is to sort S as best as possible.
> >
> > You could try indexing all the docs in S in a Lucene RAMDirectory and then
> > search on them, but my orriginal point about the score being
> > fairly meaningless in an index of only 1 document still applies somewhat
> > ... if all of the documents you get back allready have a lot in common
> > 9they must have something in common or hte webservice wouldn't havre
> > returned them in response to your query) it may be hard to get a
> > meaningful document frequency of the words in your query.
> >
> > you also may run into confusion about what exactly your query "is" and
> > wether or not your interpretation matches that of the webservice ... at a
> > very simplisitc level, if the query is "Java Lucene" and your webservice
> > interprets that as an "OR" query and you interpret that as an "AND" query,
> > you might find that the scores you compute for all the docs are 0.
> >
> > If i were in your shoes, i'd try to work with whoever runs this webservice
> > to make it return more usefull informatoin -- at the very least to return
> > results in sorted order.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message