lucene-java-user mailing list archives

From "Michele Amoretti"
Subject Re: simple (?) question about scoring
Date Thu, 02 Nov 2006 23:35:33 GMT
The whole problem I have to face is the following:

I have a web service which searches a corpus of documents and returns
a list of documents which match the query.

The list is not ordered (I do not know the details of the search
engine; I only have its results for a query).

So I have this list of documents, which is a subset of the corpus.

I have to rank the documents in the list, using your scoring algorithm.

Now, I do not know whether I should import all the documents into some
sort of index and apply Lucene's ranking algorithm (if there is one), or
take each document, compute its score against the query, and then sort
the list by those scores.
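
To make the first option concrete, here is a rough sketch in plain Java
that treats the returned list itself as the corpus. It does not use
Lucene. ListRanker and its methods are hypothetical names, and the
weights (square root of the term frequency, 1 + ln(N / (df + 1)) for the
inverse document frequency) are only loosely modeled on classic TF-IDF,
so the numbers will not match what Lucene would report.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: ranks a result list with a
// simplified TF-IDF sum, using the list itself as the corpus.
class ListRanker {

    // Lowercase the text and split it on non-alphanumeric characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Returns one score per document; scores[i] belongs to docs.get(i).
    static double[] score(List<String> docs, String query) {
        int n = docs.size();
        List<Map<String, Integer>> termFreqs = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : docs) {
            Map<String, Integer> tf = new HashMap<>();
            for (String term : tokenize(doc)) tf.merge(term, 1, Integer::sum);
            // Each document counts once per distinct term it contains.
            for (String term : tf.keySet()) docFreq.merge(term, 1, Integer::sum);
            termFreqs.add(tf);
        }
        double[] scores = new double[n];
        for (String term : new HashSet<>(tokenize(query))) {
            int df = docFreq.getOrDefault(term, 0);
            if (df == 0) continue; // term occurs in no document of the list
            double idf = 1.0 + Math.log((double) n / (df + 1));
            for (int i = 0; i < n; i++) {
                int freq = termFreqs.get(i).getOrDefault(term, 0);
                scores[i] += Math.sqrt(freq) * idf;
            }
        }
        return scores;
    }

    // Returns document positions ordered from best to worst score.
    // The sort is stable, so ties keep the order of the input list.
    static List<Integer> rank(List<String> docs, String query) {
        double[] scores = score(docs, query);
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) order.add(i);
        order.sort((a, b) -> Double.compare(scores[b], scores[a]));
        return order;
    }
}
```

Calling ListRanker.rank(docs, query) gives the positions of the
documents in ranked order. Because the document frequencies come only
from the returned subset, a term that is rare in the full corpus but
common in the result list gets a low weight here.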

Currently I am following the second approach, so I need to compute
the score of each document.

I think MemoryIndex is a good fit for this. I am trying to compile the
example provided in the javadoc, but some package is missing...
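
For reference, the idea behind the second approach (one document held in
memory, scored against one query) can be sketched in plain Java as
below. This is NOT Lucene's MemoryIndex and does not reproduce its
score. SingleDocScorer is a hypothetical class, and the formula (sum of
the square roots of the term frequencies, divided by the square root of
the document length) is an assumption chosen for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for a one-document in-memory index.
// It is not Lucene's MemoryIndex and uses a made-up scoring formula.
class SingleDocScorer {
    private final Map<String, Integer> termFreq = new HashMap<>();
    private int length = 0; // number of tokens in the document

    SingleDocScorer(String text) {
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (t.isEmpty()) continue;
            termFreq.merge(t, 1, Integer::sum);
            length++;
        }
    }

    // Sum of sqrt(tf) over the distinct query terms, normalized by
    // sqrt(document length). There is no idf factor: with a single
    // document no term can be rarer than another.
    double score(String query) {
        if (length == 0) return 0.0;
        Set<String> queryTerms = new HashSet<>();
        for (String t : query.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) queryTerms.add(t);
        }
        double sum = 0.0;
        for (String t : queryTerms) {
            sum += Math.sqrt(termFreq.getOrDefault(t, 0));
        }
        return sum / Math.sqrt(length);
    }
}
```

Each document in the list would get its own SingleDocScorer, and the
list would then be sorted by the returned values. Since every document
is scored in isolation, the result depends only on term counts and
document length, not on how common a term is across the corpus.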


On 11/2/06, Chris Hostetter wrote:
> : > .. Btw, I do not have an index, I have 1 Document, and 1 Query.
> : Lucene scoring uses
> : pre-computed statistics, location info, and the number of documents in the
> : index (1 in your case). So some preparation is required before a
> : (stand-alone) document can be scored against a query.
>
> Doron's comments really just scratch the surface of a larger issue with
> your question: Lucene is not an API for evaluating how similar a
> "Document" is to a "Query", it's for finding Documents in a Corpus which
> match a Query, and (optionally) using the "Score" to know which Documents
> match better than other documents.
>
> For most of the various types of Queries that exist in Lucene, the score
> is very dependent on how common the Terms involved are in the Corpus as a
> whole -- if your Corpus consists of only 1 Document, then your scores are
> going to be relatively meaningless.
>
> Perhaps what you are interested in is more of a substring matching count?
> or an Edit Distance type calculation? ... can you give us a concrete
> example of what type of "score" you are looking for and what you mean when
> you say "Query" ?
>
> -Hoss
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

Michele Amoretti, Ph.D.
Distributed Systems Group
Dipartimento di Ingegneria dell'Informazione
Università degli Studi di Parma
