lucene-java-user mailing list archives

From "Russell M. Allen" <>
Subject RE: Scoring a document (count?)
Date Mon, 31 Jul 2006 13:50:45 GMT
Thank you for the reply.

I am certainly open to different ways of organizing / indexing our
documents.  However, the example I provided was simplified for the sake
of the discussion.  In truth, what I was calling a category may be an
arbitrary set of movie ids (determined by a previous query).  This
precludes 'burning in' the associations as independently indexed fields.

I'd like to take a whack at the scorer approach.  I've read through most
of the Lucene web site and have reviewed the source code quite a bit
lately.  However, I admit I am still a little lost in how Lucene works
under the covers.  Are there any design documents available to give me a
head start?  Is the Lucene in Action book the only source of
information?  Does it discuss how Lucene works under the covers?


-----Original Message-----
From: Chris Hostetter [] 
Sent: Monday, July 31, 2006 4:02 AM
To: Lucene Users
Subject: Re: Scoring a document (count?)

it would certainly be possible to get a score that was a simple count of
the number of matching clauses of a boolean query -- probably just with
a modified Similarity (no coord, 1/0 tf, no idf, no norms) but you
*might* need a slightly modified TermScorer to do that.
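
To make that arithmetic concrete, here is a minimal sketch in plain Java.
It does not use Lucene's API: the class FlatScoreSketch and its methods
tf, idf, norm, and coord are hypothetical stand-ins for the Similarity
factors named above, flattened so that each matching clause contributes
exactly 1. The data is the example from the original question quoted
further down.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the scoring arithmetic only; this is not Lucene code.
class FlatScoreSketch {
    // "1/0 tf": any positive term frequency counts exactly once.
    static float tf(int freq) { return freq > 0 ? 1f : 0f; }

    // "no idf": rare and common terms weigh the same.
    static float idf() { return 1f; }

    // "no norms": field length does not scale the score.
    static float norm() { return 1f; }

    // "no coord": no extra factor for matching more clauses.
    static float coord() { return 1f; }

    // One optional clause per queried movie id; sum the flattened contributions.
    static float score(Set<Integer> docMovies, List<Integer> queryMovies) {
        float sum = 0f;
        for (int movieId : queryMovies) {
            int freq = docMovies.contains(movieId) ? 1 : 0;
            sum += tf(freq) * idf() * norm();
        }
        return coord() * sum;
    }

    public static void main(String[] args) {
        List<Integer> horror = Arrays.asList(1, 7, 21, 58, 92);
        Set<Integer> doc2 = new HashSet<>(Arrays.asList(1, 10, 12));
        Set<Integer> doc3 = new HashSet<>(Arrays.asList(3, 21, 55, 92));
        System.out.println("doc2 -> " + score(doc2, horror)); // prints "doc2 -> 1.0"
        System.out.println("doc3 -> " + score(doc3, horror)); // prints "doc3 -> 2.0"
    }
}
```

The sketch models only the movie clauses. In a real query the required
lastName clause would presumably add to the score as well unless it is
applied as a filter, which is outside what this sketch covers.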

In general though, I think you are solving your problem the wrong way.
Don't just put the movie ids in the movie-star docs; also have one
indexed/stored field per category of movie (i.e., "horror" would be a
field) that would only be set on actors who have appeared in a movie
of that type.  The value of the field would be the number of movies of
that type they have appeared in.

Now you do your main query, with a simple filter on the "horror" field
to ensure it has a value, and you've got the stored value of the "horror"
field to tell you how many horror movies they've been in.
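
As a rough illustration of the per-category field idea, the sketch below
computes at index time the value that would go into each star's category
field, then applies the "has a value" filter at query time. It is plain
Java rather than Lucene code; CategoryCountSketch, countsFor, query,
exampleIndex, and the movie-to-category map are assumptions made for
illustration, with the horror ids and star data taken from the original
question.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory stand-in for an index with one count field per category.
class CategoryCountSketch {
    // Index time: count a star's movies per category (the value to store in the field).
    static Map<String, Integer> countsFor(List<Integer> movies, Map<Integer, String> categoryOf) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int movieId : movies) {
            String category = categoryOf.get(movieId);
            if (category != null) {
                counts.merge(category, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Query time: last-name prefix plus "category field has a value" filter.
    // Returns last name -> stored count, sorted by name.
    static Map<String, Integer> query(Map<String, Map<String, Integer>> index,
                                      String prefix, String category) {
        Map<String, Integer> hits = new TreeMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : index.entrySet()) {
            Integer stored = e.getValue().get(category);
            if (e.getKey().startsWith(prefix) && stored != null) {
                hits.put(e.getKey(), stored);
            }
        }
        return hits;
    }

    // Builds the four example star documents from the original question.
    static Map<String, Map<String, Integer>> exampleIndex() {
        Map<Integer, String> categoryOf = new HashMap<>();
        for (int id : new int[] {1, 7, 21, 58, 92}) {
            categoryOf.put(id, "horror");
        }
        Map<String, Map<String, Integer>> index = new TreeMap<>();
        index.put("Anna", countsFor(Arrays.asList(3, 10), categoryOf));
        index.put("Aba", countsFor(Arrays.asList(1, 10, 12), categoryOf));
        index.put("Addd", countsFor(Arrays.asList(3, 21, 55, 92), categoryOf));
        index.put("Baaa", countsFor(Arrays.asList(7, 56), categoryOf));
        return index;
    }

    public static void main(String[] args) {
        System.out.println(query(exampleIndex(), "A", "horror")); // prints "{Aba=1, Addd=2}"
    }
}
```

This only works when the categories are known at index time, which, as the
follow-up at the top of this message points out, is not the case when a
"category" is an arbitrary set of movie ids produced by an earlier query.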

: Date: Thu, 27 Jul 2006 12:02:46 -0400
: From: Russell M. Allen <>
: Reply-To:
: To:
: Subject: Scoring a document (count?)
: I am curious about the potential use of document scoring as a means to
: extract additional data from an index.  Specifically, I would like the
: score to be a count of how many times a particular field matched a set
: of terms.
: For example, I am indexing movie-stars (each document is a movie-star).
: A movie-star has a number of fields, such as name, movies they have appeared
: in, etc.  I want to produce an 'index' of stars by name and show how
: many movies matching a filter they have appeared in.
: In natural language my query might be:
: 	"List all stars who have appeared in a 'horror' movie, where
: last name starts with A, and tell me how many horror movies they were
: in."
: My search will look something like this:
: 	"+lastName:A* +movie:(1 7 21 58 92)"	//where movie is a
: previously computed list of 'horror' movie ids
: If my index contained the following documents:
:     doc1 = lastName:Anna   movie:{3 10}
:     doc2 = lastName:Aba    movie:{1 10 12}
:     doc3 = lastName:Addd   movie:{3 21 55 92}
:     doc4 = lastName:Baaa   movie:{7 56}
: I would like to get back:
:     doc2, score of 1	//score of 1 because only movie 1 matched
:     doc3, score of 2	//score of 2 because movies 21 and 92 matched
: Currently, we perform an initial query against our Star index to
: retrieve a list of stars.  Then we perform N queries against a
: movie index to count the number of movies that match our sub filter
: 'horror'.  This is obviously very inefficient, and as I've shown
: the information (count) is available during the primary query.
: Thoughts?
: ---------------------------------------------------------------------
: To unsubscribe, e-mail:
: For additional commands, e-mail:

