lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chris Hostetter <>
Subject Re: Scoring a document (count?)
Date Mon, 31 Jul 2006 08:02:29 GMT

it would certainly be possible to get a score that was a simple count of
the number of matching clauses of a boolean query -- probably just with a
modified Similarity (no coord, 1/0 tf, no idf, no norms) but you *might*
need a slightly modified TermScorer to do that.

In general though, i think you are solving your problem the wrong way ...
don't just put the movie Ids in the movie-star docs ... also have one
indexed/stored field per category of movie (ie: "horror" would be an
field) that would only be set on actors which have appeared in a movie of
that type -- the value of the field would be the number of movies they
have appeared in of that type.

now you do your main query, with a simple filter on the "horror" field to
ensure it has a value and you've got the stored value of the "horror"
field to tell you how many movies they've been in.

: Date: Thu, 27 Jul 2006 12:02:46 -0400
: From: Russell M. Allen <>
: Reply-To:
: To:
: Subject: Scoring a document (count?)
: I am curious about the potential use of document scoring as a means to
: extract additional data from an index.  Specifically, I would like the
: score to be a count of how many times a particular field matched a set
: of terms.
: For example, I am indexing movie-stars (Each document is a movie-star).
: A movie-star has a number of fields, such as name, movies they have been
: in, etc.  I want to produce an 'index' of stars by name and show how
: many movies, which match a filter, that they have appeared in.
: In natural language my query might be:
: 	"List all stars who have appeared in a 'horror' movie, where
: last name starts with A, and tell me how many horror movies they were
: in."
: My search will look something like this:
: 	"+lastName:A* +movie:(1 7 21 58 92)"	//where movie is a
: previously computed list of 'horror' movie ids
: If my index contained the following documents:
:     doc1 = lastName:Anna   movie:{3 10}
:     doc2 = lastName:Aba    movie:{1 10 12}
:     doc3 = lastName:Addd   movie:{3 21 55 92}
:     doc4 = lastName:Baaa   movie:{7 56}
: I would like to get back:
:     doc2, score of 1	//score of 1 because only movie 1 matched
:     doc3, score of 2	//score of 2 because movies 21 and 92 matched
: Currently, we perform an initial query against our Star index to
: retrieve a list of stars.  Then we perform N queries against a separate
: movie index to count the number of movies that match our sub filter
: 'horror'.  This is obviously very inefficient, and as I've shown above,
: the information (count) is available during the primary query.
: Thoughts?
: ---------------------------------------------------------------------
: To unsubscribe, e-mail:
: For additional commands, e-mail:


To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message