lucene-java-user mailing list archives

From Chris Hostetter <hossman_luc...@fucit.org>
Subject Re: Scoring a document (count?)
Date Mon, 31 Jul 2006 08:02:29 GMT

It would certainly be possible to get a score that is a simple count of
the number of matching clauses of a boolean query -- probably just with a
modified Similarity (no coord, 1/0 tf, no idf, no norms), but you *might*
need a slightly modified TermScorer to do that.
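
To make the arithmetic concrete, here is a minimal sketch in plain Java.
It deliberately does not use the Lucene API: it only models what the score
would reduce to once tf is flattened to 1/0 and coord, idf and norms are
all treated as 1. The names ClauseCountSketch, flatTf and score are
hypothetical, and the sample data comes from the original question below.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Models the flattened scoring only; this is NOT Lucene code.
public class ClauseCountSketch {

    // "1/0 tf": a clause contributes 1 if the doc has the term, else 0.
    static int flatTf(Set<Integer> docMovies, int movieId) {
        return docMovies.contains(movieId) ? 1 : 0;
    }

    // With coord, idf and norms all fixed at 1, the score of the
    // movie:(...) clause group is just the sum of the flattened tf values.
    static int score(Set<Integer> docMovies, List<Integer> queryMovieIds) {
        int sum = 0;
        for (int id : queryMovieIds) {
            sum += flatTf(docMovies, id);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> horror = Arrays.asList(1, 7, 21, 58, 92);
        Set<Integer> doc2 = new HashSet<>(Arrays.asList(1, 10, 12));
        Set<Integer> doc3 = new HashSet<>(Arrays.asList(3, 21, 55, 92));
        System.out.println("doc2 score = " + score(doc2, horror)); // 1
        System.out.println("doc3 score = " + score(doc3, horror)); // 2
    }
}
```

In a real index the required lastName clause would also contribute to the
score unless it is handled separately, which this sketch ignores.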

In general though, I think you are solving your problem the wrong way ...
don't just put the movie IDs in the movie-star docs ... also have one
indexed/stored field per category of movie (i.e. "horror" would be an
indexed field) that would only be set on actors who have appeared in a
movie of that type -- the value of the field would be the number of
movies of that type they have appeared in.

Now you do your main query, with a simple filter on the "horror" field to
ensure it has a value, and the stored value of the "horror" field tells
you how many horror movies they've been in.
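
A rough sketch of that idea, again in plain Java rather than the Lucene
API: the per-category count is computed once at index time, and the query
only checks that the field is present and reads its stored value. The
names GenreCountFieldSketch, StarDoc, index, query and sampleDocs are all
hypothetical, and the movie-to-genre map is an assumed input built from
the horror IDs in the original question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Models one "field per category" holding a precomputed count; NOT Lucene code.
public class GenreCountFieldSketch {

    static final class StarDoc {
        final String lastName;
        // A genre key is present only if the star appeared in such a movie.
        final Map<String, Integer> genreCounts = new HashMap<>();
        StarDoc(String lastName) { this.lastName = lastName; }
    }

    // Index time: count the star's movies per genre.
    static StarDoc index(String lastName, List<Integer> movieIds,
                         Map<Integer, String> genreOfMovie) {
        StarDoc doc = new StarDoc(lastName);
        for (int id : movieIds) {
            String genre = genreOfMovie.get(id);
            if (genre != null) {
                doc.genreCounts.merge(genre, 1, Integer::sum);
            }
        }
        return doc;
    }

    // Query time: prefix match on lastName, filter on field presence,
    // then read the stored count.
    static Map<String, Integer> query(List<StarDoc> docs, String prefix, String genre) {
        Map<String, Integer> hits = new LinkedHashMap<>();
        for (StarDoc d : docs) {
            if (d.lastName.startsWith(prefix) && d.genreCounts.containsKey(genre)) {
                hits.put(d.lastName, d.genreCounts.get(genre));
            }
        }
        return hits;
    }

    static List<StarDoc> sampleDocs() {
        Map<Integer, String> genreOfMovie = new HashMap<>();
        for (int id : Arrays.asList(1, 7, 21, 58, 92)) {
            genreOfMovie.put(id, "horror");
        }
        List<StarDoc> docs = new ArrayList<>();
        docs.add(index("Anna", Arrays.asList(3, 10), genreOfMovie));
        docs.add(index("Aba", Arrays.asList(1, 10, 12), genreOfMovie));
        docs.add(index("Addd", Arrays.asList(3, 21, 55, 92), genreOfMovie));
        docs.add(index("Baaa", Arrays.asList(7, 56), genreOfMovie));
        return docs;
    }

    public static void main(String[] args) {
        System.out.println(query(sampleDocs(), "A", "horror")); // {Aba=1, Addd=2}
    }
}
```

The trade-off is that the counts have to be recomputed and the star
document reindexed whenever a star's movie list changes.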




: Date: Thu, 27 Jul 2006 12:02:46 -0400
: From: Russell M. Allen <Russell.Allen@aebn.net>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: Scoring a document (count?)
:
: I am curious about the potential use of document scoring as a means to
: extract additional data from an index.  Specifically, I would like the
: score to be a count of how many times a particular field matched a set
: of terms.
:
: For example, I am indexing movie-stars (Each document is a movie-star).
: A movie-star has a number of fields, such as name, movies they have been
: in, etc.  I want to produce an 'index' of stars by name and show how
: many movies, which match a filter, that they have appeared in.
:
: In natural language my query might be:
: 	"List all stars who have appeared in a 'horror' movie, where
: last name starts with A, and tell me how many horror movies they were
: in."
:
: My search will look something like this:
: 	"+lastName:A* +movie:(1 7 21 58 92)"
: 	// where movie is a previously computed list of 'horror' movie ids
:
: If my index contained the following documents:
:     doc1 = lastName:Anna   movie:{3 10}
:     doc2 = lastName:Aba    movie:{1 10 12}
:     doc3 = lastName:Addd   movie:{3 21 55 92}
:     doc4 = lastName:Baaa   movie:{7 56}
:
: I would like to get back:
:     doc2, score of 1	//score of 1 because only movie 1 matched
:     doc3, score of 2	//score of 2 because movies 21 and 92 matched
:
:
:
: Currently, we perform an initial query against our Star index to
: retrieve a list of stars.  Then we perform N queries against a separate
: movie index to count the number of movies that match our sub filter
: 'horror'.  This is obviously very inefficient, and as I've shown above,
: the information (count) is available during the primary query.
:
: Thoughts?
:
:
:
:
: ---------------------------------------------------------------------
: To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
: For additional commands, e-mail: java-user-help@lucene.apache.org
:



-Hoss



