lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ype Kingma <ykin...@xs4all.nl>
Subject Re: Finding out which field caused the hit?
Date Wed, 28 May 2003 03:51:27 GMT
Daniel,

On Tuesday 27 May 2003 11:17, Armbrust, Daniel C. wrote:
> Is there a (better) way that I can use to figure out which field in a
> document caused the document to be returned from a query?  Currently, after
> I do a search across all of my fields and documents, I am researching on
> each document that had a hit, on each field individually, and keeping track
> of the scores..  The highest scoring field is the one that I credit with
> returning the document.
>
> This is fine for a small index, with a small number of fields, but it
> definitely doesn't seem like the correct way to go about getting this
> information.
>
> Any suggestions would be appreciated,

To save the researching on previous hits you have two options.

You could make a separate index for each field, query all indexes
with a MultiSearcher, and suppress repeated documents from
other indexes with lower scores.
You could also do use a single database, but with
a separate query for each subfield.
Either way the results can be collected in a single document
collector object, keeping only the best hits. You'll
have to keep a larger number of best hits first and later drop
the lower scoring ones for the same source document
in case you need to retrieve a stored field from each 
hit to determine the original source document.

I have a very similar situation with some databases
containing titles and abstracts, some abstracts only,
some full text with or without abstract.

The nice thing about the default ranking mechanism
is that it works out pretty well: short titles can
score rather high, then normally abstracts,
followed by full text. This is close to the
separate index for each field I suggested above.

The alternative to repeat the query with different
fields in a single index could actually be bit better
because you need have to retrieve the stored field(s)
for each document only once. However, it's not
as flexible.

Kind regards,
Ype Kingma


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message