lucene-java-user mailing list archives

From "Jim Hargrave"
Subject Re: Getting exact term positions for each document inside a collect method...
Date Wed, 02 Jul 2003 05:22:07 GMT
Our application indexes and retrieves sentences from a large database. Our terms are overlapping
character n-grams. To calculate our custom score we need to know the (relative)
position of each n-gram in the matched sentences. I'm currently using a boolean query (all the
n-grams combined in one big 'OR' statement). I will investigate customizing the query as you suggest.
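
As a rough illustration of the scheme described above, the sketch below splits a sentence into overlapping character n-grams and records where each one starts. It is plain Java and does not use any Lucene API; the class name `CharNGrams`, the method `positions`, and the use of character offsets as "positions" are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: maps each overlapping
// character n-gram of a sentence to the offsets where it starts.
final class CharNGrams {
    private CharNGrams() {}

    static Map<String, List<Integer>> positions(String sentence, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        // LinkedHashMap keeps the n-grams in order of first appearance.
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (int i = 0; i + n <= sentence.length(); i++) {
            String gram = sentence.substring(i, i + n);
            result.computeIfAbsent(gram, k -> new ArrayList<>()).add(i);
        }
        return result;
    }
}
```

For example, `CharNGrams.positions("banana", 3)` maps `ban` to offset 0, `ana` to offsets 1 and 3, and `nan` to offset 2. Each key would be one clause of the 'OR' query.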

Basically we are using Lucene as a Translation Memory tool! Pretty cool. Lucene is wonderful
and I think we can use it in many of our linguistic projects (Terminology, concordance, TM


>>> 06/30/03 10:56 AM >>>
Jim Hargrave wrote:
> I've defined my own collector (I want the raw score before it is normalized between 1.0
> and 0.0). For each document I need to know the matching term positions in the document.
> I've seen the methods in IndexReader, but how can I access them inside my collect method?
> Are there other methods I am missing?

No, this information is not available to the hit collector.

Why do you need this? If it is only for summaries, then you're probably
better off re-tokenizing those few documents that you wish to summarize.
If it is for query evaluation, then you're probably better off writing
a new class of query (which is non-trivial).
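
The re-tokenizing suggestion can be sketched roughly as follows: once the collector has narrowed things down to a few hits, each matched sentence is split into n-grams again and only the offsets of n-grams shared with the query are kept. This is plain Java, not Lucene code; `MatchedNGrams`, `matchPositions`, and the use of character offsets are hypothetical choices, and fetching the stored sentence text for a hit is assumed to happen elsewhere.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical helper, not part of Lucene: re-tokenizes a matched
// sentence and reports where the query's n-grams occur in it.
final class MatchedNGrams {
    private MatchedNGrams() {}

    // Returns offset-in-sentence -> n-gram for every n-gram of the
    // sentence that also occurs somewhere in the query, ordered by offset.
    static SortedMap<Integer, String> matchPositions(String query, String sentence, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        // Collect the distinct n-grams of the query.
        Set<String> queryGrams = new HashSet<>();
        for (int i = 0; i + n <= query.length(); i++) {
            queryGrams.add(query.substring(i, i + n));
        }
        // Re-tokenize the sentence and keep only the shared n-grams.
        SortedMap<Integer, String> matches = new TreeMap<>();
        for (int i = 0; i + n <= sentence.length(); i++) {
            String gram = sentence.substring(i, i + n);
            if (queryGrams.contains(gram)) {
                matches.put(i, gram);
            }
        }
        return matches;
    }
}
```

Relative positions then fall out of the keys: the gaps between consecutive offsets in the returned map could feed a custom score outside the hit collector. This only makes sense when the number of hits to re-tokenize is small.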

