lucene-java-user mailing list archives

From Mark Miller <>
Subject Re: Term Based Meta Data
Date Mon, 11 Aug 2008 15:36:52 GMT
If I were feeling adventurous, and I wanted to help out Mark with 
Lucene-1001, I'd try this:

Get the trunk and apply Lucene-1001.

Index all of your docs with the highlight coords as payloads.
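A payload is just a byte array stored at a token's position, so each word's box has to be serialized somehow. The sketch below assumes one bounding box per word, stored as four ints (x, y, width, height, in image pixels). The class name `CoordPayload` and the 16-byte big-endian layout are hypothetical choices, not something Lucene prescribes. In the Lucene 2.x API the bytes would typically be attached inside a TokenFilter with something like `token.setPayload(new Payload(bytes))` and read back through `TermPositions.getPayload(byte[], int)`.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: packs one word's bounding box (image pixels) into a
// fixed 16-byte payload and unpacks it again. Field order (x, y, width, height)
// and 4-byte big-endian ints are illustrative choices, not a Lucene format.
final class CoordPayload {
    final int x, y, width, height;

    CoordPayload(int x, int y, int width, int height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    /** Encodes the box as 16 bytes suitable for use as a token payload. */
    byte[] encode() {
        return ByteBuffer.allocate(16)
                .putInt(x).putInt(y).putInt(width).putInt(height)
                .array();
    }

    /** Decodes bytes produced by encode(), starting at offset in data. */
    static CoordPayload decode(byte[] data, int offset, int length) {
        if (length != 16) {
            throw new IllegalArgumentException("expected 16 bytes, got " + length);
        }
        ByteBuffer buf = ByteBuffer.wrap(data, offset, length);
        return new CoordPayload(buf.getInt(), buf.getInt(), buf.getInt(), buf.getInt());
    }

    public static void main(String[] args) {
        CoordPayload box = new CoordPayload(120, 48, 75, 18);
        byte[] payload = box.encode();
        CoordPayload back = CoordPayload.decode(payload, 0, payload.length);
        System.out.println(back.x + "," + back.y + "," + back.width + "," + back.height);
    }
}
```

A fixed-size layout keeps decoding trivial; if some words span two lines you would need a variable-length format instead.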

At highlight time, do something like what the SpanHighlighter does. I've got
a class called something like PayloadSpansUtil to help out with this.
You want to index the doc to be highlighted into a MemoryIndex, and then
pass an IndexReader from that to the util class. Like the
SpanHighlighter, it will convert a Query into a SpanQuery approximation,
but instead of getting positions for matches, it will collect all of the
payloads for the matches.
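Conceptually, that step finds each match as a run of positions (a span) and gathers the payloads stored at those positions. The sketch below only simulates this on one already-tokenized document in plain Java, for an exact phrase. The `PhrasePayloadCollector` and `Token` names are hypothetical, and this is not the PayloadSpansUtil, MemoryIndex, or SpanQuery API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a payload-collecting span utility, run on a single
// tokenized document: find every exact phrase match (a span of consecutive
// positions) and gather the payload of each token inside it.
// Plain Java only; NOT the LUCENE-1001 / PayloadSpansUtil API.
final class PhrasePayloadCollector {

    /** One analyzed token: its term text and the payload stored at its position. */
    static final class Token {
        final String term;
        final byte[] payload;

        Token(String term, byte[] payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /** Positions are list indexes here, i.e. a position increment of 1 per token. */
    static List<byte[]> collect(List<Token> doc, List<String> phrase) {
        List<byte[]> hits = new ArrayList<>();
        if (phrase.isEmpty()) {
            return hits;
        }
        for (int start = 0; start + phrase.size() <= doc.size(); start++) {
            boolean match = true;
            for (int i = 0; i < phrase.size(); i++) {
                if (!doc.get(start + i).term.equals(phrase.get(i))) {
                    match = false;
                    break;
                }
            }
            if (match) {
                // The span covers [start, start + phrase.size()); each position contributes its payload.
                for (int i = 0; i < phrase.size(); i++) {
                    hits.add(doc.get(start + i).payload);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Token> doc = List.of(
                new Token("term", new byte[] {1}),
                new Token("based", new byte[] {2}),
                new Token("meta", new byte[] {3}),
                new Token("data", new byte[] {4}),
                new Token("meta", new byte[] {5}));
        List<byte[]> hits = collect(doc, List.of("meta", "data"));
        System.out.println(hits.size()); // payloads from positions 2 and 3
    }
}
```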

Now run through the payloads and draw a nice yellow/orange translucent
block over each hit in the original image, using the coords from each payload.
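Once the coordinates are decoded, plain Java2D is enough for the overlay. This sketch assumes the hits arrive as `java.awt.Rectangle`s in page-image pixels; `HitOverlay` is a hypothetical name, and the colour and 40% opacity are arbitrary.

```java
import java.awt.AlphaComposite;
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Rectangle;
import java.awt.image.BufferedImage;
import java.util.List;

// Minimal sketch: paint a translucent yellow/orange box over each hit on a page
// image. The rectangles stand in for coordinates decoded from payloads.
final class HitOverlay {

    static BufferedImage highlight(BufferedImage page, List<Rectangle> hits) {
        // Draw onto a copy so the original page image is left untouched.
        BufferedImage out = new BufferedImage(page.getWidth(), page.getHeight(),
                BufferedImage.TYPE_INT_RGB);
        Graphics2D g = out.createGraphics();
        try {
            g.drawImage(page, 0, 0, null);
            // SRC_OVER at 40% alpha blends the fill colour with the page underneath.
            g.setComposite(AlphaComposite.getInstance(AlphaComposite.SRC_OVER, 0.4f));
            g.setColor(new Color(255, 200, 0)); // yellow/orange
            for (Rectangle r : hits) {
                g.fill(r);
            }
        } finally {
            g.dispose();
        }
        return out;
    }

    public static void main(String[] args) {
        BufferedImage page = new BufferedImage(200, 100, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = page.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, 200, 100);
        g.dispose();

        BufferedImage marked = highlight(page, List.of(new Rectangle(120, 48, 75, 18)));
        System.out.printf("inside=%06x outside=%06x%n",
                marked.getRGB(130, 50) & 0xFFFFFF, marked.getRGB(5, 5) & 0xFFFFFF);
    }
}
```

Because only a blended rectangle is painted, the page text stays readable under the highlight.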

- Mark

Martin Owens wrote:
>>     Following the history of Payloads from its beginnings, it looks like
>> TermPositionVector was never considered as part of the Payload
>> functionality. I think this is based on the underlying index file
>> structure. I don't see any way to get at a Payload other than through
>> a TermPositions object. I don't think there is a way to translate code
>> which uses TermPositions to using TermPositionVector with regard to
>> payloads, but I welcome someone to show me how they could.
> Very interesting, and it fills in a few missing bits.
>>     Maybe there is some other workaround. What are you trying to
>> accomplish "historically" with TermPositionVectors instead of
>> TermPositions?
> Historically we've not been able to access the TermPositions object
> because it seemed to require that the original text was stored and not
> just indexed (although I can't see why). Perhaps I am mistaken?
> We're not storing the text content because a) there is rather a lot of
> it, b) we have the text files stored on special storage boxes mounted to
> the webservers, which use them directly, and c) it didn't seem worth
> it.
> Thoughts? So can I use the TermPositions object without the stored text?
> Best Regards, Martin Owens
> ---------------------------------------------------------------------
> To unsubscribe, e-mail:
> For additional commands, e-mail:

