lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mark Miller <markrmil...@gmail.com>
Subject Re: Term Based Meta Data
Date Mon, 11 Aug 2008 15:36:52 GMT
If I were feeling adventurous, and I wanted to help out Mark with 
Lucene-1001, I'd try this:

Get the trunk and apply Lucene-1001.

Index all of your docs with the highlight coords as payloads.

At highlight time, do something like the SpanHighlighter does - I've got 
a class called something like PayloadSpansUtil  to help out with this. 
You want to index the doc to be highlighted into a MemoryIndex, and then 
pass an IndexReader off that to the util class - like the 
SpanHighlighter, it will convert a Query into a SpanQuery approximation, 
but instead of getting positions for matches, it will collect all of the 
payloads for matches.

Now run through the payloads and make a nice yellow/orange translucent 
block over each hit in the original image using the coords from each 
payload.


- Mark


Martin Owens wrote:
>>     Following the history of Payloads from its beginnings 
>> (https://issues.apache.org/jira/browse/LUCENE-755, 
>> https://issues.apache.org/jira/browse/LUCENE-761, 
>> https://issues.apache.org/jira/browse/LUCENE-834, 
>> http://wiki.apache.org/lucene-java/Payload_Planning) it looks like 
>> TermPostionsVector was never considered as part of the Payload 
>> functionality.  I think this is based on the underlying index file 
>> structure???  I don't see any way to get at a Payload other than through 
>> a TermPositions object.  I don't think there is a way to translate code 
>> which uses TermPositions to using TermPositionVector with regards to 
>> payloads  -- but I welcome someone to show me how they could.
>>     
>
> Very interesting, and it fills in a few missing bits.
>
>   
>>     Maybe there is some other work around.  What are you trying to 
>> accomplish "historically" with TermPositionsVectors instead of 
>> TermPositions?
>>     
>
> Historically we've not been able to access the TermPositions object
> because it seemed to require that the original text was stored and not
> just indexed (although I can't see why) Perhaps I am mistaken?
>
> We're not storing the text context because a) there is rather a lot of
> it, b) we have the text files stored on special storage boxes mounted to
> the webservers and they're using directly and c) It didn't seem worth
> it.
>
> Thoughts? So can I use the TermPositions object without the stored text?
>
> Best Regards, Martin Owens
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message