lucene-java-user mailing list archives

From Greg Langmead <gr...@dessci.com>
Subject RE: Document contents split among different Fields
Date Thu, 23 Sep 2004 22:00:07 GMT
Doug Cutting wrote:
> Do you need highlights from all fields?  If so, then you can use:
> 
>    TextFragment[] getBestTextFragments(TokenStream, ...);
> 
> with a TokenStream for each field, then select the highest scoring 
> fragments across all fields.  Would that work for you?

Thanks for the reply.  I can't find a method like this in the lucene or
lucene-demo packages -- is this something that is already implemented, or
did you mean it as an example?

Once I get a text fragment, are you proposing using it to do a secondary
search within the source document, to match the fragment?

I would like to do highlighting on content from either of my Fields, but I
think that even if I didn't, I'd have the same problem: I'll have punched
holes in the text Field, so the positional data within the Field no longer
reflects positions in the source.

I think that if I want to pick the document apart into pieces like this,
then I need to do some work to restore global positional data, by
squirreling away the size of the holes I punch (the size of the XML islands,
from the text Field's point of view, and the size of the text runs, from the
island Field's point of view).  If I store a special textual escape within
the Field data that records the length of each gap, then I can read those
escapes when tokenizing the Field and add the number stored therein to the
offset of every Token that follows, restoring the global positional data.
Does that make sense?
I'm concerned this does violence to Lucene's model, which I've only been
studying for a couple of weeks now.
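
To make the idea concrete, here is a rough sketch in plain Java.  It does
not use Lucene's Tokenizer API at all; the class name GapAwareTokenizer, the
Tok type and the {{gap:N}} escape syntax are all made up for illustration.
Each escape stands in for N characters that were cut out of the source, and
the running shift maps Field offsets back to source offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GapAwareTokenizer {

    /** A token whose offsets refer to the original source document. */
    public static final class Tok {
        public final String text;
        public final int start;
        public final int end;

        Tok(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }
    }

    // Hypothetical escape syntax: {{gap:N}} marks N source characters
    // that were removed from this Field. Anything else that matches \w+
    // is treated as an ordinary word token.
    private static final Pattern PIECE =
            Pattern.compile("\\{\\{gap:(\\d+)\\}\\}|\\w+");

    public static List<Tok> tokenize(String fieldText) {
        List<Tok> tokens = new ArrayList<>();
        int shift = 0; // (offset in source) minus (offset in Field text)
        Matcher m = PIECE.matcher(fieldText);
        while (m.find()) {
            if (m.group(1) != null) {
                // An escape: the gap's length exists in the source, but the
                // escape's own characters do not, so add one and subtract
                // the other.
                int gapLength = Integer.parseInt(m.group(1));
                shift += gapLength - (m.end() - m.start());
            } else {
                // A word: report its offsets in source coordinates.
                tokens.add(new Tok(m.group(), m.start() + shift, m.end() + shift));
            }
        }
        return tokens;
    }
}
```

For a source of "Hello <b/> world", the text Field would hold
"Hello {{gap:4}} world", and the offsets reported for "world" would point
back into the source rather than into the Field text.  A real version would
need an escape that cannot collide with document content.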

Greg
