lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <>
Subject Re: token positions
Date Tue, 17 Nov 2009 19:18:16 GMT
The character offset info is only stored if you enable
Field.TermVector.WITH_OFFSETS or WITH_POSITIONS_OFFSETS on the field.

Then, it can only be retrieved if you get the term vectors for that
document, and locate the term & specific occurrence that you're
interested in.

This is likely quite a bit more costly than what you're doing now
(using payloads) so if you retrieve this eg for every hit it's
probably best to stick with payloads.


On Tue, Nov 17, 2009 at 1:45 PM, Christopher Tignor
<> wrote:
> Hello,
> Hoping someone might clear up a question for me:
> When Tokenizing we provide the start and end character offsets for each
> token locating it within the source text.
> If I tokenize the text "word" and then search for the term "word" in the
> same field, how can I recover this character offset information in the
> matching documents to precisely locate the word?  I have been storing this
> character info myself using payload data but if lucene stores it, then I am
> doing so needlessly.  If recovering this character offset info isn't
> possible, what is this character offset info used for?
> thanks so much,
> C>T>
> --
> Christopher Tignor | Senior Software Architect
> 155 Spring Street NY, NY 10012
> p.212-285-8600 x385 f.212-285-8999

To unsubscribe, e-mail:
For additional commands, e-mail:

View raw message