lucene-java-user mailing list archives

From Christopher Tignor
Subject token positions
Date Tue, 17 Nov 2009 18:45:26 GMT

Hoping someone might clear up a question for me:

When tokenizing, we provide the start and end character offsets for each
token, locating it within the source text.
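
To be concrete about what I mean by offsets, here is a minimal plain-Java
sketch. It is not Lucene's API: the OffsetSketch, Token and tokenize names
are made up for illustration, and the whitespace splitting is only an
assumption standing in for a real Tokenizer. It just shows the start/end
pair each token carries and how that pair locates the token in the source.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical stand-in for a tokenizer, not Lucene code.
public class OffsetSketch {

    /** A token plus the character span it came from (endOffset is exclusive). */
    public static final class Token {
        public final String text;
        public final int startOffset;
        public final int endOffset;

        public Token(String text, int startOffset, int endOffset) {
            this.text = text;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }
    }

    /** Splits on whitespace and records where each token starts and ends. */
    public static List<Token> tokenize(String source) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < source.length()) {
            // Skip whitespace between tokens.
            while (i < source.length() && Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume the token itself.
            while (i < source.length() && !Character.isWhitespace(source.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new Token(source.substring(start, i), start, i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String source = "a word here";
        for (Token t : tokenize(source)) {
            // The offsets alone are enough to cut the token back out of the source.
            String recovered = source.substring(t.startOffset, t.endOffset);
            System.out.println(recovered + " [" + t.startOffset + "," + t.endOffset + ")");
        }
    }
}
```

For the source "a word here", the token "word" gets start offset 2 and end
offset 6, which is the information I want to get back at search time.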

If I tokenize the text "word" and then search for the term "word" in the
same field, how can I recover these character offsets from the matching
documents to locate the word precisely?  I have been storing the offsets
myself as payload data, but if Lucene already stores them, I am doing so
needlessly.  If recovering the offsets isn't possible, what are they
used for?

Thanks so much,

Christopher Tignor | Senior Software Architect
155 Spring Street NY, NY 10012
p.212-285-8600 x385 f.212-285-8999
