lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Toph <christopher.m.coll...@gmail.com>
Subject Re: Incorrect Token Offset when using multiple fieldable instance
Date Wed, 02 Jul 2008 14:19:54 GMT


Michael McCandless-2 wrote:
> 
> 
> This would actually be a fairly large change: it's a change to the  
> index format and all APIs that handle offsets during indexing &  
> searching/retrieving.
> 
> 

For now I just changed the offset calculation in DocumentWriter as specified
here by the OP:



> replace DocumentWriter$FieldData#invertField offset = offsetEnd+1; by
> offset = stringValue.length(); 
> 

It has side effects as previously mentioned on this list, e.g. if the
tokenstream is not backed by a stringValue or the Analyzer does not
calculate offsets in the normal way.  But for my purposes it works.

This issue was also discussed previously 
http://lucene.markmail.org/search/?q=offset%20documentwriter#query:offset%20documentwriter+page:1+mid:l6jbfmfisyg5zyre+state:results
here .


Michael McCandless-2 wrote:
> 
> 
> We could alternatively extend TokenStream so you could query it for  
> the final offset, then fix indexing to use that value instead of the  
> endOffset of the last token that it saw.
> 
> 

Querying the tokenstream for the final offset would good, but then would the
change be put into the DocumentWriter directly or available as an option?

Chris
-- 
View this message in context: http://www.nabble.com/Incorrect-Token-Offset-when-using-multiple-fieldable-instance-tp15833468p18238566.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message