lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: Redundant fields Token class?
Date Fri, 13 Nov 2009 23:01:11 GMT
This is not coupled because:

termLength() is the number of chars in the term buffer, where the offsets
give the offsets in the orginal char stream. If you use a CharFilter to e.g.
remove chars, the termLength will get shorter, but the offset are still the
original ones. Also both things are indexed in different ways, the
termLength and offsets have no relation and must (as said before) not even
follow a contract like end-start=length.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Babak Farhang [mailto:farhang@gmail.com]
> Sent: Friday, November 13, 2009 11:50 PM
> To: java-user@lucene.apache.org
> Subject: Redundant fields Token class?
> 
> I'm writing a TokenFilter and am confused about why class Token has
> both an *endOffset* and a *termLength* field.  It would appear that
> the following invariant should always hold for a Token instance:
> 
>     termLength() == endOffset() - startOffset()
> 
> If so, then
> 
> 1) Why 2 fields, instead of 1?
> 2) Why isn't the invariant enforced in the class?
> 
> -Babak
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message