lucene-java-user mailing list archives

From Robert Muir <rcm...@gmail.com>
Subject Re: Redundant fields Token class?
Date Fri, 13 Nov 2009 23:20:20 GMT
Another example is a stemmer: it may change the termLength
(walking -> walk), but the offsets of the original unstemmed word (walking)
stay the same.
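
A minimal plain-Java sketch of the stemmer case. SimpleToken and stem() are
hypothetical stand-ins made up for illustration; they are not Lucene classes,
and the "strip a trailing ing" rule is only a toy, not a real stemmer:

```java
// Plain-Java illustration only: SimpleToken and stem() are hypothetical,
// not part of Lucene. They model a term plus offsets into the original text.
class TokenOffsetsSketch {

    /** Hypothetical token: term text plus start/end offsets in the original text. */
    static final class SimpleToken {
        final String term;
        final int startOffset;
        final int endOffset;

        SimpleToken(String term, int startOffset, int endOffset) {
            this.term = term;
            this.startOffset = startOffset;
            this.endOffset = endOffset;
        }

        /** Number of chars currently held in the term. */
        int termLength() {
            return term.length();
        }
    }

    /** Toy stemmer: strips a trailing "ing" and copies the offsets unchanged. */
    static SimpleToken stem(SimpleToken in) {
        String term = in.term;
        if (term.length() > 3 && term.endsWith("ing")) {
            term = term.substring(0, term.length() - 3);
        }
        // The offsets still point at the unstemmed word in the original text.
        return new SimpleToken(term, in.startOffset, in.endOffset);
    }

    public static void main(String[] args) {
        String text = "I was walking home";
        int start = text.indexOf("walking");
        int end = start + "walking".length();

        SimpleToken original = new SimpleToken(text.substring(start, end), start, end);
        SimpleToken stemmed = stem(original);

        System.out.println("term=" + stemmed.term
                + " termLength=" + stemmed.termLength()
                + " offsetSpan=" + (stemmed.endOffset - stemmed.startOffset));
        // The offsets can still be used to recover the original word, e.g. for highlighting.
        System.out.println("original=" + text.substring(stemmed.startOffset, stemmed.endOffset));
    }
}
```

After stemming, termLength() no longer equals endOffset - startOffset, yet the
offsets are still what you would want for highlighting the original word.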

On Fri, Nov 13, 2009 at 6:01 PM, Uwe Schindler <uwe@thetaphi.de> wrote:

> The two are not coupled:
>
> termLength() is the number of chars in the term buffer, whereas the offsets
> are positions in the original char stream. If you use a CharFilter to, e.g.,
> remove chars, the termLength gets shorter, but the offsets still refer to
> the original text. The two are also indexed in different ways. termLength
> and the offsets have no fixed relation and (as said before) need not
> follow a contract like end-start=length.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: Babak Farhang [mailto:farhang@gmail.com]
> > Sent: Friday, November 13, 2009 11:50 PM
> > To: java-user@lucene.apache.org
> > Subject: Redundant fields Token class?
> >
> > I'm writing a TokenFilter and am confused about why class Token has
> > both an *endOffset* and a *termLength* field.  It would appear that
> > the following invariant should always hold for a Token instance:
> >
> >     termLength() == endOffset() - startOffset()
> >
> > If so, then
> >
> > 1) Why 2 fields, instead of 1?
> > 2) Why isn't the invariant enforced in the class?
> >
> > -Babak
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
>


-- 
Robert Muir
rcmuir@gmail.com
