lucene-dev mailing list archives

From "Yonik Seeley" <yo...@apache.org>
Subject Re: Token/Payload API
Date Sat, 12 May 2007 01:03:55 GMT
On 5/11/07, Grant Ingersoll <gsingers@apache.org> wrote:
> On May 11, 2007, at 4:31 PM, Yonik Seeley wrote:
>
> > I hadn't kept up with the payload discussion/patch, and just got
> > around to looking at Token.
> >
> > public class Token implements Cloneable {
> >  String termText;                               // the text of the term
> >  int startOffset;                               // start in source text
> >  int endOffset;                                 // end in source text
> >  String type = "word";                                  // lexical type
> >
> >  Payload payload;
> >
> >
> > It almost feels like we are going down the road of Field, adding more
> > and more to the base class instead of using some other mechanism like
> > inheritance.
>
> So PayloadToken would be more in line with what you are thinking?
> Then wouldn't we need an instanceof check to determine when a token
> has a payload?

I don't have a good answer for that one... a real inheritance solution
would be invasive to the indexing code and probably not worth it at
this point.  There is also the problem of mixing different (future)
token properties... what you really want are mixins or something.

At this point, just forget I brought it up ;-)
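For illustration, the subclassing alternative discussed above might look like the minimal sketch below. `SimpleToken`, `PayloadToken`, and `PayloadLookup.payloadOf` are hypothetical names, not Lucene API, and the payload is modeled as a plain `byte[]` rather than a `Payload` object.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical base token with only the core fields from the quoted class.
class SimpleToken {
    String termText;
    int startOffset;
    int endOffset;
    String type = "word";

    SimpleToken(String termText, int startOffset, int endOffset) {
        this.termText = termText;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

// Hypothetical subclass: payload support lives here, not in the base class.
class PayloadToken extends SimpleToken {
    byte[] payload;

    PayloadToken(String termText, int startOffset, int endOffset, byte[] payload) {
        super(termText, startOffset, endOffset);
        this.payload = payload;
    }
}

class PayloadLookup {
    // Consumers must test the runtime type before they can read a payload.
    static byte[] payloadOf(SimpleToken t) {
        if (t instanceof PayloadToken) {
            return ((PayloadToken) t).payload;
        }
        return null; // plain tokens carry no payload
    }

    public static void main(String[] args) {
        SimpleToken plain = new SimpleToken("lucene", 0, 6);
        SimpleToken withPayload =
            new PayloadToken("index", 7, 12, "noun".getBytes(StandardCharsets.UTF_8));
        System.out.println(payloadOf(plain) == null); // true
        System.out.println(new String(payloadOf(withPayload), StandardCharsets.UTF_8)); // noun
    }
}
```

Every consumer that cares about payloads repeats the same instanceof test, and a token needing two optional properties would need a class per combination, which is the mixin problem mentioned above.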

> > A bigger problem, however, is that payloads will be lost by filters
> > that aren't payload-aware and that create new Tokens.  We had the same
> > problem with position increments being lost.
> >
> > For this latter problem, I think the answer is to *not* create new
> > tokens, and make all the properties of Token settable.
>
> This seems reasonable.  I never quite understood the need to create
> new tokens.  The other option may be to use a copy constructor, but
> again, that seems wasteful.

We have clone() for when new tokens really do need to be created (e.g.
when filters inject additional tokens, as in synonym injection).  Since
Token could be subclassed, clone() is probably the right approach: unlike
a copy constructor, it preserves the runtime class and any subclass fields.
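A minimal sketch of the approaches discussed in this thread, assuming a hypothetical `MutableToken` class (not Lucene's actual `Token`) whose properties are all settable and whose payload is modeled as a `byte[]`. The methods in `FilterSketches` are likewise illustrative: rebuilding a token drops the payload, mutating it in place keeps it, and synonym injection uses `clone()` when a genuinely new token is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical token with every property settable; not Lucene's Token class.
class MutableToken implements Cloneable {
    private String termText;
    private int startOffset;
    private int endOffset;
    private byte[] payload; // stands in for a Payload object

    MutableToken(String termText, int startOffset, int endOffset) {
        this.termText = termText;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    String getTermText() { return termText; }
    void setTermText(String termText) { this.termText = termText; }
    int getStartOffset() { return startOffset; }
    int getEndOffset() { return endOffset; }
    byte[] getPayload() { return payload; }
    void setPayload(byte[] payload) { this.payload = payload; }

    @Override
    public MutableToken clone() {
        try {
            // Object.clone() keeps the runtime class, so subclass fields survive.
            // Shallow copy: the clone shares the payload array with the original.
            return (MutableToken) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e); // cannot happen: we implement Cloneable
        }
    }
}

class FilterSketches {
    // Payload-unaware filter that builds a new token: the payload is silently dropped.
    static MutableToken lowerCaseByCopy(MutableToken t) {
        return new MutableToken(t.getTermText().toLowerCase(Locale.ROOT),
                                t.getStartOffset(), t.getEndOffset());
    }

    // Same filter mutating in place: properties it doesn't know about survive.
    static MutableToken lowerCaseInPlace(MutableToken t) {
        t.setTermText(t.getTermText().toLowerCase(Locale.ROOT));
        return t;
    }

    // Synonym injection genuinely needs an extra token, so it clones the original.
    static List<MutableToken> injectSynonym(MutableToken t, String synonym) {
        List<MutableToken> out = new ArrayList<>();
        out.add(t);
        MutableToken syn = t.clone();
        syn.setTermText(synonym);
        out.add(syn);
        return out;
    }

    public static void main(String[] args) {
        MutableToken a = new MutableToken("Fast", 0, 4);
        a.setPayload(new byte[] {42});
        System.out.println(lowerCaseByCopy(a).getPayload() == null); // true: payload lost

        MutableToken b = new MutableToken("Fast", 0, 4);
        b.setPayload(new byte[] {42});
        System.out.println(lowerCaseInPlace(b).getPayload()[0]); // 42

        List<MutableToken> both = injectSynonym(b, "quick");
        System.out.println(both.get(1).getTermText() + " " + both.get(1).getPayload()[0]); // quick 42
    }
}
```

Because `super.clone()` is shallow, the injected token shares its payload array with the original; a later filter that edits payload bytes in place would need to copy them first.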

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

