lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1350) Filters which are "consumers" should not reset the payload or flags and should better reuse the token
Date Fri, 08 Aug 2008 16:51:44 GMT


Michael McCandless commented on LUCENE-1350:

It seems like there are three different things here:

  # Many filters (e.g. SnowballFilter) incorrectly erase the payload,
    token type and token flags, because they effectively do their own
    Token cloning.  This problem predates the re-use API.
  # Separately, these filters do not use the re-use API, which we want
    to migrate to anyway.
  # Adding new "reuse" methods on Token that work like clear(), except
    that they also take arguments to replace the termBuffer, start/end
    offsets, etc., and do not reset the payload/flags to their defaults.
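
A minimal sketch of problem 1, using a hypothetical, stripped-down SimpleToken class rather than Lucene's real Token (its fields, the cloningStem helper and the toy suffix-stripping "stemmer" are assumptions for illustration only): a filter that builds a fresh token from just the new text and offsets silently drops whatever type, flags and payload the incoming token carried.

```java
import java.util.Arrays;

// Hypothetical stand-in for Lucene's Token (NOT the real class). It keeps the
// term as a String instead of a char[] buffer, just to stay short.
class SimpleToken {
    static final String DEFAULT_TYPE = "word";
    String term;
    int startOffset, endOffset;
    String type = DEFAULT_TYPE;
    int flags = 0;
    byte[] payload = null;

    SimpleToken(String term, int startOffset, int endOffset) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

public class CloningFilterBug {
    // Hypothetical toy "stemmer": strips one trailing 's'.
    static String stem(String s) {
        return s.endsWith("s") ? s.substring(0, s.length() - 1) : s;
    }

    // The problematic pattern: a new token is built from the stemmed text and
    // the offsets only, so type, flags and payload fall back to defaults.
    static SimpleToken cloningStem(SimpleToken in) {
        return new SimpleToken(stem(in.term), in.startOffset, in.endOffset);
    }

    public static void main(String[] args) {
        SimpleToken t = new SimpleToken("filters", 0, 7);
        t.type = "noun";
        t.flags = 4;
        t.payload = new byte[] {42};

        SimpleToken out = cloningStem(t);
        System.out.println(out.term + " type=" + out.type + " flags=" + out.flags
                + " payload=" + Arrays.toString(out.payload));
    }
}
```

Running it prints `filter type=word flags=0 payload=null`: the stemmed text is right, but everything else the upstream filters attached is gone.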

Since in LUCENE-1333 we are aggressively moving all Lucene core &
contrib TokenStreams & TokenFilters to the re-use API (formally
deprecating the original non-reuse API), we may as well fix 1 & 2 at
the same time.

I think the reuse API proposal is reasonable: it mirrors the current
constructors on Token.  But since we are migrating to the reuse API,
you need an analog of each of these constructors that does not create
a new Token.
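
To make the "analog of the constructors" idea concrete, here is a minimal sketch with a hypothetical ReusableToken stand-in (not Lucene's Token): the in-place method takes exactly the constructor's arguments but overwrites the existing instance. The name reinit is only a placeholder, and the grow-only buffer handling is an assumption for illustration.

```java
// Hypothetical stand-in for Lucene's Token; names are illustrative only.
class ReusableToken {
    char[] termBuffer = new char[0];
    int termLength;
    int startOffset, endOffset;

    ReusableToken() {}

    // Constructor-style initialisation: allocates a new object every time.
    ReusableToken(char[] buf, int off, int len, int start, int end) {
        reinit(buf, off, len, start, end);
    }

    // In-place analog of the constructor above: same arguments, but it
    // overwrites this instance instead of creating a new one.
    ReusableToken reinit(char[] buf, int off, int len, int start, int end) {
        if (termBuffer.length < len) {
            termBuffer = new char[len]; // grow the buffer only when needed
        }
        System.arraycopy(buf, off, termBuffer, 0, len);
        termLength = len;
        startOffset = start;
        endOffset = end;
        return this;
    }

    String term() {
        return new String(termBuffer, 0, termLength);
    }
}

public class ReinitSketch {
    public static void main(String[] args) {
        char[] text = "quick brown".toCharArray();
        ReusableToken tok = new ReusableToken();

        ReusableToken same = tok.reinit(text, 0, 5, 0, 5);
        System.out.println(same == tok); // true: no new object was created
        System.out.println(tok.term() + " " + tok.startOffset + "-" + tok.endOffset); // quick 0-5

        tok.reinit(text, 6, 5, 6, 11);
        System.out.println(tok.term() + " " + tok.startOffset + "-" + tok.endOffset); // brown 6-11
    }
}
```

A filter calling such a method once per token allocates nothing beyond an occasional buffer resize, which is the point of migrating to the reuse API.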

But maybe rename it from "reuse" to "update", "set", "reset",
"reinit", or "change"?  Also, shouldn't this method still reset the
payload, position increment, etc., to their defaults?  I.e., I think
calling it should give you the same result as creating a new
Token(...) from the same termBuffer, start/end offset, etc.
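
The semantics suggested above can be sketched with another hypothetical stand-in, SketchToken (the field names, the "word" default type and the setTerm helper are assumptions, not Lucene's API): reinit leaves the token in exactly the state a fresh constructor call would, while a text-only setter is what a stemming filter would presumably use if it wants to keep the payload and flags.

```java
import java.util.Arrays;

// Hypothetical stand-in for Lucene's Token (NOT the real API); the term is a
// String here instead of a char[] buffer to keep the sketch short.
class SketchToken {
    static final String DEFAULT_TYPE = "word";
    String term = "";
    int start, end;
    String type = DEFAULT_TYPE;
    int flags = 0;
    int positionIncrement = 1;
    byte[] payload = null;

    SketchToken() {}

    SketchToken(String term, int start, int end) {
        reinit(term, start, end);
    }

    // Proposed semantics: same state as new SketchToken(term, start, end),
    // so payload, flags, type and position increment go back to defaults.
    SketchToken reinit(String term, int start, int end) {
        this.term = term;
        this.start = start;
        this.end = end;
        this.type = DEFAULT_TYPE;
        this.flags = 0;
        this.positionIncrement = 1;
        this.payload = null;
        return this;
    }

    // A filter that only rewrites the text (e.g. a stemmer) touches nothing else.
    void setTerm(String term) {
        this.term = term;
    }

    boolean sameStateAs(SketchToken o) {
        return term.equals(o.term) && start == o.start && end == o.end
                && type.equals(o.type) && flags == o.flags
                && positionIncrement == o.positionIncrement
                && Arrays.equals(payload, o.payload);
    }
}

public class ReinitDefaults {
    public static void main(String[] args) {
        SketchToken reused = new SketchToken("running", 0, 7);
        reused.payload = new byte[] {7};
        reused.flags = 2;

        // Text-only update keeps payload and flags.
        reused.setTerm("run");
        System.out.println(Arrays.toString(reused.payload) + " " + reused.flags); // [7] 2

        // reinit behaves like a fresh constructor call.
        reused.reinit("jumps", 8, 13);
        System.out.println(reused.sameStateAs(new SketchToken("jumps", 8, 13))); // true
    }
}
```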

Should we just absorb this issue into LUCENE-1333?  DM, of the
filters you listed above that lose the payload, are there any that
LUCENE-1333 does not fix?  I'm confused about the overlap, and it's
hard to work with all the patches.  Actually, if you could consolidate
LUCENE-1333 down to a single patch (one big top-level "svn diff"),
that'd be great :)

> Filters which are "consumers" should not reset the payload or flags and should better reuse the token
> -----------------------------------------------------------------------------------------------------
>                 Key: LUCENE-1350
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>             Fix For: 2.3.3
>         Attachments: LUCENE-1350.patch
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Other "consumer" filters have similar problem.
> These filters can, and should, reuse the token by implementing next(Token), which would also fix the unwanted resetting.
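
The fix proposed in the issue, consumer filters implementing next(Token) and editing the token they are handed, might look roughly like the sketch below. Everything in it is a simplified, hypothetical model of Lucene's analysis chain (Tok, TokStream, the whitespace source, the length-based payload filter and the suffix-stripping stemmer are all made up for illustration); it only shows that a payload attached before the stemming step survives it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for Lucene's Token and TokenStream.
class Tok {
    String term = "";
    int start, end;
    byte[] payload;
}

abstract class TokStream {
    // Reuse-style contract, modelled loosely on next(Token): fill and return
    // the caller's token, or return null at end of stream.
    abstract Tok next(Tok reusable);
}

// Splits on spaces; the source resets the reused token's fields per word.
class WhitespaceSource extends TokStream {
    private final String text;
    private int pos = 0;

    WhitespaceSource(String text) { this.text = text; }

    Tok next(Tok t) {
        while (pos < text.length() && text.charAt(pos) == ' ') pos++;
        if (pos >= text.length()) return null;
        int start = pos;
        while (pos < text.length() && text.charAt(pos) != ' ') pos++;
        t.term = text.substring(start, pos);
        t.start = start;
        t.end = pos;
        t.payload = null; // a source starts each token from defaults
        return t;
    }
}

// Hypothetical payload producer: tags each token with its length.
class LengthPayloadFilter extends TokStream {
    private final TokStream input;
    LengthPayloadFilter(TokStream input) { this.input = input; }

    Tok next(Tok t) {
        Tok r = input.next(t);
        if (r != null) r.payload = new byte[] {(byte) r.term.length()};
        return r;
    }
}

// A "consumer" filter done the reuse way: it rewrites only the term text of
// the token it was handed, so payload and offsets pass through untouched.
class NaiveStemFilter extends TokStream {
    private final TokStream input;
    NaiveStemFilter(TokStream input) { this.input = input; }

    Tok next(Tok t) {
        Tok r = input.next(t);
        if (r != null && r.term.endsWith("s")) {
            r.term = r.term.substring(0, r.term.length() - 1);
        }
        return r;
    }
}

public class ReuseChain {
    static List<String> run(String text) {
        TokStream s = new NaiveStemFilter(new LengthPayloadFilter(new WhitespaceSource(text)));
        List<String> out = new ArrayList<>();
        Tok reusable = new Tok(); // one instance for the whole stream
        for (Tok t = s.next(reusable); t != null; t = s.next(reusable)) {
            out.add(t.term + "/" + t.payload[0]);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run("filters keep payloads"));
    }
}
```

Running it prints `[filter/7, keep/4, payload/8]`. Because the payload is set before stemming here, the stem-first workaround is not needed in this model.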
