lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1350) Filters which are "consumers" should not reset the payload or flags and should better reuse the token
Date Fri, 08 Aug 2008 16:51:44 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12620970#action_12620970 ]

Michael McCandless commented on LUCENE-1350:
--------------------------------------------


There seem to be three different things here:

  # Many filters (e.g. SnowballFilter) incorrectly erase the payload,
    token type and token flags, because they are basically doing
    their own Token cloning.  This bug is pre-existing: it predates
    the reuse API.
  # Separately, these filters do not use the reuse API, which we
    want to migrate to anyway.
  # The proposal adds new "reuse" methods on Token which are like
    clear() except they also take args to replace the termBuffer,
    start/end offset, etc, and they do not clear the payload/flags
    to their defaults.
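
To make the first two items concrete, here is a minimal Java sketch.
It is not Lucene code: SketchToken, ConsumerFilterSketch and the toy
stem() are hypothetical stand-ins, invented only to show why
rebuilding a token from its term and offsets drops the payload and
flags, while changing the term on the incoming token keeps them.

```java
import java.util.Arrays;

/** Simplified stand-in for a token; NOT Lucene's Token class. */
final class SketchToken {
    String term;
    int startOffset;
    int endOffset;
    int flags = 0;          // assumed default
    byte[] payload = null;  // assumed default: no payload

    SketchToken(String term, int startOffset, int endOffset) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

final class ConsumerFilterSketch {
    /** Toy stemmer (hypothetical): strips one trailing 's'. */
    static String stem(String term) {
        return term.endsWith("s") ? term.substring(0, term.length() - 1) : term;
    }

    /** Item 1: a fresh token gets only term and offsets, so flags and payload fall back to defaults. */
    static SketchToken stemByCopy(SketchToken in) {
        return new SketchToken(stem(in.term), in.startOffset, in.endOffset);
    }

    /** Reuse style: only the term changes; everything else on the token is left alone. */
    static SketchToken stemInPlace(SketchToken in) {
        in.term = stem(in.term);
        return in;
    }

    public static void main(String[] args) {
        SketchToken t = new SketchToken("payloads", 0, 8);
        t.payload = new byte[] {42};
        t.flags = 1;

        SketchToken copied = stemByCopy(t);
        System.out.println(copied.term + " payload=" + Arrays.toString(copied.payload));

        SketchToken reused = stemInPlace(t);
        System.out.println(reused.term + " payload=" + Arrays.toString(reused.payload));
    }
}
```

In this sketch the copied token comes back without its payload and
with default flags, while the reused token is the same object with
both intact.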

Since in LUCENE-1333 we are aggressively moving all Lucene core &
contrib TokenStreams & TokenFilters to use the reuse API (formally
deprecating the original non-reuse API), we may as well fix 1 & 2 at
once.

I think the reuse API proposal is reasonable: it mirrors the current
constructors on Token.  Since we are migrating to the reuse API, we
need an analog of each of these constructors that does not make a
new Token.

But maybe change the name from "reuse" to something like "update",
"set", "reset", "reinit", or "change"?  Also, I think this method
should still reset payload, position incr, etc, to their defaults.
I.e., calling this method should get you the same result as creating
a new Token(...) with the same termBuffer, start/end offset, etc.
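
The difference between the two semantics under discussion can be
sketched in Java as follows.  This is only an illustration: DemoToken
and its reinit(), reuse() and sameStateAs() methods are hypothetical,
not Token's actual API, and the defaults (position increment 1,
flags 0, no payload) are assumptions of the sketch.  It is also an
assumption here that reuse() resets the position increment.

```java
import java.util.Arrays;

/** Simplified stand-in for a token; NOT Lucene's Token class. */
final class DemoToken {
    String term;
    int startOffset;
    int endOffset;
    int positionIncrement = 1; // assumed default
    int flags = 0;             // assumed default
    byte[] payload = null;     // assumed default: no payload

    DemoToken(String term, int startOffset, int endOffset) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }

    /** Hypothetical "reinit": leaves the token as if freshly constructed with these args. */
    DemoToken reinit(String term, int startOffset, int endOffset) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.positionIncrement = 1;
        this.flags = 0;
        this.payload = null;
        return this;
    }

    /** Hypothetical "reuse": replaces term and offsets but keeps payload and flags. */
    DemoToken reuse(String term, int startOffset, int endOffset) {
        this.term = term;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
        this.positionIncrement = 1; // assumption: reset, as a clear()-like method would
        return this;
    }

    /** Field-by-field comparison, used to check "same result as new DemoToken(...)". */
    boolean sameStateAs(DemoToken o) {
        return term.equals(o.term)
            && startOffset == o.startOffset
            && endOffset == o.endOffset
            && positionIncrement == o.positionIncrement
            && flags == o.flags
            && Arrays.equals(payload, o.payload);
    }
}
```

With reinit(), a recycled token is indistinguishable from a new one;
with reuse(), whatever payload and flags the token carried before
survive the call.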

Should we just absorb this issue into LUCENE-1333?  DM, of your list
above (of filters that lose payload), are there any that are not fixed
in LUCENE-1333?  I'm confused about the overlap and it's hard to work
with all the patches.  Actually if in LUCENE-1333 you could
consolidate down to a single patch (one big top-level "svn diff"),
that'd be great :)


> Filters which are "consumers" should not reset the payload or flags and should better reuse the token
> -----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-1350
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1350
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Analysis, contrib/*
>            Reporter: Doron Cohen
>            Assignee: Doron Cohen
>             Fix For: 2.3.3
>
>         Attachments: LUCENE-1350.patch
>
>
> Passing tokens with payloads through SnowballFilter results in tokens with no payloads.
> A workaround for this is to apply stemming first and only then run whatever logic creates the payload, but this is not always convenient.
> Other "consumer" filters have a similar problem.
> These filters can - and should - reuse the token, by implementing next(Token), effectively also fixing the unwanted resetting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

