lucene-dev mailing list archives

From "Benson Margulies (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-5202) LookaheadTokenFilter consumes an extra token in nextToken
Date Sun, 08 Sep 2013 23:04:51 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-5202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13761545#comment-13761545 ]

Benson Margulies commented on LUCENE-5202:
------------------------------------------

Well, it only took me about 10 minutes to code a class that did what I needed once you goosed
me into coding it. I suspect that there's something that LTF does that I _don't_ need that
explains why it is so complex. The rolling buffer suggests to me that it's supporting some
much more flexible idea about lookahead than just 'grab a batch, process them, regurgitate
the results (including extra tokens), grab the next batch.'
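For illustration only, the 'grab a batch, process, regurgitate, grab the next batch' idea can be sketched in plain Java without any Lucene classes. This is not the class mentioned above and not the LookaheadTokenFilter API; the name BatchLookahead, the batchSize parameter, and the processor function are all hypothetical, and tokens are simplified to strings.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Hypothetical batch-lookahead wrapper over a token iterator; not part of Lucene. */
final class BatchLookahead implements Iterator<String> {
    private final Iterator<String> input;
    private final int batchSize;
    // Maps one batch of input tokens to the tokens to emit (may add extras).
    private final Function<List<String>, List<String>> processor;
    // Processed tokens that have not been handed to the caller yet.
    private final Deque<String> pending = new ArrayDeque<>();

    BatchLookahead(Iterator<String> input, int batchSize,
                   Function<List<String>, List<String>> processor) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.input = input;
        this.batchSize = batchSize;
        this.processor = processor;
    }

    @Override
    public boolean hasNext() {
        // Read from the input only once the previous batch is fully emitted,
        // so no input token is consumed earlier than necessary.
        while (pending.isEmpty() && input.hasNext()) {
            List<String> batch = new ArrayList<>(batchSize);
            while (batch.size() < batchSize && input.hasNext()) {
                batch.add(input.next());
            }
            pending.addAll(processor.apply(batch));
        }
        return !pending.isEmpty();
    }

    @Override
    public String next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return pending.removeFirst();
    }

    /** Convenience helper: collects everything an iterator produces. */
    static List<String> drain(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

The point of the sketch is that the wrapper touches the input only when its own queue is empty, which is the simple behaviour described above, as opposed to a rolling buffer that supports more flexible lookahead.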

Or in other words, since there are analyzers in Lucene that are still using pre-AttributeSource
methods to handle creating additional tokens, one would think that there would be a use for
a base class that could support them easily.

In any case, you're welcome.
                
> LookaheadTokenFilter consumes an extra token in nextToken
> ---------------------------------------------------------
>
>                 Key: LUCENE-5202
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5202
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 4.3.1
>            Reporter: Benson Margulies
>         Attachments: LUCENE-5202.patch, LUCENE-5202.patch
>
>
> This is a bit hard to explain except by looking at the test case. I've coded a filter
> that uses LookaheadTokenFilter. The incrementToken method peeks some tokens. Then, it seems,
> nextToken in the Lookahead class calls peekToken itself, which seems to me to consume a token
> so that it's not seen when the derived class sets out to process the next set of tokens.
> In passing, this test case can be used to demonstrate that it does not work to try to
> use the afterPosition method to set up attributes of the token that we're 'after'. Probably
> that was never intended. However, I'm hoping for some feedback as to whether the rest of the
> structure here is as intended for subclasses of LookaheadTokenFilter.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


