lucene-dev mailing list archives

From "Robert Muir (JIRA)" <>
Subject [jira] [Updated] (LUCENE-4065) FilteringTokenFilter should never corrupt the tokenstream graph
Date Thu, 17 May 2012 13:41:07 GMT


Robert Muir updated LUCENE-4065:

    Attachment: LUCENE-4065_test.patch

Test case (boiled down from TestRandomChains).

A much simpler one could be made.
> FilteringTokenFilter should never corrupt the tokenstream graph
> ---------------------------------------------------------------
>                 Key: LUCENE-4065
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/analysis
>            Reporter: Robert Muir
>         Attachments: LUCENE-4065_test.patch
> Currently, removers like StopFilter have an option (true/false) to enable position increments.
> If it's true: it both inserts gaps where necessary AND propagates existing gaps down the stream.
> If it's false: it does neither, which can totally mess up the tokenstream graph (e.g. move synonyms to another word).
> There are totally valid, natural use cases for false, where you don't want gaps because you want phrase queries to act as if the word was never actually there.
> But 'not inserting gaps' is separate from proper propagation of existing gaps.
> So I think we should provide an option (either fix 'false' or make it an enum) where you still get a legit tokenstream and don't totally screw it up, but you simply omit gaps.
> See LUCENE-3848 for more information (where we at least fixed this case so that it no longer begins the tokenstream with posInc=0).
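To make the behaviors described in the issue concrete, here is a minimal, Lucene-free sketch. It models a token stream as a list of (term, posInc) pairs and a remover as a predicate. `Token`, `Mode`, `filter` and the mode names are hypothetical illustrations, not Lucene classes or options. `GAPS` and `LEGACY` mimic the true/false settings as described above. `NO_NEW_GAPS` is only one possible reading of the proposed option: a removed token's own position is dropped, any gap that already preceded it is carried forward, and a token stacked on the removed word (posInc=0) gets a position of its own instead of sliding onto the previous kept word.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical, Lucene-free model of a remover such as StopFilter.
// Token, Mode and filter() are illustrative names, not Lucene APIs.
public class RemoverModes {

    // A term and its position increment; posInc=0 stacks the token on the
    // previous position, as a synonym would be.
    record Token(String term, int posInc) {
        @Override
        public String toString() {
            return term + "(" + posInc + ")";
        }
    }

    enum Mode {
        GAPS,        // like enablePositionIncrements=true
        LEGACY,      // like enablePositionIncrements=false (can corrupt the graph)
        NO_NEW_GAPS  // hypothetical fix: no new gaps, existing gaps kept
    }

    static List<Token> filter(List<Token> in, Predicate<String> accept, Mode mode) {
        List<Token> out = new ArrayList<>();
        int skippedInc = 0;       // total posInc of tokens removed since the last kept one
        int removedPositions = 0; // how many of those removed tokens had posInc > 0
        for (Token t : in) {
            if (!accept.test(t.term())) {
                skippedInc += t.posInc();
                if (t.posInc() > 0) {
                    removedPositions++;
                }
                continue;
            }
            int inc = t.posInc();
            switch (mode) {
                case GAPS -> inc += skippedInc; // leave a hole and carry upstream gaps
                case LEGACY -> { }              // drop removed increments entirely
                case NO_NEW_GAPS -> {
                    // a token stacked on a removed word gets its own position
                    // instead of sliding onto the previous kept word
                    if (inc == 0 && removedPositions > 0) {
                        inc = 1;
                    }
                    // keep pre-existing gaps, drop the removed tokens' own slots
                    inc += skippedInc - removedPositions;
                }
            }
            if (out.isEmpty() && inc == 0) {
                inc = 1; // never start the stream with posInc=0 (cf. LUCENE-3848)
            }
            out.add(new Token(t.term(), inc));
            skippedInc = 0;
            removedPositions = 0;
        }
        return out;
    }

    public static void main(String[] args) {
        Predicate<String> notBar = term -> !term.equals("bar");
        // "baz" is a synonym stacked on "bar"
        List<Token> synonym = List.of(new Token("foo", 1), new Token("bar", 1), new Token("baz", 0));
        // "bar" already sits after a gap of one position (posInc=2)
        List<Token> gap = List.of(new Token("foo", 1), new Token("bar", 2), new Token("qux", 1));
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " " + filter(synonym, notBar, mode) + " " + filter(gap, notBar, mode));
        }
        // GAPS [foo(1), baz(1)] [foo(1), qux(3)]
        // LEGACY [foo(1), baz(0)] [foo(1), qux(1)]
        // NO_NEW_GAPS [foo(1), baz(1)] [foo(1), qux(2)]
    }
}
```

With `LEGACY`, `baz` ends up stacked on `foo` and the gap that already preceded `bar` disappears, which is the corruption the issue describes. In this sketch `NO_NEW_GAPS` keeps both streams well-formed while leaving no hole where `bar` used to be.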

This message is automatically generated by JIRA.