lucene-dev mailing list archives

From "Steve Rowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter
Date Tue, 15 May 2018 01:50:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16475166#comment-16475166 ]

Steve Rowe commented on LUCENE-8273:
------------------------------------

I stumbled on what looks like a {{ProtectedTermFilter}} bug: when the wrapped filter is a filtering
token filter and the content to be analyzed contains at least one non-protected term before
a protected term, protection fails:

{code:java|title=TestProtectedTerm.java}
  public void testWrappedFilteringTokenFilter() throws IOException {
    CharArraySet protectedTerms = new CharArraySet(5, true);
    protectedTerms.add("foobar");
    TokenStream stream = whitespaceMockTokenizer("foobar abc");
    stream = new ProtectedTermFilter(protectedTerms, stream, in -> new LengthFilter(in, 1, 4));
    assertTokenStreamContents(stream, new String[]{ "foobar", "abc" }); // succeeds

    stream = whitespaceMockTokenizer("wuthering foobar abc");
    stream = new ProtectedTermFilter(protectedTerms, stream, in -> new LengthFilter(in, 1, 4));
    assertTokenStreamContents(stream, new String[]{ "foobar", "abc" }); // fails @ term 0: Actual: abc
  }
{code}
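
For reference, the semantics the test expects can be modelled without Lucene. The sketch below is plain Java, not Lucene code: the class name {{ProtectedLengthSketch}} and its {{filter}} method are hypothetical, and it assumes the protected set holds lowercase terms (mirroring the ignore-case {{CharArraySet}} above). A token survives if it is protected or if its length is within the bounds passed to {{LengthFilter}}:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java model of the expected behaviour; hypothetical names, not part of Lucene.
final class ProtectedLengthSketch {

  // Keeps a token if it is protected (case-insensitive match against a
  // lowercase set) or if its length lies within [min, max] inclusive.
  static List<String> filter(List<String> tokens, Set<String> protectedTerms, int min, int max) {
    List<String> out = new ArrayList<>();
    for (String token : tokens) {
      boolean isProtected = protectedTerms.contains(token.toLowerCase(Locale.ROOT));
      boolean lengthOk = token.length() >= min && token.length() <= max;
      if (isProtected || lengthOk) {
        out.add(token);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Set<String> protectedTerms = Set.of("foobar");
    // Both inputs from the test above should yield the same two tokens.
    System.out.println(filter(List.of("foobar", "abc"), protectedTerms, 1, 4));
    System.out.println(filter(List.of("wuthering", "foobar", "abc"), protectedTerms, 1, 4));
  }
}
```

Under this model "wuthering" is dropped for its length, while "foobar" is kept because it is protected, which is what the second assertion expects.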

I haven't yet figured out what the problem is.  Alan, do you understand what's happening here?

> Add a ConditionalTokenFilter
> ----------------------------
>
>                 Key: LUCENE-8273
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8273
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>             Fix For: 7.4
>
>         Attachments: LUCENE-8273-part2.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch,
>                      LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter in such
> a way that it could optionally be bypassed based on the current state of the TokenStream.
> This could be used to, for example, only apply WordDelimiterFilter to terms that contain
> hyphens.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

