lucene-dev mailing list archives

From "Alan Woodward (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (LUCENE-8273) Add a ConditionalTokenFilter
Date Tue, 01 May 2018 09:46:00 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-8273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459564#comment-16459564 ]

Alan Woodward commented on LUCENE-8273:
---------------------------------------

Updated patch.  {{ConditionalTokenFilterFactory}} is now a top-level class, distinct from
{{ConditionBuilder}}.  I've added a TermExclusionFilter that accepts a list of terms and only
runs its child filters if the current token is not in that list, and demonstrated how to use
it in TestCustomAnalyzer.  At the moment it just reads a word file, but we can expand it to
accept patterns or a directly passed-in list of terms in follow-ups.  I've also changed the
CustomAnalyzerBuilder to use {{when}} rather than {{ifXXX}}. Thanks for the suggestion, Steve!
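
The exclusion semantics described above (run the child filters only when the current token is *not* in the term list) can be sketched without Lucene at all. This is a hypothetical, simplified model, not the patch's code: a "filter" here is just a function from one token to zero or more tokens, the names TermExclusionSketch and conditional are made up for illustration, and the hard-coded set stands in for the word file that TermExclusionFilter reads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

public class TermExclusionSketch {

    // Hypothetical, simplified model: a "filter" maps one token to zero or more tokens.
    // Runs the child filter only on tokens that satisfy the condition; every other
    // token is passed through unchanged.
    public static List<String> conditional(List<String> tokens,
                                           Predicate<String> condition,
                                           Function<String, List<String>> child) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (condition.test(token)) {
                out.addAll(child.apply(token));
            } else {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Stand-in for the terms that TermExclusionFilter would read from its word file.
        Set<String> excluded = Set.of("NASA", "iPhone");
        // Toy child filter: lowercase the token.
        Function<String, List<String>> lowercase = t -> List.of(t.toLowerCase(Locale.ROOT));
        List<String> input = List.of("The", "NASA", "Launch", "iPhone");
        // Exclusion semantics: apply the child filter only when the token is NOT in the list.
        List<String> result = conditional(input, t -> !excluded.contains(t), lowercase);
        System.out.println(result); // prints [the, NASA, launch, iPhone]
    }
}
```

Keeping the condition separate from the child filter is what lets the same wrapper express exclusion lists, patterns, or any other per-token test; a real Lucene TokenStream also carries offsets, positions and other attributes that this model ignores.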

> Add a ConditionalTokenFilter
> ----------------------------
>
>                 Key: LUCENE-8273
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8273
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch, LUCENE-8273.patch
>
>
> Spinoff of LUCENE-8265.  It would be useful to be able to wrap a TokenFilter in such
> a way that it could optionally be bypassed based on the current state of the TokenStream.
> This could be used to, for example, only apply WordDelimiterFilter to terms that contain
> hyphens.
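
The hyphen use case in the description can be illustrated with a small, self-contained sketch. It is hypothetical and not Lucene's API: splitOnHyphens is a toy stand-in for WordDelimiterFilter (the real filter has many flags and produces richer output), applyWhen is an invented helper, and tokens are plain strings rather than a TokenStream. It does show the interesting case where the wrapped filter changes the number of tokens while unmatched tokens bypass it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class HyphenConditionSketch {

    // Toy stand-in for WordDelimiterFilter: splits a token on hyphens,
    // dropping empty parts (e.g. from a leading or trailing hyphen).
    public static List<String> splitOnHyphens(String token) {
        List<String> parts = new ArrayList<>();
        for (String part : token.split("-")) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    // Runs the wrapped filter only when the condition holds for the current token;
    // otherwise the token bypasses the filter unchanged.
    public static List<String> applyWhen(List<String> tokens,
                                         Predicate<String> condition,
                                         Function<String, List<String>> filter) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.addAll(condition.test(token) ? filter.apply(token) : List.of(token));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> input = List.of("wi-fi", "hot", "spot", "e-mail");
        // Only hyphenated tokens are sent through the splitting filter.
        List<String> result = applyWhen(input, t -> t.indexOf('-') >= 0,
                HyphenConditionSketch::splitOnHyphens);
        System.out.println(result); // prints [wi, fi, hot, spot, e, mail]
    }
}
```

Because "hot" and "spot" never reach the splitting filter, they come out exactly as they went in; in a real analysis chain the same idea also has to keep positions and offsets consistent, which this sketch does not model.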



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


