lucene-dev mailing list archives

From "Steven Rowe (JIRA)" <>
Subject [jira] Created: (LUCENE-2470) Add conditional branching/merging to Lucene's analysis pipeline
Date Wed, 19 May 2010 17:26:53 GMT
Add conditional branching/merging to Lucene's analysis pipeline

                 Key: LUCENE-2470
             Project: Lucene - Java
          Issue Type: New Feature
          Components: Analysis
    Affects Versions: 4.0
            Reporter: Steven Rowe
            Priority: Minor

Captured from a #lucene brainstorming session with Robert Muir:

Lucene's analysis pipeline would be more flexible if it were possible to apply filter(s) to
only part of an input stream's tokens, under user-specifiable conditions (e.g. when a given
token attribute has a particular value) in a way that did not place that responsibility on
individual filters.

Two use cases:

# StandardAnalyzer could handle ideographic characters directly, in the same way as CJKTokenizer,
which generates bigrams, if it could call ShingleFilter only when TypeAttribute=<CJK>, or when
Robert's new ScriptAttribute=<Ideographic>.
# Stemming might make sense for some stemmer/domain combinations only when token length exceeds
some threshold.  For example, a user could configure an analyzer to stem only when CharTermAttribute
length is greater than 4 characters.
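
To make the first use case concrete, here is a minimal sketch in plain Java that does not use any Lucene classes. The Token record, the bigramCjkRuns method and the "<ALPHANUM>" label in the usage note are hypothetical names invented for illustration; the pair-building loop only stands in for the ShingleFilter call, and emitting a lone ideograph unchanged is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class ConditionalBigramSketch {

    /** Hypothetical stand-in for a token: term text plus the value of its type attribute. */
    public record Token(String term, String type) {}

    // Assumed type label, taken from the TypeAttribute=<CJK> example in the text.
    private static final String CJK = "<CJK>";

    /** Bigrams runs of consecutive CJK-typed tokens; passes every other token through untouched. */
    public static List<Token> bigramCjkRuns(List<Token> input) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < input.size()) {
            Token token = input.get(i);
            if (!CJK.equals(token.type())) {
                out.add(token); // condition not met: the branch is skipped
                i++;
                continue;
            }
            // Find the end (exclusive) of the run of consecutive CJK-typed tokens.
            int end = i;
            while (end < input.size() && CJK.equals(input.get(end).type())) {
                end++;
            }
            if (end - i == 1) {
                out.add(token); // assumption: a lone ideograph is kept as a unigram
            } else {
                // Emit overlapping bigrams over the run: AB, BC, CD, ...
                for (int j = i; j + 1 < end; j++) {
                    out.add(new Token(input.get(j).term() + input.get(j + 1).term(), CJK));
                }
            }
            i = end;
        }
        return out;
    }
}
```

With the input the/<ALPHANUM>, A/<CJK>, B/<CJK>, C/<CJK>, ok/<ALPHANUM>, D/<CJK>, this yields the terms the, AB, BC, ok, D.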

One potential way to achieve this conditional branching facility is with a new kind of filter
that can be configured with one or more following filters and the condition(s) under which
those filters should be engaged.  This could be called BranchingFilter.
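
One possible shape for such a filter, sketched in plain Java rather than against Lucene's TokenFilter API: the condition is a predicate on the term, and the branch is an ordered list of per-term functions. The class name BranchingFilterSketch, the apply method and the toyStem suffix stripper are all hypothetical; toyStem is not a real stemmer and is only there to show the length-threshold use case.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class BranchingFilterSketch {

    private final Predicate<String> condition;        // when to engage the branch
    private final List<UnaryOperator<String>> branch; // filters applied, in order, when engaged

    public BranchingFilterSketch(Predicate<String> condition,
                                 List<UnaryOperator<String>> branch) {
        this.condition = condition;
        this.branch = branch;
    }

    /** Runs each term through the branch only if the condition holds; otherwise passes it through. */
    public List<String> apply(List<String> terms) {
        return terms.stream().map(this::applyToTerm).collect(Collectors.toList());
    }

    private String applyToTerm(String term) {
        if (!condition.test(term)) {
            return term; // condition not met: bypass the branch
        }
        String result = term;
        for (UnaryOperator<String> filter : branch) {
            result = filter.apply(result);
        }
        return result;
    }

    /** Toy stand-in for a stemmer (hypothetical): strips a single trailing "s". */
    public static String toyStem(String term) {
        return term.endsWith("s") ? term.substring(0, term.length() - 1) : term;
    }
}
```

For example, new BranchingFilterSketch(t -> t.length() > 4, List.of(BranchingFilterSketch::toyStem)) applied to tokens, this, cats, filters leaves the two four-character terms alone and returns token, this, cats, filter. Keeping the condition outside the wrapped filters is the point: neither the predicate nor toyStem knows about the other.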

I think a MergingFilter, the inverse of BranchingFilter, is necessary in the current pipeline
architecture, to have a single pipeline endpoint.  A MergingFilter might be useful in its
own right, e.g. to collect document data from multiple sources.  Perhaps a conditional merging
facility would be useful as well.
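
The merging side could look roughly like the following plain-Java sketch, which presents several token sources as one stream by draining them in order. The name MergingSketch and the choice of simple concatenation are assumptions made for illustration; the description above does not say how tokens from different sources should be ordered or interleaved.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class MergingSketch {

    /** Presents several token sources as a single stream, draining each source in turn. */
    public static Iterator<String> merge(List<Iterator<String>> sources) {
        Iterator<Iterator<String>> remaining = sources.iterator();
        return new Iterator<>() {
            private Iterator<String> current = Collections.emptyIterator();

            @Override
            public boolean hasNext() {
                // Skip exhausted (or empty) sources until one has a token left.
                while (!current.hasNext() && remaining.hasNext()) {
                    current = remaining.next();
                }
                return current.hasNext();
            }

            @Override
            public String next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return current.next();
            }
        };
    }
}
```

A conditional variant could take a predicate per source and skip tokens that fail it, along the same lines as the branching condition.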
