lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <>
Subject [jira] Created: (LUCENE-1676) New Token filter for adding payloads "in-stream"
Date Tue, 02 Jun 2009 16:40:07 GMT
New Token filter for adding payloads "in-stream"

                 Key: LUCENE-1676
             Project: Lucene - Java
          Issue Type: New Feature
          Components: contrib/analyzers
            Reporter: Grant Ingersoll
            Assignee: Grant Ingersoll
            Priority: Minor
             Fix For: 2.9

This TokenFilter splits each token at a delimiter, keeping the part before the delimiter as
the token and storing the part after it as the token's payload.  This lets someone include
payloads inline with tokens (presumably set up by a pipeline ahead of time).  For example,
given a | delimiter, we could have a stream that looks like:
{quote}The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN{quote}

Assuming whitespace tokenization, this would produce the following tokens and payloads:
Token: The
Payload: null

Token: quick
Payload: JJ

Token: red
Payload: JJ

and so on.
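
The splitting described above can be sketched in plain Java. This is only an illustration under stated assumptions, not the patch itself: the class DelimitedPayloadSketch and its nested TokenWithPayload type are hypothetical names, the code does not use any Lucene API, it splits at the first occurrence of the delimiter, and it applies no lowercasing (so the first token comes out as written in the input).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone sketch; not the LUCENE-1676 patch and not a Lucene API. */
final class DelimitedPayloadSketch {

    /** A token's text plus its optional payload (null when no delimiter was present). */
    static final class TokenWithPayload {
        final String term;
        final String payload;

        TokenWithPayload(String term, String payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /** Splits on whitespace, then splits each token at the first occurrence of the delimiter. */
    static List<TokenWithPayload> split(String text, char delimiter) {
        List<TokenWithPayload> result = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // empty input yields a single empty string; skip it
            }
            int idx = raw.indexOf(delimiter);
            if (idx < 0) {
                // No delimiter: the whole string is the token, and there is no payload.
                result.add(new TokenWithPayload(raw, null));
            } else {
                // Text before the delimiter is the token; text after it is the payload.
                result.add(new TokenWithPayload(raw.substring(0, idx), raw.substring(idx + 1)));
            }
        }
        return result;
    }
}
```

Calling DelimitedPayloadSketch.split(stream, '|') on the example stream would give one entry per whitespace-separated token, with a null payload for tokens such as "over" that carry no delimiter.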

This patch will also support pluggable encoders for the payloads, so that the payload text can
be converted from a character array to a byte array as appropriate.
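
One way such a pluggable encoder could look is sketched below. Everything here is an assumption made for illustration: the interface PayloadEncoderSketch, its encode signature, and the two implementations are hypothetical names, not the API from the patch. The first implementation stores the payload characters as UTF-8 bytes; the second parses them as a float and stores the 4-byte big-endian representation.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical encoder contract: turn a slice of a char array into payload bytes. */
interface PayloadEncoderSketch {
    byte[] encode(char[] buffer, int offset, int length);
}

/** Stores the payload characters unchanged, encoded as UTF-8 (e.g. a part-of-speech tag). */
final class Utf8EncoderSketch implements PayloadEncoderSketch {
    @Override
    public byte[] encode(char[] buffer, int offset, int length) {
        return new String(buffer, offset, length).getBytes(StandardCharsets.UTF_8);
    }
}

/** Parses the payload characters as a float and stores its 4 bytes (big-endian). */
final class FloatEncoderSketch implements PayloadEncoderSketch {
    @Override
    public byte[] encode(char[] buffer, int offset, int length) {
        float value = Float.parseFloat(new String(buffer, offset, length));
        // ByteBuffer defaults to big-endian byte order.
        return ByteBuffer.allocate(4).putFloat(value).array();
    }
}
```

With this shape, a filter would hold a single PayloadEncoderSketch reference and hand it the characters after the delimiter, so switching from tags like "JJ" to numeric weights like "1.5" would only require passing a different encoder.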

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
