lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)"
Subject [jira] Updated: (LUCENE-1676) New Token filter for adding payloads "in-stream"
Date Tue, 02 Jun 2009 22:36:07 GMT


Grant Ingersoll updated LUCENE-1676:

    Attachment: LUCENE-1676.patch

Here's a first draft of this.  See the test case for an example.

> New Token filter for adding payloads "in-stream"
> ------------------------------------------------
>                 Key: LUCENE-1676
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.9
>         Attachments: LUCENE-1676.patch
> This TokenFilter splits a token on a delimiter, using the part before the delimiter as
> the token and the part after it as a payload.  This allows someone to include payloads
> inline with tokens (presumably set up by a pipeline ahead of time).  An example is
> apropos.  Given a | delimiter, we could have a stream that looks like:
> {quote}The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN{quote}
> In this case, this would produce tokens and payloads (assuming whitespace tokenization):
> Token: the
> Payload: null
> Token: quick
> Payload: JJ
> Token: red
> Payload: JJ
> and so on.
> This patch will also support pluggable encoders for the payloads, so that a payload can
> be converted from a character array to a byte array as appropriate.
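
For readers without the patch at hand, here is a rough, self-contained sketch of the idea described above: whitespace tokenization, a split on the delimiter, and a pluggable encoder that turns the payload characters into bytes. It is plain Java and does not use the Lucene API. Every name in it (InlinePayloadSketch, PayloadEncoder, TokenWithPayload, UTF8, split, analyze) is hypothetical and not taken from LUCENE-1676.patch. Splitting at the first delimiter and encoding as UTF-8 are assumptions, and no lowercasing is applied.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not the API from LUCENE-1676.patch.
class InlinePayloadSketch {

    /** Pluggable conversion from the payload's characters to bytes. */
    interface PayloadEncoder {
        byte[] encode(char[] buffer, int offset, int length);
    }

    /** Assumed default encoder: store the payload text as UTF-8 bytes. */
    static final PayloadEncoder UTF8 = (buffer, offset, length) ->
            new String(buffer, offset, length).getBytes(StandardCharsets.UTF_8);

    /** A token's text plus its payload (null when the token had no delimiter). */
    static final class TokenWithPayload {
        final String term;
        final byte[] payload;

        TokenWithPayload(String term, byte[] payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /** Splits one token at the first delimiter (an assumption); the right part becomes the payload. */
    static TokenWithPayload split(String token, char delimiter, PayloadEncoder encoder) {
        int idx = token.indexOf(delimiter);
        if (idx < 0) {
            return new TokenWithPayload(token, null);
        }
        char[] chars = token.toCharArray();
        byte[] payload = encoder.encode(chars, idx + 1, chars.length - idx - 1);
        return new TokenWithPayload(token.substring(0, idx), payload);
    }

    /** Whitespace tokenization followed by the delimiter split on each token. */
    static List<TokenWithPayload> analyze(String text, char delimiter, PayloadEncoder encoder) {
        List<TokenWithPayload> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                out.add(split(raw, delimiter, encoder));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN";
        for (TokenWithPayload t : analyze(text, '|', UTF8)) {
            String payload = t.payload == null
                    ? "null"
                    : new String(t.payload, StandardCharsets.UTF_8);
            System.out.println("Token: " + t.term + "  Payload: " + payload);
        }
    }
}
```

Keeping the encoder behind a small interface means a different byte representation (for example, a numeric weight packed into four bytes) could be swapped in without touching the splitting logic.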
