lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1676) New Token filter for adding payloads "in-stream"
Date Thu, 11 Jun 2009 12:51:07 GMT


Michael McCandless commented on LUCENE-1676:

Shouldn't the CHANGES entry in this patch go into contrib/CHANGES?

> New Token filter for adding payloads "in-stream"
> ------------------------------------------------
>                 Key: LUCENE-1676
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.9
>         Attachments: LUCENE-1676.patch
> This TokenFilter splits a token on a delimiter and uses one part as the token and the
> other part as a payload.  This allows someone to include payloads inline with tokens
> (presumably set up by a pipeline ahead of time).  An example is apropos.  Given a |
> delimiter, we could have a stream that looks like:
> {quote}The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN{quote}
> In this case, this would produce tokens and payloads (assuming whitespace tokenization):
> Token: the
> Payload: null
> Token: quick
> Payload: JJ
> Token: red
> Payload: JJ
> and so on.
> This patch will also support pluggable encoders for the payloads, so it can convert from
> character arrays to byte arrays as appropriate.
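
The splitting behaviour described above can be sketched in plain Java, without any Lucene classes. This is only an illustration under stated assumptions, not the code in LUCENE-1676.patch: the names DelimitedPayloadSketch, PayloadEncoder, TokenWithPayload and split are hypothetical, tokenization is a simple whitespace split, and the encoder is a stand-in for the pluggable encoders the description mentions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free sketch of "token|payload" splitting.
class DelimitedPayloadSketch {

    /** Assumed encoder hook: converts the payload characters to bytes. */
    interface PayloadEncoder {
        byte[] encode(String payloadText);
    }

    /** A token and its payload; payload is null when no delimiter was present. */
    static final class TokenWithPayload {
        final String token;
        final byte[] payload;

        TokenWithPayload(String token, byte[] payload) {
            this.token = token;
            this.payload = payload;
        }
    }

    /**
     * Splits text on whitespace, then splits each piece at the first
     * occurrence of the delimiter: the left part becomes the token and
     * the right part is passed to the encoder to produce the payload.
     */
    static List<TokenWithPayload> split(String text, char delimiter, PayloadEncoder encoder) {
        List<TokenWithPayload> result = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // empty input yields no tokens
            }
            int idx = raw.indexOf(delimiter);
            if (idx < 0) {
                // No delimiter: keep the token as-is with a null payload.
                result.add(new TokenWithPayload(raw, null));
            } else {
                String token = raw.substring(0, idx);
                String payloadText = raw.substring(idx + 1);
                result.add(new TokenWithPayload(token, encoder.encode(payloadText)));
            }
        }
        return result;
    }
}
```

Calling split("The quick|JJ red|JJ", '|', s -> s.getBytes(java.nio.charset.StandardCharsets.UTF_8)) would yield three tokens, the first with a null payload and the other two with the UTF-8 bytes of "JJ". The sketch does no lowercasing, so the first token stays "The"; swapping the lambda for a different encoder changes only how the payload bytes are produced.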

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.