lucene-dev mailing list archives

From "Michael McCandless (JIRA)" <>
Subject [jira] Commented: (LUCENE-1676) New Token filter for adding payloads "in-stream"
Date Thu, 11 Jun 2009 20:16:09 GMT


Michael McCandless commented on LUCENE-1676:

I agree we should decide.

I would lean towards always using contrib/CHANGES.  And then we should double-check all core
CHANGES entries in 2.9 and move them to contrib if needed.

> New Token filter for adding payloads "in-stream"
> ------------------------------------------------
>                 Key: LUCENE-1676
>                 URL:
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: contrib/analyzers
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>             Fix For: 2.9
>         Attachments: LUCENE-1676.patch
> This TokenFilter is able to split a token based on a delimiter and use one part as the
> token and the other part as a payload.  This allows someone to include payloads inline with
> tokens (presumably set up by a pipeline ahead of time).  An example is apropos.  Given a |
> delimiter, we could have a stream that looks like:
> {quote}The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN{quote}
> In this case, this would produce tokens and payloads (assuming whitespace tokenization):
> Token: the
> Payload: null
> Token: quick
> Payload: JJ
> Token: red
> Payload: JJ
> and so on.
> This patch will also support pluggable encoders for the payloads, so it can convert from
> the character array to byte arrays as appropriate.
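
The splitting behavior described in the issue can be sketched in plain Java without any Lucene dependency. This is only an illustration of the idea, not the code in LUCENE-1676.patch: the names InlinePayloadSketch, PayloadEncoder, TokenWithPayload, UTF8 and split are all hypothetical, and the sketch assumes whitespace tokenization with no lowercasing, so the first token comes out as "The" rather than "the".

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not the API from LUCENE-1676.patch.
class InlinePayloadSketch {

    /** Pluggable conversion from the payload's characters to bytes. */
    interface PayloadEncoder {
        byte[] encode(char[] buffer, int offset, int length);
    }

    /** Example encoder (an assumption): stores the payload characters as UTF-8 bytes. */
    static final PayloadEncoder UTF8 = (buffer, offset, length) ->
            new String(buffer, offset, length).getBytes(StandardCharsets.UTF_8);

    /** A token's text plus its payload; payload is null when no delimiter was present. */
    static final class TokenWithPayload {
        final String term;
        final byte[] payload;

        TokenWithPayload(String term, byte[] payload) {
            this.term = term;
            this.payload = payload;
        }
    }

    /** Splits text on whitespace, then splits each token at the first delimiter. */
    static List<TokenWithPayload> split(String text, char delimiter, PayloadEncoder encoder) {
        List<TokenWithPayload> result = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // happens only when the input is blank
            }
            int pos = raw.indexOf(delimiter);
            if (pos < 0) {
                // No delimiter: keep the token as-is with a null payload.
                result.add(new TokenWithPayload(raw, null));
            } else {
                // Everything after the delimiter is handed to the encoder.
                char[] chars = raw.toCharArray();
                int start = pos + 1;
                byte[] payload = encoder.encode(chars, start, chars.length - start);
                result.add(new TokenWithPayload(raw.substring(0, pos), payload));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "The quick|JJ red|JJ fox|NN jumped|VB over the lazy|JJ brown|JJ dogs|NN";
        for (TokenWithPayload t : split(input, '|', UTF8)) {
            String payload = t.payload == null
                    ? "null"
                    : new String(t.payload, StandardCharsets.UTF_8);
            System.out.println("Token: " + t.term + "  Payload: " + payload);
        }
    }
}
```

Keeping the encoder behind a small interface mirrors the "pluggable encoders" point: a different implementation could, for example, parse the payload text as a number before turning it into bytes, without touching the splitting logic.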

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
