lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <>
Subject [jira] Updated: (LUCENE-1058) New Analyzer for buffering tokens
Date Mon, 19 Nov 2007 13:54:43 GMT


Grant Ingersoll updated LUCENE-1058:

    Attachment: LUCENE-1058.patch

First draft of a patch, which provides two different approaches:

1. CachedAnalyzer and CachedTokenizer take in a list of Tokens and output them as appropriate.
Similar to CachingTokenFilter, but assumes you already have the Tokens.

2. In contrib/analyzers/buffered, add CollaboratingAnalyzer and related classes for creating
an Analyzer, etc. that work in the stream.
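
To make the first approach concrete, here is a minimal, self-contained sketch of a stream that
replays Tokens it was handed up front. It is not the code from the attached patch and does not
use the Lucene classes: SimpleToken, ListTokenStream and ReplayDemo are hypothetical names made
up for illustration, and returning null at the end of the stream is simply the convention
assumed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Demo entry point: replays a pre-built token list, then rewinds it.
final class ReplayDemo {
    public static void main(String[] args) {
        List<SimpleToken> tokens = Arrays.asList(
                new SimpleToken("Quick", 0, 5),
                new SimpleToken("Fox", 6, 9));
        ListTokenStream stream = new ListTokenStream(tokens);

        for (SimpleToken t = stream.next(); t != null; t = stream.next()) {
            System.out.println(t.text + " [" + t.startOffset + "," + t.endOffset + ")");
        }

        stream.reset();
        System.out.println("after reset: " + stream.next().text);
    }
}

// Hypothetical stand-in for a token; NOT Lucene's Token class.
final class SimpleToken {
    final String text;
    final int startOffset;
    final int endOffset;

    SimpleToken(String text, int startOffset, int endOffset) {
        this.text = text;
        this.startOffset = startOffset;
        this.endOffset = endOffset;
    }
}

// Hypothetical stream that outputs tokens it already has, with no analysis of its own.
final class ListTokenStream {
    private final List<SimpleToken> tokens;
    private Iterator<SimpleToken> iterator;

    ListTokenStream(List<SimpleToken> tokens) {
        this.tokens = new ArrayList<>(tokens); // defensive copy of the caller's list
        this.iterator = this.tokens.iterator();
    }

    // Returns the next token, or null once the list is exhausted (assumed convention).
    SimpleToken next() {
        return iterator.hasNext() ? iterator.next() : null;
    }

    // Rewinds the stream so the same tokens can be replayed.
    void reset() {
        iterator = tokens.iterator();
    }
}
```

The sketch hands out the same SimpleToken instances on every pass, so it says nothing about how
such a stream should behave when callers reuse Token objects.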

Still not sure if and how this plays with Token reuse (I think it doesn't).

> New Analyzer for buffering tokens
> ---------------------------------
>                 Key: LUCENE-1058
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Grant Ingersoll
>            Priority: Minor
>         Attachments: LUCENE-1058.patch
> In some cases, it would be handy to have Analyzer/Tokenizer/TokenFilters that could siphon
> off certain tokens and store them in a buffer to be used later in the processing pipeline.
> For example, if you want to have two fields, one lowercased and one not, but all the
> other analysis is the same, then you could save off the tokens to be output for a different
> field.
> Patch to follow, but I am still not sure about a couple of things, mostly how it plays
> with the new reuse API.
> See;#54397
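
The two-field example in the description can be sketched without any Lucene classes. The code
below is only an illustration of the idea, not the proposed API: SiphoningTokens and SiphonDemo
are hypothetical names, tokens are plain Strings, and the predicate that decides which tokens
get siphoned off is an assumption added for the sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Demo entry point: analyze once, feed two fields.
final class SiphonDemo {
    public static void main(String[] args) {
        Iterator<String> analyzed = Arrays.asList("Apache", "Lucene", "Java").iterator();
        // Siphon off every token; a narrower predicate would keep only "certain tokens".
        SiphoningTokens tee = new SiphoningTokens(analyzed, token -> true);

        // First field: tokens exactly as produced by the shared analysis.
        List<String> exactField = new ArrayList<>();
        while (tee.hasNext()) {
            exactField.add(tee.next());
        }

        // Second field: the buffered tokens, lowercased, without re-running the analysis.
        List<String> lowerField = SiphoningTokens.lowercase(tee.buffered());

        System.out.println(exactField);
        System.out.println(lowerField);
    }
}

// Hypothetical pass-through stream that copies selected tokens into a buffer.
final class SiphoningTokens implements Iterator<String> {
    private final Iterator<String> input;
    private final Predicate<String> keep;
    private final List<String> buffer = new ArrayList<>();

    SiphoningTokens(Iterator<String> input, Predicate<String> keep) {
        this.input = input;
        this.keep = keep;
    }

    @Override
    public boolean hasNext() {
        return input.hasNext();
    }

    // Passes the token through unchanged and stores a copy if the predicate accepts it.
    @Override
    public String next() {
        String token = input.next();
        if (keep.test(token)) {
            buffer.add(token);
        }
        return token;
    }

    // Read-only view of the tokens siphoned off so far.
    List<String> buffered() {
        return Collections.unmodifiableList(buffer);
    }

    // Produces the lowercased variant of the buffered tokens for a second field.
    static List<String> lowercase(List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}
```

The buffer is only complete once the first stream has been fully consumed, so in this sketch the
second field has to be built after the first.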

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
