lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <>
Subject [jira] Updated: (LUCENE-1058) New Analyzer for buffering tokens
Date Tue, 27 Nov 2007 01:57:43 GMT


Grant Ingersoll updated LUCENE-1058:

    Attachment: LUCENE-1058.patch

A new version of this with the following changes/additions:

DocumentsWriter no longer requires that a Field have a value (e.g. stringValue, etc.). Added
a new Field constructor that allows a Field to be constructed without a value. This
allows for Analyzer implementations that produce their own tokens (whatever that means).
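As a rough illustration of what a value-less Field could mean at indexing time, here is a minimal plain-Java sketch. It does not use Lucene: `SketchField`, `TokenProducer`, and `invert` are hypothetical names, and whitespace splitting stands in for real analysis. It only shows the branching: a field with a value is tokenized from that value, while a field without one takes its tokens from a producer instead of being rejected.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical sketch (not Lucene's Field or DocumentsWriter): a field may be
 * built without a value, in which case a token producer supplies its tokens.
 */
public class ValuelessFieldSketch {

    /** Hypothetical source of tokens for fields that carry no value. */
    public interface TokenProducer {
        List<String> tokensFor(String fieldName);
    }

    /** Hypothetical field; a null value means "no value, tokens come from elsewhere". */
    public static final class SketchField {
        final String name;
        final String value;

        public SketchField(String name, String value) {
            this.name = name;
            this.value = value;
        }

        /** Analogue of a constructor that builds a field without a value. */
        public SketchField(String name) {
            this(name, null);
        }
    }

    /** Stand-in for the indexing step: tokenize the value if present, else ask the producer. */
    public static List<String> invert(SketchField field, TokenProducer producer) {
        if (field.value != null) {
            List<String> tokens = new ArrayList<>();
            for (String t : field.value.trim().split("\\s+")) {
                if (!t.isEmpty()) {
                    tokens.add(t);
                }
            }
            return tokens;
        }
        // No value: instead of rejecting the field, take the producer's tokens.
        List<String> produced = producer.tokensFor(field.name);
        return produced == null ? Collections.<String>emptyList() : produced;
    }
}
```

Indexing `new SketchField("body")` then depends entirely on the producer, which is roughly the role an Analyzer that produces its own tokens would play.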

Moved CollaboratingAnalyzer, et al. to the core under analysis.buffered, as I thought these
items belong in core given the changes to Field and DocumentsWriter.

Note: I think this is a subtle but important change in DocumentsWriter/Field behavior.

> New Analyzer for buffering tokens
> ---------------------------------
>                 Key: LUCENE-1058
>                 URL:
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: LUCENE-1058.patch, LUCENE-1058.patch, LUCENE-1058.patch
> In some cases, it would be handy to have Analyzer/Tokenizer/TokenFilters that could siphon
> off certain tokens and store them in a buffer to be used later in the processing pipeline.
> For example, if you want to have two fields, one lowercased and one not, but all the
> other analysis is the same, then you could save off the tokens to be output for a different field.
> Patch to follow, but I am still not sure about a couple of things, mostly how it plays
> with the new reuse API.
> See;#54397
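A minimal, self-contained Java sketch of the buffering idea in the lowercased/original example. It is not the patch's CollaboratingAnalyzer and uses no Lucene classes; `BufferedTokensSketch`, `TokenBuffer`, and `buildFields` are hypothetical, and whitespace splitting stands in for the shared analysis chain. The shared step runs once, each token is siphoned into a buffer while the lowercased field is built, and the original-case field is replayed from the buffer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch (not Lucene's API): run the shared analysis once,
 * buffer the tokens, and derive two fields from them.
 */
public class BufferedTokensSketch {

    /** Holds the token lists for the two hypothetical fields. */
    public static final class TwoFields {
        public final List<String> lowercased;
        public final List<String> original;

        TwoFields(List<String> lowercased, List<String> original) {
            this.lowercased = lowercased;
            this.original = original;
        }
    }

    /** Buffer that stores siphoned-off tokens for later replay. */
    static final class TokenBuffer {
        private final List<String> saved = new ArrayList<>();

        void add(String token) {
            saved.add(token);
        }

        List<String> replay() {
            return new ArrayList<>(saved);
        }
    }

    /** Shared analysis step; whitespace splitting stands in for the real chain. */
    static List<String> analyzeShared(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    public static TwoFields buildFields(String text) {
        TokenBuffer buffer = new TokenBuffer();
        List<String> lowercased = new ArrayList<>();
        // Single pass: siphon each token into the buffer, then lowercase it for the first field.
        for (String token : analyzeShared(text)) {
            buffer.add(token);
            lowercased.add(token.toLowerCase(Locale.ROOT));
        }
        // Second field: replay the buffer instead of re-running the shared analysis.
        return new TwoFields(lowercased, buffer.replay());
    }

    public static void main(String[] args) {
        TwoFields fields = buildFields("Apache Lucene Analysis");
        System.out.println("lowercased: " + fields.lowercased);
        System.out.println("original:   " + fields.original);
    }
}
```

Running `main` prints `lowercased: [apache, lucene, analysis]` and `original:   [Apache, Lucene, Analysis]`. The design choice is simply to trade memory (one buffered copy of the tokens) for not running the shared analysis twice; how such a buffer interacts with the reuse API is exactly the open question raised above.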

This message is automatically generated by JIRA.

