lucene-dev mailing list archives

From "Grant Ingersoll (JIRA)" <j...@apache.org>
Subject [jira] Commented: (LUCENE-1058) New Analyzer for buffering tokens
Date Wed, 21 Nov 2007 02:21:43 GMT

    [ https://issues.apache.org/jira/browse/LUCENE-1058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12544145 ]

Grant Ingersoll commented on LUCENE-1058:
-----------------------------------------

A few javadoc comments on the modifyToken method in BufferingTokenFilter should be sufficient,
right?  Something to the effect that if this TokenFilter is not the last in the chain,
it should make a full copy of the token.
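
To illustrate the copy concern, here is a minimal, self-contained sketch. It does not use
the Lucene API or the code in the attached patch: SimpleToken, SimpleStream, ListStream,
BufferingFilter and LowerCaseFilter are all hypothetical stand-ins, and the only assumption
is that a later filter in the chain may modify a token in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;

final class BufferingSketch {

    /** Hypothetical mutable token; a stand-in, not Lucene's Token. */
    static final class SimpleToken {
        String text;
        SimpleToken(String text) { this.text = text; }
        SimpleToken copy() { return new SimpleToken(text); }
    }

    /** Hypothetical stream contract: next() returns null when exhausted. */
    interface SimpleStream {
        SimpleToken next();
    }

    /** Emits one token per word. */
    static final class ListStream implements SimpleStream {
        private final Iterator<String> words;
        ListStream(List<String> words) { this.words = words.iterator(); }
        public SimpleToken next() {
            return words.hasNext() ? new SimpleToken(words.next()) : null;
        }
    }

    /** Saves every token it sees, optionally as a full copy. */
    static final class BufferingFilter implements SimpleStream {
        private final SimpleStream input;
        private final boolean copyBeforeBuffering;
        final List<SimpleToken> buffer = new ArrayList<>();

        BufferingFilter(SimpleStream input, boolean copyBeforeBuffering) {
            this.input = input;
            this.copyBeforeBuffering = copyBeforeBuffering;
        }

        public SimpleToken next() {
            SimpleToken token = input.next();
            if (token != null) {
                // Without the copy, the buffer shares the instance that
                // later filters are free to modify.
                buffer.add(copyBeforeBuffering ? token.copy() : token);
            }
            return token;
        }
    }

    /** A later filter that modifies tokens in place. */
    static final class LowerCaseFilter implements SimpleStream {
        private final SimpleStream input;
        LowerCaseFilter(SimpleStream input) { this.input = input; }
        public SimpleToken next() {
            SimpleToken token = input.next();
            if (token != null) {
                token.text = token.text.toLowerCase(Locale.ROOT);
            }
            return token;
        }
    }

    /** Runs "Hello World" through buffering + lowercasing; returns the buffered texts. */
    static List<String> bufferedTexts(boolean copyBeforeBuffering) {
        BufferingFilter buffering = new BufferingFilter(
                new ListStream(Arrays.asList("Hello", "World")), copyBeforeBuffering);
        SimpleStream chain = new LowerCaseFilter(buffering);
        while (chain.next() != null) {
            // Drain the chain so every token passes through both filters.
        }
        List<String> texts = new ArrayList<>();
        for (SimpleToken token : buffering.buffer) {
            texts.add(token.text);
        }
        return texts;
    }
}
```

In this sketch, bufferedTexts(true) keeps the original casing, while bufferedTexts(false)
returns the lowercased text because the buffered tokens were changed after being saved.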

As for the CachedTokenizer and CachedAnalyzer, those should be implied, since the user is
passing them in to begin with.

The other thing of interest is that calling Analyzer.tokenStream(String, Reader) is not needed.
In fact, this somewhat suggests having a new Fieldable property, akin to tokenStreamValue(),
etc., that says don't even ask the Fieldable for a value.

Let me take a crack at what that means and post a patch.  It will mean some changes to invertField()
in DocumentsWriter and possibly changing it to not require that one of tokenStreamValue(), readerValue()
or stringValue() be defined.  Not sure if that is a good idea or not.
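
As a rough illustration of the change being considered, the sketch below shows one way the
value lookup could be skipped. Everything in it is hypothetical: FieldSketch is not the real
Fieldable interface, the skipValueLookup flag is an invented name for the proposed property,
chooseSource() is not the actual invertField() code, and the order in which the three values
are checked is an assumption.

```java
import java.io.Reader;

final class InvertFieldSketch {

    /** Where the tokens for a field would come from. */
    enum Source { BUFFERED, TOKEN_STREAM, READER, STRING }

    /** Hypothetical stand-in for a field; not Lucene's Fieldable. */
    static final class FieldSketch {
        final Object tokenStreamValue;   // placeholder type for a token stream
        final Reader readerValue;
        final String stringValue;
        final boolean skipValueLookup;   // invented name for the proposed property

        FieldSketch(Object tokenStreamValue, Reader readerValue,
                    String stringValue, boolean skipValueLookup) {
            this.tokenStreamValue = tokenStreamValue;
            this.readerValue = readerValue;
            this.stringValue = stringValue;
            this.skipValueLookup = skipValueLookup;
        }
    }

    /** Picks a token source; only rejects a value-less field when the flag is off. */
    static Source chooseSource(FieldSketch field) {
        if (field.skipValueLookup) {
            // Proposed behavior: do not ask the field for any value at all.
            return Source.BUFFERED;
        }
        if (field.tokenStreamValue != null) {
            return Source.TOKEN_STREAM;
        }
        if (field.readerValue != null) {
            return Source.READER;
        }
        if (field.stringValue != null) {
            return Source.STRING;
        }
        throw new IllegalArgumentException(
                "field must have a TokenStream, Reader or String value");
    }
}
```

The open question from the comment remains visible here: with the flag set, a field with no
value at all is accepted, which may or may not be desirable.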



> New Analyzer for buffering tokens
> ---------------------------------
>
>                 Key: LUCENE-1058
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1058
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Analysis
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: LUCENE-1058.patch
>
>
> In some cases, it would be handy to have Analyzer/Tokenizer/TokenFilters that could siphon
> off certain tokens and store them in a buffer to be used later in the processing pipeline.
> For example, if you want to have two fields, one lowercased and one not, but all the
> other analysis is the same, then you could save off the tokens to be output for a different
> field.
> Patch to follow, but I am still not sure about a couple of things, mostly how it plays
> with the new reuse API.
> See http://www.gossamer-threads.com/lists/lucene/java-dev/54397?search_string=BufferingAnalyzer;#54397

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

