lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Problem searching Field.Keyword field
Date Wed, 09 Feb 2005 11:56:30 GMT
The only caveat to your VerbatimAnalyzer is that it will still split 
strings that are over 255 characters.  CharTokenizer does that.  
Granted, though, that keyword fields probably don't make much sense to 
be that long.

As mentioned yesterday - I added the LIA KeywordAnalyzer into the 
contrib area of Subversion.  I had built one like you had also, but the 
one I contributed reads the entire input stream into a StringBuffer 
ensuring it does not get split like CharTokenizer would.

	Erik


On Feb 9, 2005, at 4:40 AM, Miles Barr wrote:

> On Tue, 2005-02-08 at 12:19 -0500, Steven Rowe wrote:
>> Why is there no KeywordAnalyzer?  That is, an analyzer which doesn't
>> mess with its input in any way, but just returns it as-is?
>>
>> I realize that under most circumstances, it would probably be more 
>> code
>> to use it than just constructing a TermQuery, but having it would
>> regularize query handling, and simplify new users' experience.  And 
>> for
>> the purposes of the PerFieldAnalyzerWrapper, it could be helpful.
>
> It's fairly straightforward to write one. Here's the one I put together
> for PerFieldAnalyzerWrapper situations:
>
>
> package org.apache.lucene.analysis;
>
> import java.io.Reader;
>
> public class VerbatimAnalyzer extends Analyzer {
>
>     public VerbatimAnalyzer() {
>         super();
>     }
>
>     public TokenStream tokenStream(String fieldName, Reader reader) {
>         TokenStream result = new VerbatimTokenizer(reader);
>
>         return result;
>     }
>
>
>     /**
>      * This tokenizer assumes that the entire input is just one token.
>      */
>     public static class VerbatimTokenizer extends CharTokenizer {
>
>         public VerbatimTokenizer(Reader reader) {
>             super(reader);
>         }
>
>         protected boolean isTokenChar(char c) {
>             return true;
>         }
>     }
> }
>
>
> -- 
> Miles Barr <miles@runtime-collective.com>
> Runtime Collective Ltd.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
> For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message