lucene-java-user mailing list archives

From Paul Taylor <paul_t...@fastmail.fm>
Subject Re: Not getting matches for analyzers using CharMappingFilter with Lucene 4.1
Date Tue, 26 Feb 2013 10:03:34 GMT
On 25/02/2013 11:24, Thomas Matthijs wrote:
> On Mon, Feb 25, 2013 at 12:19 PM, Thomas Matthijs <lists@selckin.be> wrote:
>
>     On Mon, Feb 25, 2013 at 11:30 AM, Thomas Matthijs <lists@selckin.be> wrote:
>
>
>         On Mon, Feb 25, 2013 at 11:24 AM, Paul Taylor <paul_t100@fastmail.fm> wrote:
>
>             On 20/02/2013 11:28, Paul Taylor wrote:
>
>                 I'm just updating our codebase from Lucene 3.6 to Lucene 4.1,
>                 and it seems my tests that use NormalizeCharMap for
>                 replacing characters in the analyzers are not working.
>
>             Bump. Anybody? I thought a self-contained test case would be
>             enough to pique somebody's interest. Am I doing something
>             silly? Maybe, but I can't see it.
>
>
>
>         I tried to run your test, but it uses MusicbrainzTokenizer.
>
>
>
>     Well, I made it work. Whether it's a bug that this is required, or
>     whether it's documented anywhere, I don't know; it does seem very trappy:
>
>
>
> It is documented all the way at the bottom: 
> http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/analysis/package-summary.html
>
> So it should be:
>
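>     import java.io.Reader;
>
>     import org.apache.lucene.analysis.Analyzer;
>     import org.apache.lucene.analysis.TokenStream;
>     import org.apache.lucene.analysis.Tokenizer;
>     import org.apache.lucene.analysis.charfilter.MappingCharFilter;
>     import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
>     import org.apache.lucene.analysis.core.LowerCaseFilter;
>     import org.apache.lucene.analysis.core.WhitespaceTokenizer;
>     import org.apache.lucene.util.Version;
>
>     class SimpleAnalyzer extends Analyzer {
>
>         // Character mapping applied before tokenization: "&" -> "and".
>         protected NormalizeCharMap charConvertMap;
>
>         public SimpleAnalyzer() {
>             NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
>             builder.add("&", "and");
>             charConvertMap = builder.build();
>         }
>
>         @Override
>         protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
>             // 'reader' has already been passed through initReader() below.
>             Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_40, reader);
>             TokenStream filter = new LowerCaseFilter(Version.LUCENE_40, source);
>             return new TokenStreamComponents(source, filter);
>         }
>
>         // CharFilters belong here: Analyzer calls initReader() on every Reader it
>         // is given, including when it reuses cached components, whereas
>         // createComponents() runs only when no cached components are available.
>         @Override
>         protected Reader initReader(String fieldName, Reader reader) {
>             return new MappingCharFilter(charConvertMap, reader);
>         }
>     }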
>
Thanks Thomas. For some reason I didn't see your post until now, and I
independently worked it out.
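Why moving the MappingCharFilter into initReader() matters can be shown with a small plain-Java model of how Lucene 4.x's Analyzer.tokenStream() reuses components: createComponents() runs only when no cached components exist, while initReader() wraps every Reader handed in. Everything below (ReuseTrapDemo, MiniAnalyzer, MiniTokenizer, mapChars, initInput, and plain strings standing in for Readers) is a hypothetical stand-in, not Lucene API. The "trap" variant also assumes the original analyzer applied the char mapping inside createComponents(); the original test case isn't shown in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of Lucene 4.x Analyzer component reuse.
// None of these classes are Lucene's API; Strings stand in for Readers.
public class ReuseTrapDemo {

    // Stand-in for a Tokenizer: splits its current input on whitespace.
    static final class MiniTokenizer {
        private String input = "";

        void setInput(String input) {
            this.input = input;
        }

        List<String> tokens() {
            String trimmed = input.trim();
            if (trimmed.isEmpty()) {
                return new ArrayList<>();
            }
            return Arrays.asList(trimmed.split("\\s+"));
        }
    }

    // Stand-in for a MappingCharFilter with a one-entry map: "&" -> "and".
    static String mapChars(String s) {
        return s.replace("&", "and");
    }

    // Mimics the control flow of Analyzer.tokenStream(): initInput() runs on
    // every call, createComponents() only while nothing is cached; later calls
    // just hand the (initInput-wrapped) input to the cached tokenizer.
    abstract static class MiniAnalyzer {
        private MiniTokenizer cached;

        protected String initInput(String input) {
            return input; // default: no char filtering
        }

        protected abstract MiniTokenizer createComponents(String input);

        final List<String> analyze(String input) {
            String in = initInput(input);
            if (cached == null) {
                cached = createComponents(in);
            } else {
                cached.setInput(in); // reuse path: createComponents() is skipped
            }
            return cached.tokens();
        }
    }

    // The trap: mapping applied only inside createComponents(), so it is lost
    // as soon as the cached components are reused.
    static final class FilterInCreateComponents extends MiniAnalyzer {
        @Override
        protected MiniTokenizer createComponents(String input) {
            MiniTokenizer t = new MiniTokenizer();
            t.setInput(mapChars(input));
            return t;
        }
    }

    // The fix: mapping applied in initInput(), which runs on every call.
    static final class FilterInInitInput extends MiniAnalyzer {
        @Override
        protected String initInput(String input) {
            return mapChars(input);
        }

        @Override
        protected MiniTokenizer createComponents(String input) {
            MiniTokenizer t = new MiniTokenizer();
            t.setInput(input);
            return t;
        }
    }

    public static void main(String[] args) {
        MiniAnalyzer trap = new FilterInCreateComponents();
        MiniAnalyzer fixed = new FilterInInitInput();
        for (int i = 1; i <= 2; i++) {
            System.out.println("call " + i + ": trap=" + trap.analyze("rock & roll")
                    + " fixed=" + fixed.analyze("rock & roll"));
        }
    }
}
```

Running main() prints `call 1: trap=[rock, and, roll] fixed=[rock, and, roll]` followed by `call 2: trap=[rock, &, roll] fixed=[rock, and, roll]`. In this model the mapping silently disappears once the components are reused, which is one way tests that index or query more than one string could stop matching.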
