lucene-java-user mailing list archives

From Thomas Matthijs <li...@selckin.be>
Subject Re: Not getting matches for analyzers using CharMappingFilter with Lucene 4.1
Date Mon, 25 Feb 2013 11:24:59 GMT
On Mon, Feb 25, 2013 at 12:19 PM, Thomas Matthijs <lists@selckin.be> wrote:

> On Mon, Feb 25, 2013 at 11:30 AM, Thomas Matthijs <lists@selckin.be> wrote:
>
>>
>> On Mon, Feb 25, 2013 at 11:24 AM, Paul Taylor <paul_t100@fastmail.fm> wrote:
>>
>>> On 20/02/2013 11:28, Paul Taylor wrote:
>>>
>>>> Just updating my codebase from Lucene 3.6 to Lucene 4.1, and it seems my tests
>>>> that use NormalizeCharMap for replacing characters in the analyzers are not
>>>> working.
>>>>
>>> Bump, anybody? I thought a self-contained test case would be enough to
>>> pique somebody's interest. Am I doing something silly? Maybe, but I can't
>>> see it.
>>
>>
>>
>> Tried to run your test, but it uses MusicbrainzTokenizer
>>
>
>
> Well, I made it work. Whether it's a bug that this is required, or whether
> it's documented anywhere, I don't know, but it does seem very trappy:
>


It is documented all the way at the bottom:
http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/analysis/package-summary.html

So it should be:

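    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.charfilter.MappingCharFilter;
    import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.util.Version;

    class SimpleAnalyzer extends Analyzer {

        protected NormalizeCharMap charConvertMap;

        public SimpleAnalyzer() {
            // Build the character mapping once; it is reused for every field.
            NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
            builder.add("&", "and");
            charConvertMap = builder.build();
        }

        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            // Only the tokenizer and token filters go here, not the char filter.
            Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_40, reader);
            TokenStream filter = new LowerCaseFilter(Version.LUCENE_40, source);
            return new TokenStreamComponents(source, filter);
        }

        @Override
        protected Reader initReader(String fieldName, Reader reader) {
            // Wrap the reader here so the mapping is applied before tokenization.
            return new MappingCharFilter(charConvertMap, reader);
        }
    }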
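
To make the ordering concrete, here is a rough, dependency-free sketch of what the analyzer above does to the text: the character mapping runs on the raw input first (the initReader step), and only then is the result split on whitespace and lowercased. This is not Lucene code. The class CharMappingOrderSketch and its methods mapChars, tokenize and analyze are hypothetical names made up for illustration, and the plain String.replace loop is only a stand-in for MappingCharFilter, which matches differently when several mappings overlap.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical, Lucene-free model of the analysis chain in SimpleAnalyzer.
class CharMappingOrderSketch {

    // Stand-in for the char filter: plain string replacement on the raw text.
    static String mapChars(String text, Map<String, String> charMap) {
        String out = text;
        for (Map.Entry<String, String> e : charMap.entrySet()) {
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }

    // Stand-in for a whitespace tokenizer followed by a lowercase filter.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Character mapping first, tokenization second.
    static List<String> analyze(String text, Map<String, String> charMap) {
        return tokenize(mapChars(text, charMap));
    }

    public static void main(String[] args) {
        Map<String, String> charMap = new LinkedHashMap<>();
        charMap.put("&", "and");
        System.out.println(analyze("Rock & Roll", charMap)); // [rock, and, roll]
        System.out.println(analyze("R&B", charMap));         // [randb]
    }
}
```

Because the mapping happens before tokenization, "R&B" comes out as the single token "randb" in this sketch; the tokenizer never sees the "&".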
