lucene-java-user mailing list archives

From Thomas Matthijs <li...@selckin.be>
Subject Re: Not getting matches for analyzers using CharMappingFilter with Lucene 4.1
Date Mon, 25 Feb 2013 11:19:37 GMT
On Mon, Feb 25, 2013 at 11:30 AM, Thomas Matthijs <lists@selckin.be> wrote:

>
> On Mon, Feb 25, 2013 at 11:24 AM, Paul Taylor <paul_t100@fastmail.fm> wrote:
>
>> On 20/02/2013 11:28, Paul Taylor wrote:
>>
>>> Just updating the codebase from Lucene 3.6 to Lucene 4.1, and it seems my
>>> tests that use NormalizeCharMap for replacing characters in the analyzers
>>> are not working.
>>>
>> Bump. Anybody? I thought a self-contained test case would be enough to
>> pique somebody's interest. Am I doing something silly? Maybe, but I can't
>> see it.
>
>
>
> Tried to run your test, but it uses MusicbrainzTokenizer, which isn't part
> of Lucene.
>


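Well, I made it work: the MappingCharFilter has to be applied again in
TokenStreamComponents.setReader(), not just in createComponents(). Whether
it's a bug that this is required, or whether it's documented anywhere, I
don't know, but it does seem very trappy: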

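    import java.io.IOException;
    import java.io.Reader;

    import org.apache.lucene.analysis.Analyzer;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.Tokenizer;
    import org.apache.lucene.analysis.charfilter.MappingCharFilter;
    import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.util.Version;

    class SimpleAnalyzer extends Analyzer {

        protected NormalizeCharMap charConvertMap;

        public SimpleAnalyzer() {
            // Replace "&" with "and" before the text reaches the tokenizer
            NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
            builder.add("&", "and");
            charConvertMap = builder.build();
        }

        @Override
        protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
            // Only called when there are no cached components to reuse,
            // so wrapping the reader here covers just that first reader
            Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_40,
                    new MappingCharFilter(charConvertMap, reader));
            TokenStream filter = new LowerCaseFilter(Version.LUCENE_40, source);
            return new TokenStreamComponents(source, filter) {
                @Override
                protected void setReader(Reader reader) throws IOException {
                    // On reuse, the new raw reader arrives here, so the
                    // char filter must be applied again
                    super.setReader(new MappingCharFilter(charConvertMap, reader));
                }
            };
        }
    }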
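For comparison, here is a minimal sketch of another way to get the same effect, assuming Lucene 4.1 (lucene-core plus lucene-analyzers-common) on the classpath. In Lucene 4.x, Analyzer.tokenStream() passes every incoming Reader through the protected initReader(String fieldName, Reader reader) hook before handing it to either createComponents() (no cached components) or TokenStreamComponents.setReader() (reuse). Wrapping the MappingCharFilter there should therefore cover both paths without overriding setReader(). The class names InitReaderAnalyzer and InitReaderDemo, the field name "field", and the sample strings are hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.charfilter.MappingCharFilter;
import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Hypothetical example class; assumes Lucene 4.1 jars on the classpath.
class InitReaderAnalyzer extends Analyzer {

    private final NormalizeCharMap charConvertMap;

    InitReaderAnalyzer() {
        NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
        builder.add("&", "and");
        charConvertMap = builder.build();
    }

    @Override
    protected Reader initReader(String fieldName, Reader reader) {
        // tokenStream() calls this for every reader, whether the
        // components are freshly created or reused
        return new MappingCharFilter(charConvertMap, reader);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        // reader is already wrapped by initReader(), so no char filter here
        Tokenizer source = new WhitespaceTokenizer(Version.LUCENE_40, reader);
        TokenStream filter = new LowerCaseFilter(Version.LUCENE_40, source);
        return new TokenStreamComponents(source, filter);
    }
}

public class InitReaderDemo {

    // Collects the terms the analyzer produces for one piece of text
    static List<String> tokens(Analyzer analyzer, String text) {
        List<String> terms = new ArrayList<String>();
        try {
            TokenStream ts = analyzer.tokenStream("field", new StringReader(text));
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();
            ts.close();
        } catch (IOException e) {
            // StringReader does not fail in practice; rethrow unchecked for brevity
            throw new IllegalStateException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        Analyzer analyzer = new InitReaderAnalyzer();
        // The first call creates the components; the second one reuses them
        System.out.println(tokens(analyzer, "Rock & Roll"));
        System.out.println(tokens(analyzer, "Simon & Garfunkel"));
    }
}
```

Running main() should print [rock, and, roll] followed by [simon, and, garfunkel]. The second call goes through the reuse path, which is where an analyzer that only wraps the reader inside createComponents() loses the mapping.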
