lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From jm <jmugur...@gmail.com>
Subject analyzer not working properly when indexing
Date Tue, 20 Apr 2010 15:58:34 GMT
I am encountering a strange issue. I have a CustomStopAnalyzer. If I
do this (supporting code taken from AnalyzerUtils in LIA3 source code
Mike uploaded):
        Analyzer customStopAnalyzer = new CustomStopAnalyzer();
        AnalyzerUtils.displayTokensWithFullDetails(customStopAnalyzer,
"mail77");

I get what I expect:		
1: [mail77:0->6:word]

But when I am actually indexing docs, the word containing numbers
loose the numbers.
	directory = new RAMDirectory();
        writer = new IndexWriter(directory, customStopAnalyzer,
IndexWriter.MaxFieldLength.UNLIMITED);
        doc = new Document();
        doc.add((Fieldable) new Field("contents", "mail77",
Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc);
        writer.close();
        hitCount = getHitCount(directory, "contents", "mail77");
        System.out.println("mail77 " + hitCount);

This writes
mail77 0
If I look for "mail", I get one hit...I am using Lucene 3.0.1. Where
should I start looking (I assume in CustomStopAnalyzer but the fact
that displayTokensWithFullDetails() shows the right output puzzles
me)??

thanks
javier

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message