lucene-java-user mailing list archives

From jm <jmugur...@gmail.com>
Subject analizer not doing the same thing at index and query time?
Date Mon, 11 Jul 2011 15:19:18 GMT
Hi,

My env is JDK 1.6 and Lucene 3.3.

At index time I have this:

        Directory directory = FSDirectory.open(new File("d:\\temp\\lucene.index"));
        IndexWriter writer = new IndexWriter(directory, myAnalizer, IndexWriter.MaxFieldLength.UNLIMITED);
        Document doc = new Document();
        doc.add((Fieldable) new Field("bbbb", "bloom's bird", Field.Store.NO, Field.Index.ANALYZED));
        writer.addDocument(doc);
        // add another doc
        writer.close();

And I know only 'bloom' and 'bird' are indexed (I verify with Luke). My
analyzer removes all non-alphanumeric chars.

At query time I do this:

        QueryParser qp = new QueryParser(LUCENEVERSIONCOMPAT, FBODY, myAnalizer);
        printHitCountQP(directory, qp, "bbbb:(*bloom's*)");
        printHitCountQP(directory, qp, "bbbb:(*bloom)");
        printHitCountQP(directory, qp, "bbbb:(*bloom AND b*)");

    protected static void printHitCountQP(Directory directory, QueryParser qp, String searchString)
            throws IOException, ParseException {
        IndexSearcher searcher = new IndexSearcher(directory, true);
        Query query = qp.parse(searchString);
        int hitCount = searcher.search(query, 1).totalHits;
        searcher.close();
        System.out.println(searchString + " got " + hitCount + " Query is: " + query.toString());
    }

And I get this:

bbbb:(*bloom's*) got 0 Query is: bbbb:*bloom's*
bbbb:(*bloom) got 2 Query is: bbbb:*bloom
bbbb:(*bloom AND b*) got 1 Query is: +bbbb:*bloom +bbbb:b*

Queries 2 and 3 are OK, but I don't understand the first case. Shouldn't the
' have been removed, since I am using the same analyzer as I did at index
time?

thanks
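
For what it's worth, the results above are consistent with the wildcard term being compared against the indexed terms verbatim, without passing through the analyzer first. Below is a minimal plain-Java sketch of that assumption. It does not call Lucene at all; the class name WildcardSketch and its methods are hypothetical, and the term list is simply what Luke reported for the index.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class WildcardSketch {

    // Translate a wildcard pattern (* and ?) into a regex,
    // quoting every other character so it is matched literally.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");
            } else if (c == '?') {
                sb.append('.');
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // Assumption: the pattern is matched against the indexed terms as-is,
    // i.e. no analyzer is applied to the pattern itself.
    static boolean matchesAny(String wildcard, List<String> indexedTerms) {
        Pattern p = toRegex(wildcard);
        for (String term : indexedTerms) {
            if (p.matcher(term).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Terms reported by Luke for field "bbbb".
        List<String> indexedTerms = Arrays.asList("bloom", "bird");
        System.out.println(matchesAny("*bloom's*", indexedTerms)); // false
        System.out.println(matchesAny("*bloom", indexedTerms));    // true
        System.out.println(matchesAny("b*", indexedTerms));        // true
    }
}
```

Under that assumption the first pattern can never match, because no indexed term contains the apostrophe, while the other two patterns match 'bloom' directly.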