lucene-java-user mailing list archives

From Ian Lea <ian....@gmail.com>
Subject Re: analyzer not doing the same thing at index and query time?
Date Tue, 12 Jul 2011 11:01:32 GMT
If I've read your example correctly, it appears that your analyzer converts
"bloom's" to "bloom" at index time but not at search time, which implies
that you aren't using the same analyzer in both cases.
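
One way to check this is to compare a raw query term and its analyzed form
against the terms in the index. The following is only a rough sketch in plain
Java, not Lucene code: the analyze method is a hypothetical stand-in for your
analyzer (it splits on non-alphanumeric characters, lowercases, and drops
one-character tokens, which is an assumption that happens to reproduce the
'bloom' and 'bird' terms you saw in Luke), and the class name is made up for
illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class AnalyzerCheck {

    // Hypothetical stand-in for the poster's analyzer (an assumption, not
    // the real one): split on non-alphanumeric characters, lowercase, and
    // drop one-character tokens such as the "s" left over from "bloom's".
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String piece : text.split("[^A-Za-z0-9]+")) {
            if (piece.length() > 1) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Terms that end up in the index for the example document.
        Set<String> indexedTerms = new HashSet<String>(analyze("bloom's bird"));

        String rawQueryTerm = "bloom's";

        // A term that skips analysis is compared as-is and cannot match.
        System.out.println("indexed terms: " + indexedTerms);
        System.out.println("raw term found: "
                + indexedTerms.contains(rawQueryTerm));

        // The same term passed through the same analysis does match.
        System.out.println("analyzed term found: "
                + indexedTerms.containsAll(analyze(rawQueryTerm)));
    }
}
```

If the raw term misses while the analyzed one hits, the query text is reaching
the index without going through the same analysis as the indexed text.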


--
Ian.


On Mon, Jul 11, 2011 at 4:19 PM, jm <jmuguruza@gmail.com> wrote:
> Hi,
>
> My env is jdk1.6 and lucene3.3.
>
> At index time I have this:
>
>        Directory directory = FSDirectory.open(
>                new File("d:\\temp\\lucene.index"));
>        IndexWriter writer = new IndexWriter(directory, myAnalizer,
>                IndexWriter.MaxFieldLength.UNLIMITED);
>        Document doc = new Document();
>        doc.add((Fieldable) new Field("bbbb", "bloom's bird", Field.Store.NO,
>                Field.Index.ANALYZED));
>        writer.addDocument(doc);
>        // add another doc
>        writer.close(); // 3
>
> I know that only 'bloom' and 'bird' are indexed (I verified with Luke). My
> analyzer removes all non-alphanumeric chars.
>
> At query time I do this:
>
>        QueryParser qp = new QueryParser(LUCENEVERSIONCOMPAT, FBODY,
>                myAnalizer);
>        printHitCountQP(directory, qp, "bbbb:(*bloom's*)");
>        printHitCountQP(directory, qp, "bbbb:(*bloom)");
>        printHitCountQP(directory, qp, "bbbb:(*bloom AND b*)");
>
>    protected static void printHitCountQP(Directory directory, QueryParser qp,
>            String searchString) throws IOException, ParseException {
>        IndexSearcher searcher = new IndexSearcher(directory, true); //5
>        Query query = qp.parse(searchString);
>        int hitCount = searcher.search(query, 1).totalHits;
>        searcher.close();
>        System.out.println(searchString + " got " + hitCount + " Query is: "
>                + query.toString());
>    }
>
> And I get this:
>
> bbbb:(*bloom's*) got 0 Query is: bbbb:*bloom's*
> bbbb:(*bloom) got 2 Query is: bbbb:*bloom
> bbbb:(*bloom AND b*) got 1 Query is: +bbbb:*bloom +bbbb:b*
>
> Queries 2 and 3 are OK, but I don't understand the first case. Shouldn't it
> have removed the ' since I am using the same analyzer as I did at index
> time?
>
> Thanks
>


