lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Erick Erickson" <erickerick...@gmail.com>
Subject Re: Problem with PorterStemFilter
Date Mon, 08 Dec 2008 14:04:07 GMT
your output says you couldn't find "ugli", but you indexed "ugly". I
assume that's just a typo, and the stemmer probably makes it moot
anyway....

I don't see anything obvious in the code, but here's what I'd suggest...

1> write this out to a FSDir rather than a RAMDir, get a copy of Luke
    (google "lucene luke") and examine what's actually in your index.
2> query.toString is your friend to find out how the query is actually
     passed to the searcher.
3> You are explicitly doing phrase queries by quoting the string, but
      that should be OK.

FWIW
Erick

On Sun, Dec 7, 2008 at 1:26 PM, Preetam Rao <blogathan.rao@gmail.com> wrote:

> Hi,
>
> I am indexing three words in a document.
> Then I run a phrase query on that document searching for two words at a
> time
> and three words at a time.
> I use PorterStemFilter for both searching and indexing. I am getting very
> inconsistent results. Am I doing something incorrectly ?
> The way I use PorterStemmer is by overriding tokenStream() method of
> StandardAnalyzer and adding PorterStemFiler to the chain.
> If I use StandardAnalyzer everything works fine. I am suspecting the way I
> am creating the analyzer.
> I printed position increments, offsets etc for both cases and did not see
> any difference.
>
> Below are the tests I am running and the full code.
>
> tests:
> Indexed content : "one two three"  search : "one two" no documents found
> Indexed content : "one two three"  search : "one two three" no documents
> found
>
> Indexed content : "first second third"  search : "first second" one
> documents found
> Indexed content : "first second third"  search :"first second third" one
> documents found
>
> Indexed content : "good bad ugly"  search : "good bad" one documents found
> Indexed content : "good bad ugly"  search :"good bad ugly" no documents
> found
>
>
> The below is the code:
>
> import org.apache.lucene.store.RAMDirectory;
> import org.apache.lucene.index.IndexWriter;
> import org.apache.lucene.analysis.TokenStream;
> import org.apache.lucene.analysis.PorterStemFilter;
> import org.apache.lucene.analysis.standard.StandardAnalyzer;
> import org.apache.lucene.document.Document;
> import org.apache.lucene.document.Field;
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.Hits;
> import org.apache.lucene.queryParser.QueryParser;
> import org.apache.lucene.queryParser.ParseException;
> import java.io.Reader;
> import java.io.IOException;
>
>
> public class TestPorterStemmer {
>    public static void main(String[] args) throws IOException,
> ParseException {
>        RAMDirectory index = new RAMDirectory();
>        IndexWriter writer = new IndexWriter(index, getAnalyzer(), true,
> IndexWriter.MaxFieldLength.UNLIMITED);
>        Document doc = new Document();
>        doc.add(new Field("content", "good bad ugly", Field.Store.YES,
> Field.Index.ANALYZED));
>        writer.addDocument(doc);
>        writer.optimize();
>        writer.close();
>        IndexSearcher searcher = new IndexSearcher(index);
>        QueryParser parser = new QueryParser("content", getAnalyzer());
>
>        Query query = parser.parse("\"" + "good bad" + "\"");
>        Hits hits = searcher.search(query);
>        System.out.println("searched for " + query.toString() + " matched :
> " + hits.length() + " documents ");
>
>        query = parser.parse("\"" + "good bad ugly" + "\"");
>        hits = searcher.search(query);
>        System.out.println("searched for " + query.toString() + " matched :
> " + hits.length() + " documents ");
>    }
>
>    public static StandardAnalyzer getAnalyzer() {
>        return new StandardAnalyzer() {
>            public TokenStream tokenStream(String fieldName, Reader reader)
> {
>                TokenStream result = super.tokenStream(fieldName, reader);
>                return new PorterStemFilter(result);
>            }
>        };
>    }
> }
>
> output:
> searched for content:"good bad" matched : 1 documents
> searched for content:"good bad ugli" matched : 0 documents
>
> Any help is greatly appreciated...
>
> Thanks
> Preetam
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message