lucene-java-user mailing list archives

From "Robert Haycock" <Robert.Hayc...@mediasurface.com>
Subject Adding stem AND original term
Date Wed, 28 Jun 2006 12:52:09 GMT
Hi,

I started using the EnglishStemmer and noticed that only the stem gets
added to the index.  I would like to index both the stem and the
original term, so that I can offer both a stemmed search and an exact
search.

My first attempt has been to write my own stemming filter.  The idea is
that the first call to next() fetches the next token and returns it
unchanged, and the second call stems that same token and returns the
stem.  I wasn't sure whether setPositionIncrement() was the right tool
for the job, as I don't really know what it does!  I assumed that
setting it to 0 would cause the token to come around again.  The problem
is that I get this exception:

java.lang.NullPointerException
	at org.apache.lucene.analysis.TokenFilter.close(TokenFilter.java:41)
	at org.apache.lucene.analysis.TokenFilter.close(TokenFilter.java:41)
	at org.apache.lucene.index.DocumentWriter.invertDocument(DocumentWriter.java:185)
	at org.apache.lucene.index.DocumentWriter.addDocument(DocumentWriter.java:93)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:450)
	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:436)
	at com.mediasurface.search.RobsTest$WriteThread.run(RobsTest.java:126)


Could someone let me know if I'm going off at a complete tangent?

Here's my edited next() method:

public Token next()
{
  try
  {
    if (unStemmedToken == null)
    {
      if ((unStemmedToken = in.next()) == null)
      {
        return null;
      }
      unStemmedToken.setPositionIncrement(0);
      return unStemmedToken;
    }
    else
    {
      // Stem the text
      ...

      Token stemmedToken = new Token(stem, startOffset, endOffset);
      stemmedToken.setPositionIncrement(1);

      unStemmedToken = null;
      return stemmedToken;
    }
  }
  catch (Exception e)
  {
    e.printStackTrace();
    return null;
  }
}
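
A minimal, Lucene-free sketch of the intended behaviour may help frame
the question: each original term is emitted first, followed by its stem
at the same position.  It assumes that a position increment of 0 means
"same position as the previous token" and that the default increment is
1, which is my reading of the Token javadoc.  The Tok and
StemAndOriginalFilter classes and the toy stem() rule are hypothetical
stand-ins for illustration; they are not Lucene's API or the
EnglishStemmer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StemAndOriginalSketch {

    /** Hypothetical stand-in for a token: text plus a position increment. */
    static final class Tok {
        final String text;
        final int positionIncrement;

        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Toy stemmer (an assumption, not a real algorithm): strips "ing" or "s". */
    static String stem(String word) {
        if (word.length() > 5 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 3 && word.endsWith("s")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    /** Emits each word with increment 1, then its stem with increment 0. */
    static final class StemAndOriginalFilter {
        private final Iterator<String> in;
        private String pending; // word whose stem has not been emitted yet

        StemAndOriginalFilter(Iterator<String> in) {
            this.in = in;
        }

        Tok next() {
            if (pending != null) {
                String stemmed = stem(pending);
                pending = null;
                return new Tok(stemmed, 0); // same position as the original
            }
            if (!in.hasNext()) {
                return null; // end of stream
            }
            String word = in.next();
            if (!stem(word).equals(word)) {
                pending = word; // only queue a stem that differs from the word
            }
            return new Tok(word, 1); // advance to the next position
        }
    }

    /** Lists "text@position", where position is the running sum of increments. */
    static List<String> positions(List<String> words) {
        StemAndOriginalFilter filter = new StemAndOriginalFilter(words.iterator());
        List<String> out = new ArrayList<String>();
        int position = -1; // so that the first token lands on position 0
        for (Tok t = filter.next(); t != null; t = filter.next()) {
            position += t.positionIncrement;
            out.add(t.text + "@" + position);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [running@0, runn@0, dogs@1, dog@1, bark@2]
        System.out.println(positions(Arrays.asList("running", "dogs", "bark")));
    }
}
```

In this sketch the original term carries the increment of 1 and the stem
carries 0, which is the reverse of the assignment in my method above.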


Thanks, 

Rob.


