lucene-java-user mailing list archives

From "Robert Haycock" <Robert.Hayc...@mediasurface.com>
Subject RE: Adding stem AND original term
Date Wed, 28 Jun 2006 14:31:58 GMT
Stupid me.  It was working fine.  I hadn't called super(in) so the call
to stream.close() in DocumentWriter was obviously failing!!!

Rob.
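
A minimal sketch of the failure mode described above, not the actual Lucene source. Source, BaseFilter, BrokenFilter and FixedFilter are hypothetical stand-ins. It assumes the base filter has a no-argument constructor (which the original filter compiling without super(in) suggests) and that its close() simply closes the stream it was constructed with, which is consistent with the NullPointerException inside TokenFilter.close() in the trace further down.

```java
final class SuperCallSketch {

    /** Hypothetical stand-in for a closable token stream. */
    static class Source {
        boolean closed;
        void close() { closed = true; }
    }

    /** Stand-in for a base filter whose close() closes the wrapped stream. */
    static class BaseFilter extends Source {
        protected final Source input;

        // Assumed no-argument constructor: leaves the wrapped stream unset.
        protected BaseFilter() { this.input = null; }

        protected BaseFilter(Source input) { this.input = input; }

        @Override
        void close() { input.close(); } // throws NullPointerException if input is null
    }

    /** Keeps its own reference but never hands the stream to the base class. */
    static class BrokenFilter extends BaseFilter {
        private final Source in;
        BrokenFilter(Source in) { this.in = in; } // implicit super() call
    }

    /** Passes the stream up, so the inherited close() has something to close. */
    static class FixedFilter extends BaseFilter {
        FixedFilter(Source in) { super(in); }
    }

    /** Returns true if closing the given stream throws a NullPointerException. */
    static boolean closeThrowsNpe(Source s) {
        try {
            s.close();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(closeThrowsNpe(new BrokenFilter(new Source())));
        System.out.println(closeThrowsNpe(new FixedFilter(new Source())));
    }
}
```

The filter's own next() can work perfectly well through its private reference, which is why the problem only shows up when the indexer closes the stream at the end of the field.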

-----Original Message-----
From: Robert Haycock [mailto:Robert.Haycock@mediasurface.com] 
Sent: 28 June 2006 15:04
To: java-user@lucene.apache.org
Subject: RE: Adding stem AND original term

Hi Erik,

Isn't buffering what I'm doing?  The first time next() is called it
reads the next token from the stream into 'unStemmedToken'.  The next
call uses the same token, and nulls it after use.  Third call will get
the next one from the stream and so on.  I effectively have a 'one token
buffer' which gets filled then emptied each call to next().

Rob.

-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com] 
Sent: 28 June 2006 14:44
To: java-user@lucene.apache.org
Subject: Re: Adding stem AND original term

Returning null is reserved for the end of the token stream.  You'll need
to implement some kind of buffering mechanism - check out the custom
analyzers (like the SynonymAnalyzer) in the Lucene in Action code -
http://www.lucenebook.com - for examples.

	Erik
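
The buffering idea can be sketched roughly as follows. This is a self-contained illustration and does not use the Lucene classes: Tok, TokenSource, toyStem and StemAndOriginalFilter are hypothetical stand-ins, and toyStem is a placeholder rather than the Snowball EnglishStemmer. The filter returns the original token first, holds the stem in a one-token buffer, and gives the stem a position increment of 0 so that both terms land on the same position. next() returns null only when the input is exhausted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class StemAndOriginalSketch {

    /** Hypothetical stand-in for a token: term text plus position increment. */
    static final class Tok {
        final String text;
        final int positionIncrement;
        Tok(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Hypothetical stand-in for a token stream; next() returns null only at the end. */
    interface TokenSource {
        Tok next();
    }

    /** Placeholder stemmer for illustration only: strips a trailing "ing" or "s". */
    static String toyStem(String term) {
        if (term.endsWith("ing") && term.length() > 5) {
            return term.substring(0, term.length() - 3);
        }
        if (term.endsWith("s") && term.length() > 3) {
            return term.substring(0, term.length() - 1);
        }
        return term;
    }

    /** Emits each original token, then its stem at the same position. */
    static final class StemAndOriginalFilter implements TokenSource {
        private final TokenSource input;
        private Tok pendingStem; // one-token buffer

        StemAndOriginalFilter(TokenSource input) {
            this.input = input;
        }

        @Override
        public Tok next() {
            if (pendingStem != null) {       // drain the buffer first
                Tok stem = pendingStem;
                pendingStem = null;
                return stem;
            }
            Tok original = input.next();
            if (original == null) {
                return null;                 // end of the stream
            }
            String stem = toyStem(original.text);
            if (!stem.equals(original.text)) {
                pendingStem = new Tok(stem, 0); // increment 0: same position as the original
            }
            return original;                 // keeps its own increment
        }
    }

    /** Runs the filter over the given terms and reports each term with its position. */
    static List<String> analyze(String... terms) {
        Iterator<String> it = Arrays.asList(terms).iterator();
        TokenSource source = () -> it.hasNext() ? new Tok(it.next(), 1) : null;
        TokenSource filter = new StemAndOriginalFilter(source);

        List<String> out = new ArrayList<>();
        int position = -1;
        for (Tok t = filter.next(); t != null; t = filter.next()) {
            position += t.positionIncrement;
            out.add(t.text + "@" + position);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [dogs@0, dog@0, jumping@1, jump@1, fast@2]
        System.out.println(analyze("dogs", "jumping", "fast"));
    }
}
```

This sketch leaves the original token's increment alone and sets 0 on the stem, which is the reverse of the next() method quoted below. With the quoted ordering, each original term would share a position with the previous token's stem instead of with its own.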


On Jun 28, 2006, at 8:52 AM, Robert Haycock wrote:

> Hi,
>
> I started using the EnglishStemmer and noticed that only the stem gets
> added to the index.  I would like to be able to add both to give me a
> stem search and an exact search capability.
>
> My first attempt has been to write my own stemming filter.  The idea
> being that the first pass would get the next token and return it then
> the second pass would retrieve the same token and stem it.  I wasn't
> sure if setPositionIncrement() was the right tool for the job as I don't
> really know what it does!!  I was assuming setting it to 0 would cause
> the token to come around again.  Problem I get with this is this
> exception:
>
> java.lang.NullPointerException
> 	at org.apache.lucene.analysis.TokenFilter.close(TokenFilter.java:41)
> 	at org.apache.lucene.analysis.TokenFilter.close(TokenFilter.java:41)
> 	at org.apache.lucene.index.DocumentWriter.invertDocument(DocumentWriter.java:185)
> 	at org.apache.lucene.index.DocumentWriter.addDocument(DocumentWriter.java:93)
> 	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:450)
> 	at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:436)
> 	at com.mediasurface.search.RobsTest$WriteThread.run(RobsTest.java:126)
>
>
> Could someone let me know if I'm going off at a complete tangent?
>
> Here's my edited next method.
>
> public Token next()
> {
>   try
>   {
>     if (unStemmedToken == null)
>     {
>       if ((unStemmedToken = in.next()) == null)
>       {
>         return null;
>       }
>       unStemmedToken.setPositionIncrement(0);
>       return unStemmedToken;
>     }
>     else
>     {
>       // Stem the text
>       ...
>
>       Token stemmedToken = new Token(stem, startOffset, endOffset);
>       stemmedToken.setPositionIncrement(1);
>
>       unStemmedToken = null;
>       return stemmedToken;
>     }
>   }
>   catch (Exception e)
>   {
>     e.printStackTrace();
>     return null;
>   }
> }
>
>
> Thanks,
>
> Rob.
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org

