Date: Mon, 11 Jul 2005 16:01:52 -0400 (EDT)
From: Andrew Boyd <andrew.boyd@mindspring.com>
Reply-To: Andrew Boyd <Andrew.Boyd@bbtech.net>
To: java-user@lucene.apache.org
Subject: Re: How to get the un-stemed word

What about storing the unstemmed word at the same position as the stemmed word? Would that show up in the TermVectors?

-----Original Message-----
From: mark harwood <markharw00d@yahoo.co.uk>
Sent: Jul 8, 2005 10:44 AM
To: java-user@lucene.apache.org, Andrew Boyd <Andrew.Boyd@bbtech.net>
Subject: Re: How to get the un-stemed word

You can get the unstemmed word by re-analysing the
(hopefully stored somewhere) text.
Look at the tokens emitted from the TokenStream and
when you get to the one that matches the stemmed form
you can use the token offset info to retrieve the
unstemmed form from the original text. 
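A minimal sketch of the re-analysis idea in plain Java, with no Lucene classes involved. The regex tokenizer and the suffix-stripping `toyStem` are hypothetical stand-ins for a real Analyzer and stemmer; the point is only that each token's start and end offsets let you slice the original form back out of the stored text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnstemByReanalysis {

    // Hypothetical stand-in for a real stemmer: lower-cases the word and
    // strips one of a few common suffixes.
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Re-tokenizes the text, stems each token, and for every token whose
    // stem matches, uses the token's offsets to copy the original form.
    static List<String> originalForms(String text, String stemmed) {
        List<String> forms = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Za-z]+").matcher(text);
        while (m.find()) {
            if (toyStem(m.group()).equals(stemmed)) {
                // m.start() and m.end() play the role of the token offsets.
                forms.add(text.substring(m.start(), m.end()));
            }
        }
        return forms;
    }

    public static void main(String[] args) {
        String text = "Indexing was slow, so we indexed overnight.";
        System.out.println(originalForms(text, "index")); // [Indexing, indexed]
    }
}
```

The cost of this approach is that the text has to be analysed again on every lookup, and it only works if the original text is still available.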

Another option which avoids re-analysis is to store
the TermVector with TermPositionVector info enabled.
All the offsets are then stored in the index, rather
than computed on-the-fly by an Analyzer.
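The trade-off can be modelled in plain Java as follows. This is not the Lucene API: the `Map` is a hypothetical stand-in for what a term vector with offsets holds, and `toyStem` is an assumed toy stemmer. Offsets are recorded once at index time, so a later lookup only slices the stored text and never runs the analysis again.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StoredOffsetsSketch {

    // Hypothetical stand-in for the analyzer's stemmer: strips a plural "s".
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        return (w.length() > 3 && w.endsWith("s")) ? w.substring(0, w.length() - 1) : w;
    }

    // Index time: analyse once and record, per stemmed term, the
    // [start, end) character offsets of every occurrence.
    static Map<String, List<int[]>> buildOffsets(String text) {
        Map<String, List<int[]>> offsets = new HashMap<>();
        Matcher m = Pattern.compile("[A-Za-z]+").matcher(text);
        while (m.find()) {
            offsets.computeIfAbsent(toyStem(m.group()), k -> new ArrayList<>())
                   .add(new int[] {m.start(), m.end()});
        }
        return offsets;
    }

    // Lookup time: no analysis, only substring calls on the stored offsets.
    static List<String> originalForms(String text,
                                      Map<String, List<int[]>> offsets,
                                      String stemmed) {
        List<String> forms = new ArrayList<>();
        for (int[] span : offsets.getOrDefault(stemmed, Collections.<int[]>emptyList())) {
            forms.add(text.substring(span[0], span[1]));
        }
        return forms;
    }

    public static void main(String[] args) {
        String text = "Term vectors store offsets; one vector per field.";
        Map<String, List<int[]>> offsets = buildOffsets(text);
        System.out.println(originalForms(text, offsets, "vector")); // [vectors, vector]
    }
}
```

The original text still has to be kept somewhere, since the offsets are only pointers into it.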

The highlighter in the sandbox can use both of these
approaches to get the original forms.

Cheers
Mark


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Andrew Boyd
Software Architect
Sun Certified J2EE Architect
B&B Technical Services Inc.
205.422.2557
