lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Andrew Boyd <andrew.b...@mindspring.com>
Subject Re: How to get the un-stemed word
Date Tue, 12 Jul 2005 16:07:30 GMT
Thats why (at least one of the reasons) I wish the token type was stored in the index.

-----Original Message-----
From: markharw00d <markharw00d@yahoo.co.uk>
Sent: Jul 11, 2005 4:08 PM
To: java-user@lucene.apache.org
Subject: Re: How to get the un-stemed word

>>Would that show up in the TermVectors?

Yes, but uou would need a scheme for identifying "original, unstemmed" terms vs stems. For
example, you could use another field and analyzer for the unstemmed forms.


Andrew Boyd wrote:

>What about storing the unstemed word with the same position as the stemmed word.  Would
that show up in the TermVectors?
>
>-----Original Message-----
>From: mark harwood <markharw00d@yahoo.co.uk>
>Sent: Jul 8, 2005 10:44 AM
>To: java-user@lucene.apache.org, Andrew Boyd <Andrew.Boyd@bbtech.net>
>Subject: Re: How to get the un-stemed word
>
>You can get the unstemmed word by re-analysing the
>(hopefully stored somewhere) text.
>Look at the tokens emitted from the TokenStream and
>when you get to the one that matches the stemmed form
>you can use the token offset info to retrieve the
>unstemmed form from the original text. 
>
>Another option which avoids re-analysis is to store
>the TermVector with TermPositionVector info enabled.
>All the offsets are then stored in the index, rather
>than computed on-the-fly by an Analyzer.
>
>The highlighter in the sandbox can use both of these
>approaches to get the original forms.
>
>Cheers
>Mark
>
>
>	
>	
>		
>___________________________________________________________ 
>Yahoo! Messenger - NEW crystal clear PC to PC calling worldwide with voicemail http://uk.messenger.yahoo.com
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>Andrew Boyd
>Software Architect
>Sun Certified J2EE Architect
>B&B Technical Services Inc.
>205.422.2557
>
>---------------------------------------------------------------------
>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>  
>


		
___________________________________________________________ 
How much free photo storage do you get? Store your holiday 
snaps for FREE with Yahoo! Photos http://uk.photos.yahoo.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message