From: Andrew Boyd
To: java-user@lucene.apache.org
Date: Mon, 11 Jul 2005 16:01:52 -0400 (EDT)
Subject: Re: How to get the un-stemed word

What about storing the unstemmed word at the same position as the stemmed word?
Would that show up in the TermVectors?

-----Original Message-----
From: mark harwood
Sent: Jul 8, 2005 10:44 AM
To: java-user@lucene.apache.org, Andrew Boyd
Subject: Re: How to get the un-stemed word

You can get the unstemmed word by re-analysing the text (hopefully stored somewhere). Look at the tokens emitted from the TokenStream, and when you reach the one that matches the stemmed form, use that token's offset info to retrieve the unstemmed form from the original text.

Another option, which avoids re-analysis, is to store the TermVector with TermPositionVector info (positions and offsets) enabled. All the offsets are then stored in the index, rather than computed on the fly by an Analyzer.

The highlighter in the sandbox can use both of these approaches to get the original forms.

Cheers
Mark

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Andrew Boyd
Software Architect
Sun Certified J2EE Architect
B&B Technical Services Inc.
205.422.2557
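The approaches discussed in this thread can be modeled without Lucene in a small, self-contained Java sketch. It is a toy, not Lucene's API: `Tok`, `stem`, `analyze`, `originalsByReanalysis`, `offsetVector` and `originalsFromVector` are hypothetical names, and the suffix-stripping stemmer is an assumption standing in for a real one such as Porter. `analyze(text, true)` also emits each unstemmed word at the same position as its stem (position increment 0); `originalsByReanalysis` re-analyses the stored text and slices it with the offsets of matching tokens; `offsetVector` records offsets once, at index time, in the spirit of a term vector stored with offsets. It needs Java 16+ (records).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class UnstemmedLookup {

    /** Simplified stand-in for a token: term text, character offsets, position increment. */
    record Tok(String term, int start, int end, int posInc) {}

    /** Hypothetical toy stemmer (NOT Porter): strips a few common English suffixes. */
    static String stem(String w) {
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    /**
     * Toy analyzer: splits on non-letters, lowercases, stems.
     * If keepOriginal is true and stemming changed the word, the unstemmed
     * form is also emitted at the same position (position increment 0).
     */
    static List<Tok> analyze(String text, boolean keepOriginal) {
        List<Tok> out = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (!Character.isLetter(text.charAt(i))) { i++; continue; }
            int start = i;
            while (i < text.length() && Character.isLetter(text.charAt(i))) i++;
            String word = text.substring(start, i).toLowerCase();
            String stemmed = stem(word);
            out.add(new Tok(stemmed, start, i, 1));
            if (keepOriginal && !stemmed.equals(word)) {
                out.add(new Tok(word, start, i, 0)); // same position as the stem
            }
        }
        return out;
    }

    /** Re-analysis: walk the tokens again and slice the original text with matching offsets. */
    static List<String> originalsByReanalysis(String text, String stemmedTerm) {
        List<String> forms = new ArrayList<>();
        for (Tok t : analyze(text, false)) {
            if (t.term().equals(stemmedTerm)) forms.add(text.substring(t.start(), t.end()));
        }
        return forms;
    }

    /** Offsets recorded once per term at "index time", like a term vector with offsets. */
    static Map<String, List<int[]>> offsetVector(String text) {
        Map<String, List<int[]>> vec = new LinkedHashMap<>();
        for (Tok t : analyze(text, false)) {
            vec.computeIfAbsent(t.term(), k -> new ArrayList<>()).add(new int[] {t.start(), t.end()});
        }
        return vec;
    }

    /** Lookup from the stored offsets: no analyzer is run at query time. */
    static List<String> originalsFromVector(String text, Map<String, List<int[]>> vec, String stemmedTerm) {
        List<String> forms = new ArrayList<>();
        for (int[] off : vec.getOrDefault(stemmedTerm, List.of())) {
            forms.add(text.substring(off[0], off[1]));
        }
        return forms;
    }

    public static void main(String[] args) {
        String text = "Walked home, walking fast; she walks.";
        System.out.println(originalsByReanalysis(text, "walk")); // [Walked, walking, walks]
        System.out.println(originalsFromVector(text, offsetVector(text), "walk")); // [Walked, walking, walks]
        System.out.println(analyze("Walking dogs", true));
    }
}
```

In this model, building the offset map from `analyze(text, true)` instead would record both the stem and the unstemmed form as separate entries, which is one way to reason about whether same-position tokens end up in a term vector; how a given Lucene version actually stores them should be checked against its documentation.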