From: HPDrifter
To: java-dev@lucene.apache.org
Date: Fri, 27 Feb 2009 07:17:10 -0800 (PST)
Subject: Re: Getting tokens from search results.
Simple concept

Yes, I have, but it is too memory-intensive. I used the Highlighter as my first attempt, but it was not a good solution because I have to send the entire text to the Highlighter.

What I did instead is similar to your suggestion:

1. Use the analyzer to return a token stream.
2. Search the token stream for the keyword I'm looking for (the keyword needs to be analyzed as well!).
3. Extract the token's offsets.
4. Use the offsets in the index and Java's RandomAccessFile to "seek" to the byte (character) position, then extract a "fragment" of about 500 chars around that position.

This solution uses little memory and, I hope, will work as I expect under steady load. How does this sound to you?

What I would LOVE is to be able to do this in a standard Lucene search, like I mentioned earlier:

Hit.doc[0].getHitTokenList() :confused:

Something like this...

~Dustin

Erik Hatcher wrote:
>
> Have you looked at the contrib Highlighter? Or using an Analyzer
> directly to give you the offsets?
>
> Erik
>
> On Feb 26, 2009, at 9:32 AM, HPDrifter wrote:
>
>>
>> When I get a search result based on my index, I need the exact
>> tokens which were identified in the index as part of the result.
>> Why? I need the character offsets.
>>
>> I have a solution right now...almost, but it bugs the hell out of me
>> that I can't say something like...
>> documentHit[0].getIdentifiedTokens();
>>
>> Do I need to make a contribution in order to make this happen? :ninja:
>>
>>
>> --
>> View this message in context:
>> http://www.nabble.com/Getting-tokens-from-search-results.--Simple-concept-tp22225364p22225364.html
>> Sent from the Lucene - Java Developer mailing list archive at
>> Nabble.com.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>
>

--
View this message in context: http://www.nabble.com/Getting-tokens-from-search-results.--Simple-concept-tp22225364p22247863.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
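The four-step approach described in this message (analyze, match the analyzed keyword, take the token offsets, seek and read a small fragment) can be sketched in plain Java roughly as follows. This is a minimal sketch under stated assumptions, not Lucene code: the class name `FragmentSketch` and all of its methods are hypothetical, and `analyze()` is a simplified stand-in for a Lucene Analyzer (lowercase, split on non-alphanumerics). A real Lucene 2.x `TokenStream` reports the same information through `Token.startOffset()` / `Token.endOffset()`. It also assumes a single-byte (US-ASCII) file: analyzer offsets are character offsets, while `RandomAccessFile.seek()` takes byte positions, and the two only coincide for single-byte encodings. Keywords are assumed to be a single term.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.RandomAccessFile;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch of the token-offset + seek approach (not Lucene code).
 * ASSUMPTION: analyze() stands in for a Lucene Analyzer.
 * ASSUMPTION: the file is US-ASCII, so character offset == byte offset.
 */
public class FragmentSketch {

    /** A term plus its start (inclusive) and end (exclusive) character offsets. */
    public static final class Token {
        public final String term;
        public final int start;
        public final int end;

        Token(String term, int start, int end) {
            this.term = term;
            this.start = start;
            this.end = end;
        }
    }

    /** Step 1: stream tokens from a Reader without holding the whole text in memory. */
    public static void analyze(Reader in, Consumer<Token> sink) throws IOException {
        StringBuilder term = new StringBuilder();
        int offset = 0; // character offset of the next char to be read
        int start = -1; // start offset of the token being built, or -1 if none
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            if (Character.isLetterOrDigit(ch)) {
                if (start < 0) start = offset;
                term.append(Character.toLowerCase(ch));
            } else if (start >= 0) {
                sink.accept(new Token(term.toString(), start, offset));
                term.setLength(0);
                start = -1;
            }
            offset++;
        }
        if (start >= 0) sink.accept(new Token(term.toString(), start, offset));
    }

    /** Steps 2-3: analyze the keyword the same way and keep only matching tokens' offsets. */
    public static List<Token> findMatches(Reader in, String keyword) throws IOException {
        List<Token> keyTokens = new ArrayList<>();
        analyze(new StringReader(keyword), keyTokens::add);
        List<Token> matches = new ArrayList<>();
        if (keyTokens.isEmpty()) return matches;
        String target = keyTokens.get(0).term; // single-term keywords only
        analyze(in, t -> {
            if (t.term.equals(target)) matches.add(t);
        });
        return matches;
    }

    /** Step 4: seek into the file and read about `window` bytes centered on `offset`. */
    public static String readFragment(String path, long offset, int window) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            long start = Math.min(Math.max(0, offset - window / 2), length);
            long end = Math.min(length, start + window);
            byte[] buf = new byte[(int) (end - start)];
            raf.seek(start);
            raf.readFully(buf);
            return new String(buf, StandardCharsets.US_ASCII);
        }
    }

    /** All four steps: one fragment per keyword hit in the file. */
    public static List<String> fragmentsFor(String path, String keyword, int window) throws IOException {
        List<Token> hits;
        try (Reader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), StandardCharsets.US_ASCII))) {
            hits = findMatches(in, keyword);
        }
        List<String> fragments = new ArrayList<>();
        for (Token t : hits) fragments.add(readFragment(path, t.start, window));
        return fragments;
    }
}
```

For example, `FragmentSketch.fragmentsFor("doc.txt", "Lucene", 500)` would return one fragment of up to 500 characters per occurrence of "lucene" in a hypothetical `doc.txt`. The file is read sequentially once to find offsets but never held in memory as a whole; memory use is bounded by the list of hits plus one window, which is the property the message is after when compared with handing the entire text to the Highlighter.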