From: "Lee Mallabone"
Subject: Re: context and hit positions with Lucene
Date: Mon, 8 Oct 2001 10:41:02 +0100

Doug Cutting wrote:

> Please see also Maik Schreiber's message on this topic:
>
> http://www.geocrawler.org/archives/3/2624/2001/9/50/6553088/

Great! Thanks, that's a real help.

> The index does not store the byte-position of words in the original document.

Does that rule out the potential to implement proximity operators? I need to implement NEAR (and then SAME for paragraph searches), but I'm a novice in terms of search engine implementations. Am I likely to be out of my depth attempting that right now with Lucene?

> Perhaps we should add a utility method such as:
>
> public static Set getHitTokens(Set queryTerms, Reader text, Analyzer a)

..snip..

> What class would we add this to? If we add it to Query then it could take a
> Query instead of a Set. As Maik points out, there is currently no public
> method that returns the set of terms in a query. That should probably be
> added in any case.

As you suggest, I think taking a Query rather than a Set would be the most convenient. This looks good, but what about the (future) case where you have complex (possibly nested) proximity searches and only want to highlight the relevant tokens when they appear near each other?

My company really likes Lucene, but we have a customer with *very* stringent search requirements so I'm trying to determine if we can implement all of them with or on top of Lucene.

Regards,
Lee Mallabone.
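
To make the proximity-highlighting question above concrete, here is a minimal sketch of the idea, written without any Lucene APIs. Everything in it is an assumption for illustration: `ProximityHits`, `tokenize`, and `nearHits` are hypothetical names, the regex tokenizer is only a stand-in for an Analyzer, and distance is counted in tokens rather than bytes. It handles a single two-term NEAR, not nested proximity expressions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Hypothetical helper (not part of Lucene): token positions to highlight for "a NEAR b". */
final class ProximityHits {

    /** Stand-in for an Analyzer: lower-cases and splits on non-alphanumeric characters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /**
     * Returns, in ascending order, the token positions of termA and termB that lie
     * within maxDistance tokens of an occurrence of the other term.
     * Occurrences with no nearby partner are left out, so they would not be highlighted.
     */
    static List<Integer> nearHits(String text, String termA, String termB, int maxDistance) {
        String a = termA.toLowerCase(Locale.ROOT);
        String b = termB.toLowerCase(Locale.ROOT);
        List<String> tokens = tokenize(text);

        // Record the token positions of each term.
        List<Integer> posA = new ArrayList<>();
        List<Integer> posB = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(a)) posA.add(i);
            if (tokens.get(i).equals(b)) posB.add(i);
        }

        // Keep only positions that take part in at least one close-enough pair.
        TreeSet<Integer> hits = new TreeSet<>();
        for (int pa : posA) {
            for (int pb : posB) {
                if (pa != pb && Math.abs(pa - pb) <= maxDistance) {
                    hits.add(pa);
                    hits.add(pb);
                }
            }
        }
        return new ArrayList<>(hits);
    }
}
```

A call such as `ProximityHits.nearHits(text, "lucene", "search", 2)` returns only the positions where the two terms occur within two tokens of each other. The pairwise loop is quadratic in the number of occurrences, which is acceptable for highlighting a single document but is not how an index-level NEAR operator would be evaluated.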