Mailing-List: contact lucene-dev-help@jakarta.apache.org; run by ezmlm
Message-ID: <00ca01c14fdd$57ce2920$2400000a@trumpton.internal>
From: "Lee Mallabone" <lee@grantadesign.com>
To: <lucene-dev@jakarta.apache.org>
References: <4BC270C6AB8AD411AD0B00B0D0493DF0EE7C2B@mail.grandcentral.com>
Subject: Re: context and hit positions with Lucene
Date: Mon, 8 Oct 2001 10:41:02 +0100
MIME-Version: 1.0
Content-Type: text/plain;
	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit

Doug Cutting wrote:
> Please see also Maik Schreiber's message on this topic:
>
>   http://www.geocrawler.org/archives/3/2624/2001/9/50/6553088/

Great! Thanks, that's a real help.

> The index does not store the byte-position of words in the original document.

Does that rule out the potential to implement proximity operators? I need to
implement NEAR (and then SAME for paragraph searches), but I'm a novice in
terms of search engine implementations. Am I likely to be out of my depth
attempting that right now with Lucene?
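
To make the question concrete, here is a minimal sketch of what I mean by
NEAR, assuming it can be evaluated on token positions (word offsets within a
document) rather than byte positions. It is plain Java, not Lucene code:
NearSketch, tokenize and near are hypothetical names, and the tokenizer is
only a crude stand-in for an Analyzer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch, not Lucene code: NEAR evaluated over token positions.
class NearSketch {

    // Crude stand-in for an Analyzer: lower-case, split on non-alphanumerics.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // True if termA and termB (already lower-cased) occur within maxDistance
    // token positions of each other, in either order.
    static boolean near(List<String> tokens, String termA, String termB, int maxDistance) {
        int lastA = -1;
        int lastB = -1;
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            if (t.equals(termA)) lastA = i;
            if (t.equals(termB)) lastB = i;
            // Compare the most recent occurrence of each term seen so far.
            if (lastA >= 0 && lastB >= 0 && Math.abs(lastA - lastB) <= maxDistance) {
                return true;
            }
        }
        return false;
    }
}
```

For "The quick brown fox jumps over the lazy dog", near(tokens, "quick",
"fox", 2) is true and near(tokens, "quick", "dog", 3) is false. SAME would
presumably be a similar check, but against paragraph boundaries instead of a
fixed distance.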

> Perhaps we should add a utility method such as:
>
>   public static Set getHitTokens(Set queryTerms, Reader text, Analyzer a)
..snip..
> What class would we add this to?  If we add it to Query then it could take a
> Query instead of a Set.  As Maik points out, there is currently no public
> method that returns the set of terms in a query.  That should probably be
> added in any case.

As you suggest, I think taking a Query rather than a Set would be the most
convenient.
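
For reference, this is roughly how I picture the Set-based form behaving. It
is only a sketch under my own assumptions: HitTokensSketch is a hypothetical
class, the Analyzer argument is replaced by a hard-coded lower-casing
tokenizer, and the query terms are assumed to be lower-cased already. A
Query-based variant would first need the public "terms in a query" method
mentioned above, which is why the sketch takes the Set directly.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch, not Lucene code: which tokens in a text match the query terms?
class HitTokensSketch {

    // Returns the distinct tokens, as they appear in the text, whose
    // lower-cased form is one of the (already lower-cased) query terms.
    static Set<String> getHitTokens(Set<String> queryTerms, Reader text) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try {
            int n;
            while ((n = text.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }

        // Crude stand-in for an Analyzer: split on non-alphanumerics.
        Set<String> hits = new LinkedHashSet<>();
        for (String raw : sb.toString().split("[^\\p{L}\\p{N}]+")) {
            if (!raw.isEmpty() && queryTerms.contains(raw.toLowerCase(Locale.ROOT))) {
                hits.add(raw);
            }
        }
        return hits;
    }
}
```

Returning the tokens as they appear in the text (rather than the normalised
terms) is a guess at what a highlighter would want to match against.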

This looks good, but what about the (future) case where you have complex
(possibly nested) proximity searches and only want to highlight the relevant
tokens when they appear near each other?
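
As a rough illustration of that case, assuming a simple two-term NEAR and
token positions being available at highlight time, the set of positions to
highlight might be computed like this. ProximityHighlightSketch and
positionsToHighlight are hypothetical names, nothing here is existing Lucene
API, and nested proximity queries are not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Lucene code: highlight a term only where the
// other term of a two-term NEAR query occurs close by.
class ProximityHighlightSketch {

    // Returns, in ascending order, the positions of tokens equal to termA or
    // termB that have the other term within maxDistance token positions.
    static List<Integer> positionsToHighlight(List<String> tokens, String termA,
                                              String termB, int maxDistance) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            String t = tokens.get(i);
            String other;
            if (t.equals(termA)) {
                other = termB;
            } else if (t.equals(termB)) {
                other = termA;
            } else {
                continue; // not a query term
            }
            // Scan the window around position i for the other term.
            int from = Math.max(0, i - maxDistance);
            int to = Math.min(tokens.size() - 1, i + maxDistance);
            for (int j = from; j <= to; j++) {
                if (j != i && tokens.get(j).equals(other)) {
                    positions.add(i);
                    break;
                }
            }
        }
        return positions;
    }
}
```

With a plain getHitTokens every occurrence of either term would be
highlighted; here an occurrence with no partner inside the window is skipped.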

My company really likes Lucene, but we have a customer with *very* stringent
search requirements so I'm trying to determine if we can implement all of
them with or on top of Lucene.

Regards,

Lee Mallabone.