Mailing-List: contact java-user-help@lucene.apache.org; run by ezmlm
Precedence: bulk
Reply-To: java-user@lucene.apache.org
Received-SPF: pass (athena.apache.org: domain of peterlkeegan@gmail.com
 designates 209.85.214.176 as permitted sender)
MIME-Version: 1.0
Sender: pkeegan01451@gmail.com
In-Reply-To: <7da891a71ce0918841a29b330a68a832@localhost>
References: 
 <CAN8y9rR43XurPBCXKcofDhTzx+G0=6-dG2jna9y4jEz_onzbMw@mail.gmail.com>
	<7da891a71ce0918841a29b330a68a832@localhost>
Date: Wed, 20 Jul 2011 12:27:35 -0400
Message-ID: 
 <CAN8y9rTUtiVqgPua_n3PBSRi9+oCFaYFnBZv8TXokVnZg677_A@mail.gmail.com>
Subject: Re: Search within a sentence (revisited)
From: Peter Keegan <peterlkeegan@gmail.com>
To: java-user@lucene.apache.org
Content-Type: multipart/alternative; boundary=90e6ba53acd8c63b2204a882b53b

--90e6ba53acd8c63b2204a882b53b
Content-Type: text/plain; charset=ISO-8859-1

It seems to me that to constrain the search to a sentence this way, you'd
have to override 'getPositionIncrementGap', which would then break phrase
searches across the field values (sentences).

Peter

On Wed, Jul 20, 2011 at 11:33 AM, <darren@ontrenet.com> wrote:

>
> I just parse the text into sentences and put those in a multi-valued field
> and then search that.
>
> On Wed, 20 Jul 2011 11:27:38 -0400, Peter Keegan <peterlkeegan@gmail.com>
> wrote:
> > I have browsed many suggestions on how to implement 'search within a
> > sentence', but all seem to have drawbacks. For example, from
> >
>
> http://lucene.472066.n3.nabble.com/Issue-with-sentence-specific-search-td1644352.html#a1645072
> >
> > Steve Rowe writes:
> >
> > ----------
> > One common technique, instead of using a larger-than-normal position
> > increment gap between sentences, is using a sentence boundary token like
> > '$'
> > or something else that won't ever itself be the target of search.
> Quoting
> > from a post Mark Miller made to the lucene-user list last year <
> >
>
> http://www.lucidimagination.com/search/document/c9641cbb1a3bf928/multiline_regex_with_lucene
> >>):
> >
> >         First you inject special marker tokens as your paragraph/
> >         sentence markers, then you use a SpanNotQuery that looks
> >         for a SpanNearQuery that doesn't intersect with a
> >         SpanTermQuery containing the special marker term.
> >
> > Mark's suggestion would work for your within-sentence case, and for the
> > case
> > where you don't care about sentence boundaries, you can use
> SpanNearQuery
> > without the SpanNotQuery.
> > ----------
> >
> > The problem with the last part is that the SpanNearQuery would have to
> have
> > a slop of 1 in order to accomodate the marker token between sentences.
> This
> > could result in incorrect matches if the a slop of 0 is intended.
> Another
> > suggestion was to overlap the marker token with the first or last token
> of
> > the sentence, but the SpanNotQuery would always exclude any terms in the
> > query that are at the intersection.  Mark Miller's 'SpanWithinQuery'
> patch
> > seems to have the same issue.
> >
> > Has anyone implemented a solution that works for both in-sentence and
> > across
> > sentence boundaries?
> >
> > Thanks,
> > Peter
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

--90e6ba53acd8c63b2204a882b53b--