Subject: Re: Search within a sentence (revisited)
From: Peter Keegan
To: java-user@lucene.apache.org
Date: Wed, 20 Jul 2011 12:27:35 -0400

It seems to me that to constrain the search to a sentence this way, you'd
have to override 'getPositionIncrementGap', which would then break phrase
searches across the field values (sentences).

Peter

On Wed, Jul 20, 2011 at 11:33 AM, wrote:

>
> I just parse the text into sentences and put those in a multi-valued field
> and then search that.
>
> On Wed, 20 Jul 2011 11:27:38 -0400, Peter Keegan
> wrote:
> > I have browsed many suggestions on how to implement 'search within a
> > sentence', but all seem to have drawbacks. For example, from
> >
> > http://lucene.472066.n3.nabble.com/Issue-with-sentence-specific-search-td1644352.html#a1645072
> >
> > Steve Rowe writes:
> >
> > ----------
> > One common technique, instead of using a larger-than-normal position
> > increment gap between sentences, is using a sentence boundary token like
> > '$' or something else that won't ever itself be the target of search.
> > Quoting from a post Mark Miller made to the lucene-user list last year
> > <http://www.lucidimagination.com/search/document/c9641cbb1a3bf928/multiline_regex_with_lucene>:
> >
> >    First you inject special marker tokens as your paragraph/
> >    sentence markers, then you use a SpanNotQuery that looks
> >    for a SpanNearQuery that doesn't intersect with a
> >    SpanTermQuery containing the special marker term.
> >
> > Mark's suggestion would work for your within-sentence case, and for the
> > case where you don't care about sentence boundaries, you can use
> > SpanNearQuery without the SpanNotQuery.
> > ----------
> >
> > The problem with the last part is that the SpanNearQuery would have to
> > have a slop of 1 in order to accommodate the marker token between
> > sentences. This could result in incorrect matches if a slop of 0 is
> > intended. Another suggestion was to overlap the marker token with the
> > first or last token of the sentence, but the SpanNotQuery would always
> > exclude any terms in the query that are at the intersection. Mark
> > Miller's 'SpanWithinQuery' patch seems to have the same issue.
> >
> > Has anyone implemented a solution that works for both in-sentence and
> > across sentence boundaries?
> >
> > Thanks,
> > Peter
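
To make the slop problem described above concrete, here is a minimal sketch in plain Java. It does not use Lucene's API; it only models token positions. The class name SentenceMarkerDemo, the method orderedNear, and the sample sentences are all made up for illustration, and the slop rule (number of positions allowed between two in-order terms) is a simplifying assumption about how SpanNearQuery behaves.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the marker-token approach; NOT Lucene code.
class SentenceMarkerDemo {
    // Sentence boundary token, assumed never to be a search term.
    static final String MARKER = "$";

    // All positions at which 'term' occurs in the token stream.
    static List<Integer> positions(List<String> tokens, String term) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(term)) {
                out.add(i);
            }
        }
        return out;
    }

    // True if 'first' is followed by 'second' with at most 'slop' positions
    // between them. With withinSentence set, a candidate span that contains
    // the marker is rejected (the role SpanNotQuery plays in the thread).
    static boolean orderedNear(List<String> tokens, String first, String second,
                               int slop, boolean withinSentence) {
        for (int start : positions(tokens, first)) {
            for (int end : positions(tokens, second)) {
                if (end <= start) continue;            // must be in order
                if (end - start - 1 > slop) continue;  // too far apart
                boolean crossesMarker =
                        tokens.subList(start, end + 1).contains(MARKER);
                if (withinSentence && crossesMarker) continue;
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two sentences, "lucene is fast" and "search works well",
        // each followed by the marker token.
        List<String> tokens = Arrays.asList(
                "lucene", "is", "fast", "$", "search", "works", "well", "$");

        // Phrase across the boundary: the marker occupies a position,
        // so slop 0 fails and slop 1 is needed.
        System.out.println(orderedNear(tokens, "fast", "search", 0, false));
        System.out.println(orderedNear(tokens, "fast", "search", 1, false));

        // The same slop 1 also accepts "lucene fast", which skips "is".
        System.out.println(orderedNear(tokens, "lucene", "fast", 1, false));

        // Within-sentence search rejects the span that crosses the marker.
        System.out.println(orderedNear(tokens, "fast", "search", 1, true));
    }
}
```

In this model the cross-sentence phrase "fast search" only matches once the slop is raised to 1, and that same slop then lets "lucene fast" match inside a sentence even though the two words are not adjacent. That is the incorrect-match case raised in the original question.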