From: Ian Lea
Date: Fri, 11 Mar 2011 10:04:09 +0000
Subject: Re: index enforcing query terms to appear within the same sentence
To: java-user@lucene.apache.org
In-Reply-To: <4D78FD87.9060100@lsv.uni-saarland.de>
References: <4D708F60.5090507@lsv.uni-saarland.de> <4D70F9E5.1060809@lsv.uni-saarland.de> <4D78FD87.9060100@lsv.uni-saarland.de>

The example code in
http://lucene.472066.n3.nabble.com/Problem-searching-in-the-same-sentence-td1501269.html
reads

custom standard analyzer:

public class MyStandardAnalyzer extends StandardAnalyzer implements IndexFields {

    public MyStandardAnalyzer(Version matchVersion) {
        super(matchVersion);
    }

    public int getPositionIncrementGap(String fieldName) {
        int incrementGap = super.getPositionIncrementGap(fieldName);
        if (fieldName.equals(IFIELD_TEXT)) {
            incrementGap += 10;
        }
        return incrementGap;
    }
}

so if you used this analyzer and called new Field(IFIELD_TEXT, value, ...)
and new Field("someothername", value, ...) the first field would get
the modified gaps and the second one wouldn't.

Hope that helps.

--
Ian.


On Thu, Mar 10, 2011 at 4:34 PM, Michael Wiegand wrote:
> Conceptually, I think I know what to do. Unfortunately, with the given
> interfaces of Lucene I have some difficulty.
>
> If I add the content of a document sentence by sentence, i.e.
> line by line,
> (using a multi-valued field), there are only two constructors possible:
> Field(String name, String value, Field.Store store, Field.Index index)
> or
> Field(String name, String value, Field.Store store, Field.Index index,
> Field.TermVector termVector)
> The sentence comes as a string which I get from a BufferedReader object by
> using the readLine() method.
>
> But as far as I understood, I need to access some TokenStream object in
> order to set the PositionIncrementAttribute. So how should that work?
>
> Thank you in advance.
>
> Ian Lea wrote:
>>>
>>> You can use multi valued fields if you play with the position
>>> increment gap. See e.g.
>>>
>>> http://lucene.472066.n3.nabble.com/Problem-searching-in-the-same-sentence-td1501269.html
>>>
>>> A google search for "lucene indexing sentences" or similar finds that,
>>> and more.
>>>
>>>
>>> Different docs can have different fields/different numbers of fields,
>>> but the position gap approach is probably better.
>>>
>>>
>>> --
>>> Ian.
>>>
>>>
>>> On Fri, Mar 4, 2011 at 7:06 AM, Michael Wiegand wrote:
>>>
>>>>
>>>> Hi,
>>>>
>>>> I would like to create an index with Lucene for a document collection
>>>> of text files.
>>>> The index should be created in such a way that, at search time, I can
>>>> enforce that query term A and query term B are contained within the
>>>> same sentence.
>>>>
>>>> How should I implement the index? Should I have a different field for
>>>> every sentence (but make sure that it is not a multi-valued field,
>>>> because the values would get merged, which is exactly what I do not
>>>> want)?
>>>> Would it be problematic that different documents would then end up
>>>> having different numbers of fields?
>>>>
>>>> Thank you in advance!
>>>>
>>>> Best,
>>>> Michael
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>>> For additional commands, e-mail: java-user-help@lucene.apache.org
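
A note on why the position increment gap keeps matches inside one sentence. The sketch below is plain Java, not Lucene code: it is a toy model of how token positions could be numbered when each sentence is added as one value of a multi-valued field. The class and method names (PositionGapSketch, assignPositions, withinDistance) are hypothetical, the whitespace tokenizer is a simplification, and the gap and distance values are chosen only for illustration. The point it tries to show is that a proximity check with a maximum distance smaller than the gap cannot match across a sentence boundary, which is presumably what a sloppy PhraseQuery or a SpanNearQuery would rely on in a real index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of token positions in a multi-valued field; NOT Lucene code. */
final class PositionGapSketch {

    /**
     * Assigns a position to every whitespace-separated token. Each sentence
     * stands for one value of a multi-valued field; before every value after
     * the first, the position counter jumps by an extra {@code gap}.
     */
    static Map<String, List<Integer>> assignPositions(List<String> sentences, int gap) {
        Map<String, List<Integer>> positions = new HashMap<>();
        int next = 0;            // position the next token will receive
        boolean first = true;
        for (String sentence : sentences) {
            if (!first) {
                next += gap;     // the modelled position increment gap
            }
            first = false;
            for (String token : sentence.trim().toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;    // skip the empty token produced by a blank sentence
                }
                positions.computeIfAbsent(token, k -> new ArrayList<>()).add(next);
                next++;
            }
        }
        return positions;
    }

    /** True if some occurrence of a and some occurrence of b are at most maxDistance apart. */
    static boolean withinDistance(Map<String, List<Integer>> positions,
                                  String a, String b, int maxDistance) {
        for (int pa : positions.getOrDefault(a, List.of())) {
            for (int pb : positions.getOrDefault(b, List.of())) {
                if (Math.abs(pa - pb) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> sentences = List.of("the cat sat on the mat", "a dog barked loudly");
        int maxDistance = 10;    // assumed query-time distance limit
        Map<String, List<Integer>> noGap = assignPositions(sentences, 0);
        Map<String, List<Integer>> bigGap = assignPositions(sentences, 100);

        // Without a gap, "mat" and "dog" look close although they are in different sentences.
        System.out.println(withinDistance(noGap, "mat", "dog", maxDistance));   // true
        // With a gap larger than maxDistance, the cross-sentence pair no longer matches.
        System.out.println(withinDistance(bigGap, "mat", "dog", maxDistance));  // false
        // Terms inside the same sentence still match.
        System.out.println(withinDistance(bigGap, "cat", "mat", maxDistance));  // true
    }
}
```

In this model the gap has to be larger than the distance allowed at query time, and the allowed distance has to be at least as long as the longest sentence, otherwise terms far apart in one long sentence would be missed. A gap of 10, as in the quoted analyzer, would under these assumptions only be safe for short sentences.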