Subject: RE: Sentence and Paragraph searching
Date: Fri, 1 Jul 2005 13:52:45 -0500
From: "McCallie,David"

Couldn't you use SpanQuery for something like this? Put special start and end tokens around each sentence, and then search for the specific keywords inside the outer span? Do the same for paragraphs, sections, etc.

I tried this once, and it seemed to work. I'm not sure what the performance penalty of the span overhead is.

--david

-----Original Message-----
From: Peter Laurinc [mailto:laurinc@felisconsulting.com]
Sent: Friday, July 01, 2005 10:46 AM
To: java-user@lucene.apache.org
Subject: RE: Sentence and Paragraph searching

Maybe the solution is to give each term not only a position but also something like a vector. Then you can "vectorize" it: term 1 has vector (1, 1), term 2 has vector (1, 1) (paragraph 1, sentence 1 of that paragraph), and term 3 has (1, 2). If you set a query to search within a paragraph/sentence, you only specify which portion of the vector must be the same.

Is this the way?

-----Original Message-----
From: Erik Hatcher [mailto:erik@ehatchersolutions.com]
Sent: Friday, July 01, 2005 4:04 PM
To: java-user@lucene.apache.org
Subject: Re: Sentence and Paragraph searching

On Jul 1, 2005, at 8:16 AM, Peter Laurinc wrote:

> Hi,
>
> I'm a newbie to Lucene.
> I want to ask how to implement a search for a phrase that must be within a
> sentence/paragraph.
> I did see some examples that change term positions, but I think that is
> not the way, because it breaks classic proximity search
> (if one word is at the end of one sentence and the other at the beginning of the next).

It really depends on your needs. If you never need proximity across sentence boundaries, then what's the issue? Putting a large gap at sentence boundaries makes good sense for some needs. Maybe not so for your situation?

I'm definitely interested in what others have done with this sort of thing.

At the extreme, if all you wanted was to find sentences and did not need to query for terms in multiple sentences at one time, then you could index each sentence as a separate Document.

     Erik

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

CONFIDENTIALITY NOTICE This message and any included attachments are from Cerner Corporation and are intended only for the addressee. The information contained in this message is confidential and may constitute inside or non-public information under international, federal, or state securities laws. Unauthorized forwarding, printing, copying, distribution, or use of such information is strictly prohibited and may be unlawful. If you are not the addressee, please promptly delete this message and notify the sender of the delivery error by e-mail or you may call Cerner's corporate offices in Kansas City, Missouri, U.S.A at (+1) (816)221-1024.
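Erik's "large gap at sentence boundaries" suggestion, and Peter's worry about what it does to proximity search, can be illustrated with a small sketch. This is not Lucene code: it assumes a toy in-memory positional index, sentences already split by the caller, a hypothetical `SENTENCE_GAP` of 100, and a simplified `near` test (any two occurrences at most `maxDistance` positions apart) rather than Lucene's exact phrase-slop rules. In Lucene itself, the same effect would presumably come from the analysis chain raising the position increment of the first token in each sentence.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy illustration of the "position gap at sentence boundaries" idea.
 * Not Lucene code: a tiny in-memory positional index used only to show why
 * a gap larger than the proximity window stops matches crossing sentences.
 */
final class SentenceGapDemo {

    // Hypothetical gap; it only needs to exceed any proximity window you query with.
    static final int SENTENCE_GAP = 100;

    /** Maps each lower-cased term to the positions where it occurs. */
    static Map<String, List<Integer>> index(List<String> sentences) {
        Map<String, List<Integer>> postings = new HashMap<>();
        int position = 0;
        for (int s = 0; s < sentences.size(); s++) {
            if (s > 0) {
                position += SENTENCE_GAP; // jump ahead at every sentence boundary
            }
            for (String token : sentences.get(s).toLowerCase().split("\\W+")) {
                if (token.isEmpty()) {
                    continue; // split can yield a leading empty string
                }
                postings.computeIfAbsent(token, k -> new ArrayList<>()).add(position);
                position++;
            }
        }
        return postings;
    }

    /** Simplified proximity test: some occurrences of a and b are at most maxDistance apart. */
    static boolean near(Map<String, List<Integer>> postings, String a, String b, int maxDistance) {
        List<Integer> pa = postings.getOrDefault(a, List.of());
        List<Integer> pb = postings.getOrDefault(b, List.of());
        for (int x : pa) {
            for (int y : pb) {
                if (Math.abs(x - y) <= maxDistance) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = index(List.of(
                "Lucene stores term positions.",
                "Proximity queries use them."));

        // Same sentence: "lucene" is at 0 and "positions" at 3.
        System.out.println(near(postings, "lucene", "positions", 5));
        // Neighbouring words across the boundary: "positions" at 3, "proximity" at 104.
        System.out.println(near(postings, "positions", "proximity", 5));
    }
}
```

Running `main` prints `true` and then `false`: the two words that sit side by side across the sentence boundary end up 101 positions apart. That is the trade-off discussed in the thread: any proximity query with a window smaller than the gap can no longer cross sentences, which is what within-sentence search wants and what ordinary cross-sentence proximity search gives up.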