Message-ID: <424A9A9D.1040500@madisonresearch.com>
Date: Wed, 30 Mar 2005 07:25:01 -0500
From: DM Smith
To: java-dev@lucene.apache.org
Subject: Document proximity

Hi,

I hope I am posting to the right list.

We (sword and jsword at crosswire.org) are indexing Bibles with each verse becoming a document: the verse text is indexed and the verse reference is stored. This way we can search the text and find out which verses have hits.

The problem is that a verse is an artificial document boundary. Verses frequently cut a paragraph into parts or a poem into stanzas, so the meaningful unit often spans several verses. (We usually do not have these larger units in our markup.)

Is there any thought of adding a NEAR operator that works across documents? Specifically: find x NEAR y, where the distance given to NEAR is counted in documents rather than in words.

(We do have a solution that stands entirely outside of Lucene, but it would be better, for us :), if Lucene had the capability.)

It would also be good if search could automatically treat adjacent documents as one continuous flow unless some token in a document interrupts the flow. In that case, search would return a compound document as a hit.

Thanks,
DM Smith
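
For illustration only, here is a minimal sketch of what "x NEAR y, with distance counted in documents" could mean. It assumes that every verse has a consecutive ordinal number and that two ordinary term searches have already produced ascending arrays of the ordinals matching x and y. The class `DocumentNear` and the method `near` are hypothetical names; they are not part of Lucene and are not the existing crosswire.org solution mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene API: pairs up documents (verses)
// that match two different terms and lie close to each other.
final class DocumentNear {

    /**
     * @param xs          ascending ordinals of documents containing term x
     * @param ys          ascending ordinals of documents containing term y
     * @param maxDistance largest allowed gap, counted in documents
     * @return one {first, last} span per (x, y) pair within maxDistance;
     *         each span stands for a "compound document" hit
     */
    static List<int[]> near(int[] xs, int[] ys, int maxDistance) {
        List<int[]> hits = new ArrayList<>();
        int lo = 0; // first index in ys that can still be in range
        for (int x : xs) {
            // Skip y documents that are too far before x. Because xs is
            // ascending, these can never match a later x either.
            while (lo < ys.length && ys[lo] < x - maxDistance) {
                lo++;
            }
            // Collect every y document within maxDistance of x.
            for (int j = lo; j < ys.length && ys[j] <= x + maxDistance; j++) {
                hits.add(new int[] {Math.min(x, ys[j]), Math.max(x, ys[j])});
            }
        }
        return hits;
    }
}
```

For example, with x found in verses 3 and 10, y found in verses 1, 4, 11 and 20, and a distance of 1, this sketch reports the spans 3 to 4 and 10 to 11. It does not model the "flow interrupted by a token" idea; that would need an extra per-document flag that stops a span from crossing a break.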