Date: Wed, 23 May 2007 14:59:35 -0400
From: "Erick Erickson"
To: java-user@lucene.apache.org
Subject: Re: How to filter fields with hits from result set

As luck would have it, I've done something very similar. What I had to do
was index a special token at the end of each page. Then I could get the
term offsets for each page.

Then I used SpanQuery.getSpans to get the offsets of all the hits
throughout all of the pages. Now I have a list of the offsets of the
*last* term on each page and a list of the offsets of the hits. From these
two lists I can tell which pages have hits.

Best
Erick

On 5/23/07, Andreas Guther wrote:
>
> Hi,
>
> If a search returns a document that has multiple fields with the same
> name, is there a way to return only those fields that contain hits?
>
>
> Background:
>
> I am indexing documents, and we store all content in our index for
> display reasons. We want to show only those pages containing hits. My
> first implementation saved each page as its own Lucene document.
> For performance reasons we are now looking into indexing the complete
> source document as a single Lucene document.
>
> Every page is added to a field in the Lucene document named
> page-content. That means I end up with as many fields named
> page-content as the document has pages.
>
> My search now returns a single Lucene document, in contrast to my
> first approach with one page per Lucene document. My problem right now is:
> how can I limit the returned page-content fields to those field entries
> that contain hits? If I have hits on five pages of a 10-page document,
> I would like to get only the pages with hits, not all of them.
>
> Is there anything in Lucene that limits the returned fields to fields
> with hits only?
>
> Thanks in advance,
>
> Andreas
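A minimal sketch of the last step Erick describes (turning the two lists into page numbers), assuming both lists have already been collected as token positions (what the reply calls offsets): the positions of the end-of-page marker token, for example by iterating the Spans of a SpanTermQuery for that marker, and the start positions of the hits, from Spans.start(). The class and method names (PageHitMapper, pagesWithHits) and the sample positions are hypothetical illustrations, not Erick's actual code. Each hit is assigned to the first page whose marker position is at or after the hit, found with a binary search.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

class PageHitMapper {

    /**
     * Returns the 1-based numbers of the pages that contain at least one hit.
     *
     * @param pageEndPositions ascending token positions of the (hypothetical)
     *                         end-of-page marker token, one entry per page
     * @param hitPositions     token positions of the hits, in any order
     */
    static List<Integer> pagesWithHits(int[] pageEndPositions, int[] hitPositions) {
        TreeSet<Integer> pages = new TreeSet<>(); // sorted, no duplicates
        for (int hit : hitPositions) {
            // binarySearch returns the index if the key is found,
            // otherwise -(insertionPoint) - 1.
            int idx = Arrays.binarySearch(pageEndPositions, hit);
            // The first marker at or after the hit closes the page the hit is on.
            int pageIndex = idx >= 0 ? idx : -idx - 1;
            if (pageIndex < pageEndPositions.length) { // ignore hits after the last marker
                pages.add(pageIndex + 1);
            }
        }
        return new ArrayList<>(pages);
    }

    public static void main(String[] args) {
        // Three pages whose marker tokens sit at positions 4, 9 and 14.
        int[] pageEnds = {4, 9, 14};
        int[] hits = {2, 11, 12};
        System.out.println(pagesWithHits(pageEnds, hits)); // prints [1, 3]
    }
}
```

Each lookup costs O(log p) for p pages, so this stays cheap even for long documents; the page numbers it returns can then be used to pick out the matching stored page-content values.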