From: "Klaus"
Subject: AW: Lucene parsing for PDF
Date: Thu, 29 Dec 2005 13:01:54 +0100

Hi, I think the easiest way is
to exclude the pages while you are parsing the PDF document, so that you provide only the necessary pages to Lucene. Another solution is to create a separate document for each page. Each one should have a field such as "pagenumber", and you can then delete the documents for the unwanted pages from the index.

Peace
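
A rough sketch of both ideas, in plain Java so it runs without Lucene or a PDF library. Everything here is an assumption for illustration: the class `PageFilterSketch`, its method names, and the "contents" field are hypothetical, and the per-page text is assumed to have been extracted already by whatever PDF parser you use. The maps only stand in for index documents; in a real index you would add one Lucene document per page and delete by the "pagenumber" term.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of Lucene or any PDF library.
class PageFilterSketch {

    // Approach 1: skip unwanted pages before anything reaches the index.
    // pageTexts.get(0) is page 1; excluded holds 1-based page numbers.
    static String textToIndex(List<String> pageTexts, Set<Integer> excluded) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pageTexts.size(); i++) {
            int pageNumber = i + 1;
            if (excluded.contains(pageNumber)) {
                continue; // this page is never handed to the indexer
            }
            if (sb.length() > 0) {
                sb.append('\n');
            }
            sb.append(pageTexts.get(i));
        }
        return sb.toString();
    }

    // Approach 2: one record per page, each carrying a "pagenumber" field.
    // The maps are stand-ins for index documents (assumption).
    static List<Map<String, String>> perPageDocuments(List<String> pageTexts) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (int i = 0; i < pageTexts.size(); i++) {
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("pagenumber", String.valueOf(i + 1));
            doc.put("contents", pageTexts.get(i));
            docs.add(doc);
        }
        return docs;
    }

    // Stand-in for deleting a page's document from the index by its page number.
    static void deletePage(List<Map<String, String>> docs, int pageNumber) {
        String key = String.valueOf(pageNumber);
        docs.removeIf(d -> key.equals(d.get("pagenumber")));
    }
}
```

The first method suits the case where you know the unwanted pages up front; the per-page records cost more index entries but let you remove a page later without re-parsing the PDF.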