Date: Wed, 28 Apr 2010 11:23:41 +0200
Subject: Re: Problem with textExtractor
From: JOSE FELIX HERNANDEZ BARRIO
To: users@jackrabbit.apache.org

Is there any limit on the size of PDF files the extractor can handle? We're working with files of around 16 MB.

2010/4/28 JOSE FELIX HERNANDEZ BARRIO
> I don't want to index the content of the PDF for full-text search.
> Can I disable it using the configuration below?
>
>
> 2010/4/28 Jukka Zitting
>
>> Hi,
>>
>> On Wed, Apr 28, 2010 at 10:50 AM, JOSE FELIX HERNANDEZ BARRIO
>> wrote:
>> > I'm inserting PDFs into the repository and get this exception:
>> >
>> > 2010-04-28 10:25:39,763 WARN [PDFStreamEngine.java] [processOperator]
>> > java.io.IOException: Mapping code should be 1 or two bytes and not 4
>> >     at org.apache.fontbox.cmap.CMap.addMapping(CMap.java:122)
>>
>> The underlying PDFBox library is having trouble with your PDF file,
>> which results in a warning being logged. This is not too serious; the
>> only downside is that this PDF might not show up in full-text
>> searches.
>>
>> You may want to report this to users@pdfbox.apache.org or to the
>> PDFBox issue tracker at https://issues.apache.org/jira/browse/PDFBOX.
>>
>> BR,
>>
>> Jukka Zitting
>
> --
> Jose Hernandez
> 675599600
> Isthari
> http://www.isthari.com