Message-ID: <1452802740.1258464039699.JavaMail.jira@brutus>
Date: Tue, 17 Nov 2009 13:20:39 +0000 (UTC)
From: "Jukka Zitting (JIRA)"
To: dev@jackrabbit.apache.org
Subject: [jira] Commented: (JCR-2395) Text Extractor: Image parser throws exception (jpeg)
In-Reply-To: <1568168280.1258444479845.JavaMail.jira@brutus>


    [ https://issues.apache.org/jira/browse/JCR-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778859#action_12778859 ]

Jukka Zitting commented on JCR-2395:
------------------------------------

Do you have an example image that triggers this behaviour?

For some reason (.jpg extension?) the image is parsed as a JPEG, which causes the exception shown above. Since Tika currently only supports metadata extraction from images and we only care about the extracted text content, we can avoid this issue simply by disabling the ImageParser in the default configuration.

> Text Extractor: Image parser throws exception (jpeg)
> ----------------------------------------------------
>
>                 Key: JCR-2395
>                 URL: https://issues.apache.org/jira/browse/JCR-2395
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-text-extractors
>    Affects Versions: 2.0-beta1
>            Reporter: Philipp Koch
>
> The exception below is thrown over and over while uploading JPEG images:
> 16.11.2009 17:20:42 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 165)
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.image.ImageParser@c7bc3
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:125)
>     at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
>     at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:160)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:123)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:65)
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:168)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
>     at java.lang.Thread.run(Thread.java:613)
> Caused by: javax.imageio.IIOException: Not a JPEG file: starts with 0x00 0x05
>     at com.sun.imageio.plugins.jpeg.JPEGImageReader.readImageHeader(Native Method)
>     at com.sun.imageio.plugins.jpeg.JPEGImageReader.readNativeHeader(JPEGImageReader.java:554)
>     at com.sun.imageio.plugins.jpeg.JPEGImageReader.checkTablesOnly(JPEGImageReader.java:309)
>     at com.sun.imageio.plugins.jpeg.JPEGImageReader.gotoImage(JPEGImageReader.java:431)
>     at com.sun.imageio.plugins.jpeg.JPEGImageReader.readHeader(JPEGImageReader.java:547)
>     at com.sun.imageio.plugins.jpeg.JPEGImageReader.getHeight(JPEGImageReader.java:609)
>     at org.apache.tika.parser.image.ImageParser.parse(ImageParser.java:47)
>     at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
>     ... 10 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
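
For reference, the "Not a JPEG file: starts with 0x00 0x05" cause in the trace means the first two bytes of the binary were not the JPEG start-of-image marker (0xFF 0xD8). The sketch below is only an illustration of that check using plain Java. The class and method names (JpegSniffer, looksLikeJpeg) are hypothetical and are not part of Tika or Jackrabbit, and this is not the fix proposed in the comment (which is to disable the ImageParser in the default configuration).

```java
public class JpegSniffer {

    /**
     * Hypothetical helper: returns true if the given bytes begin with the
     * JPEG start-of-image (SOI) marker 0xFF 0xD8.
     */
    public static boolean looksLikeJpeg(byte[] header) {
        if (header == null || header.length < 2) {
            return false;
        }
        // Mask to compare as unsigned values, since Java bytes are signed.
        return (header[0] & 0xFF) == 0xFF && (header[1] & 0xFF) == 0xD8;
    }

    public static void main(String[] args) {
        // The leading bytes reported in the exception message.
        byte[] fromTrace = {0x00, 0x05};
        // A typical JPEG prefix: SOI marker followed by an APP0 marker.
        byte[] jpegPrefix = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};

        System.out.println(looksLikeJpeg(fromTrace));  // prints "false"
        System.out.println(looksLikeJpeg(jpegPrefix)); // prints "true"
    }
}
```

A content-based check like this would reject the uploaded binary regardless of its .jpg extension, which is consistent with the guess above that the file was treated as a JPEG because of its name.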