jackrabbit-dev mailing list archives

From "Jukka Zitting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (JCR-2395) Text Extractor: Image parser throws exception (jpeg)
Date Tue, 17 Nov 2009 13:20:39 GMT

    [ https://issues.apache.org/jira/browse/JCR-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778859#action_12778859 ]

Jukka Zitting commented on JCR-2395:
------------------------------------

Do you have an example image that triggers this behaviour? For some reason (.jpg extension?)
the image is parsed as a JPEG, which causes the exception shown above.

Since Tika currently only supports metadata extraction from images and we only care about
the extracted text content, we can avoid this issue simply by disabling the ImageParser in
the default configuration.
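
As an illustration of the mismatch described above (content handed to the JPEG reader that is not actually JPEG data), here is a minimal Java sketch of a magic-byte check. It is not the fix proposed in this comment and not part of Tika or Jackrabbit; the class and method names (JpegSniffer, hasJpegMagic, looksLikeJpeg) are hypothetical. It relies only on the fact that a JPEG stream begins with the start-of-image marker 0xFF 0xD8, whereas the reported stream starts with 0x00 0x05.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of Tika or Jackrabbit.
public class JpegSniffer {

    /** Returns true if the given bytes begin with the JPEG SOI marker (0xFF 0xD8). */
    public static boolean hasJpegMagic(byte[] head) {
        return head != null
                && head.length >= 2
                && (head[0] & 0xFF) == 0xFF
                && (head[1] & 0xFF) == 0xD8;
    }

    /**
     * Peeks at the first two bytes of a stream without consuming them.
     * The stream must support mark/reset so that a later parser still
     * sees the full content.
     */
    public static boolean looksLikeJpeg(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(2);
        try {
            int first = in.read();   // -1 at end of stream, otherwise 0..255
            int second = in.read();
            return first == 0xFF && second == 0xD8;
        } finally {
            in.reset();
        }
    }

    public static void main(String[] args) throws IOException {
        // First bytes of a typical JPEG file.
        byte[] jpeg = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0};
        // The two leading bytes named in the reported exception message.
        byte[] reported = {0x00, 0x05};

        System.out.println(looksLikeJpeg(new ByteArrayInputStream(jpeg)));
        System.out.println(looksLikeJpeg(new ByteArrayInputStream(reported)));
    }
}
```

A caller could use such a check to skip image parsing for mislabelled content; disabling the ImageParser, as suggested above, avoids the problem without any per-file check.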

> Text Extractor: Image parser throws exception (jpeg)
> ----------------------------------------------------
>
>                 Key: JCR-2395
>                 URL: https://issues.apache.org/jira/browse/JCR-2395
>             Project: Jackrabbit Content Repository
>          Issue Type: Bug
>          Components: jackrabbit-text-extractors
>    Affects Versions: 2.0-beta1
>            Reporter: Philipp Koch
>
> The exception below is thrown over and over while uploading JPEG images:
> 16.11.2009 17:20:42 *WARN * LazyTextExtractorField: Failed to extract text from a binary property (LazyTextExtractorField.java, line 165)
> org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.image.ImageParser@c7bc3
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:125)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
> 	at org.apache.jackrabbit.core.query.lucene.LazyTextExtractorField$ParsingTask.run(LazyTextExtractorField.java:160)
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:417)
> 	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:123)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:65)
> 	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:168)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
> 	at java.lang.Thread.run(Thread.java:613)
> Caused by: javax.imageio.IIOException: Not a JPEG file: starts with 0x00 0x05
> 	at com.sun.imageio.plugins.jpeg.JPEGImageReader.readImageHeader(Native Method)
> 	at com.sun.imageio.plugins.jpeg.JPEGImageReader.readNativeHeader(JPEGImageReader.java:554)
> 	at com.sun.imageio.plugins.jpeg.JPEGImageReader.checkTablesOnly(JPEGImageReader.java:309)
> 	at com.sun.imageio.plugins.jpeg.JPEGImageReader.gotoImage(JPEGImageReader.java:431)
> 	at com.sun.imageio.plugins.jpeg.JPEGImageReader.readHeader(JPEGImageReader.java:547)
> 	at com.sun.imageio.plugins.jpeg.JPEGImageReader.getHeight(JPEGImageReader.java:609)
> 	at org.apache.tika.parser.image.ImageParser.parse(ImageParser.java:47)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:119)
> 	... 10 more

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

