commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Stefan Bodewig (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COMPRESS-382) OutOfMemoryError from CompressorStreamFactory
Date Sat, 04 Feb 2017 14:15:52 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15852806#comment-15852806
] 

Stefan Bodewig commented on COMPRESS-382:
-----------------------------------------

LZMA is identified by a three byte header, which is very short and this may lead to quite
a few false positives. I'm not sure how to avoid this without removing LZMA autodetection
completely.

What you describe makes me think Tika could benefit from us adding some kind of "does this
look like a compressor stream"  or even "what kind of format do you think this is" kind of
method in {{CompressorStreamFactory}}.

> OutOfMemoryError from CompressorStreamFactory
> ---------------------------------------------
>
>                 Key: COMPRESS-382
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-382
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Compressors
>    Affects Versions: 1.10, 1.11, 1.12
>         Environment: Windows7, jre1.8.0_101 x64
>            Reporter: Luis Filipe Nassif
>         Attachments: data.mui
>
>
> While using Tika-1.14 to detect file types, the attached 1KB file triggered an OOME with
1GB heap. Tika calls CompressorStreamFactory.createCompressorInputStream(in) to detect if
the file is a compressor stream, but CompressorStreamFactory erroneously detects it as a LZMACompressorInputStream
and when the LZMACompressorInputStream is instanciated the OOME is thrown. This error does
not happen with commons-compress versions prior to 1.10, when auto detecting LZMA streams
was added. OOME stacktrace below:
> {code}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> 	at org.tukaani.xz.lz.LZDecoder.<init>(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.initialize(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.initialize(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.<init>(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.<init>(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.apache.commons.compress.compressors.lzma.LZMACompressorInputStream.<init>(LZMACompressorInputStream.java:48)
~[commons-compress-1.10.jar:1.10]
> 	at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:251)
~[commons-compress-1.10.jar:1.10]
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:109)
~[tika-parsers-1.14.jar:1.14]
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:95)
~[tika-parsers-1.14.jar:1.14]
> 	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77) ~[tika-core-1.14.jar:1.14]
> 	at dpf.sp.gpinf.indexer.process.task.SignatureTask.process(SignatureTask.java:50) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processMonitorTimeout(AbstractTask.java:203)
~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:152)
~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.sendToNextTask(AbstractTask.java:190)
~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:160)
~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.sendToNextTask(AbstractTask.java:190)
~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:160)
~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.sendToNextTask(AbstractTask.java:190)
~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:160)
~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.Worker.process(Worker.java:174) ~[iped.jar:?]
> 	... 1 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message