commons-issues mailing list archives

From "Tim Allison (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (COMPRESS-382) OutOfMemoryError from CompressorStreamFactory
Date Fri, 14 Apr 2017 19:21:42 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969413#comment-15969413 ]

Tim Allison edited comment on COMPRESS-382 at 4/14/17 7:21 PM:
---------------------------------------------------------------

This works, but I don't like breaking the simplicity of CompressorStreamFactory.  So, now
we're going to have a bunch of different settable parameters that require knowledge of the
individual child streams?  I don't like it.

The other thing I don't like is wrapping the memory limit exception in a CompressorException.
Perhaps create a new subclass of CompressorException, say MemoryThresholdHitException, and
apply it across the streams for this kind of exception?
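
A minimal sketch of what such a subclass might look like. The local CompressorException is only a stand-in for the Commons Compress class so that the example compiles on its own, and MemoryThresholdHitException, its fields, and checkLimit are hypothetical names taken from or invented for this proposal, not an existing API.

```java
// Stand-in for org.apache.commons.compress.compressors.CompressorException,
// declared locally only so this sketch compiles without the library.
class CompressorException extends Exception {
    CompressorException(String message) {
        super(message);
    }
}

// Hypothetical subclass from the proposal above; callers can catch it
// separately from other CompressorExceptions.
class MemoryThresholdHitException extends CompressorException {
    private final long requiredKb;
    private final long limitKb;

    MemoryThresholdHitException(long requiredKb, long limitKb) {
        super(requiredKb + " KB would be needed, but the limit is " + limitKb + " KB");
        this.requiredKb = requiredKb;
        this.limitKb = limitKb;
    }

    long getRequiredKb() {
        return requiredKb;
    }

    long getLimitKb() {
        return limitKb;
    }
}

class MemoryThresholdDemo {
    // Hypothetical guard a stream could call before allocating its buffers.
    // A negative limit means "no limit configured".
    static void checkLimit(long requiredKb, long limitKb) throws CompressorException {
        if (limitKb >= 0 && requiredKb > limitKb) {
            throw new MemoryThresholdHitException(requiredKb, limitKb);
        }
    }

    public static void main(String[] args) {
        try {
            checkLimit(4L * 1024 * 1024, 100L * 1024);
        } catch (MemoryThresholdHitException e) {
            // Memory-limit failures can be handled on their own path.
            System.out.println("Skipping stream: " + e.getMessage());
        } catch (CompressorException e) {
            System.out.println("Other compressor failure: " + e.getMessage());
        }
    }
}
```

Because the subclass still extends CompressorException, existing callers that catch the parent type would keep working, while callers that care about memory limits can catch the narrower type first.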

bq.  The current thinking behind this is that if you know you want certain parameters, then
you know which format you want and don't need to go through the factory anyway.

In Tika land, this isn't quite right: we know we want to set limits, and we're willing to
figure out that lzma is in kb and Z is in bytes for the data table, etc., but we don't (currently)
know the stream type.  We could (soon) detect() and then call the relevant Compressor, but
it would be easier to tell the factory what we want for each file type up front.
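
One way to picture "telling the factory up front" is a small holder of per-format limits that normalizes the differing units. This is only a sketch under assumptions: CompressorLimits and its methods are hypothetical and not part of Commons Compress, and the keys "lzma" and "z" are used here simply as format names.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for per-format memory limits; not an existing
// Commons Compress class. Limits are stored in bytes so the caller can
// supply whichever unit a given format documents.
class CompressorLimits {
    private final Map<String, Long> limitBytesByFormat = new HashMap<>();

    // Register a limit expressed in kilobytes (the unit used for lzma above).
    CompressorLimits limitKb(String format, long kb) {
        limitBytesByFormat.put(format, kb * 1024L);
        return this;
    }

    // Register a limit expressed in bytes (the unit used for Z above).
    CompressorLimits limitBytes(String format, long bytes) {
        limitBytesByFormat.put(format, bytes);
        return this;
    }

    // Returns the limit in bytes, or -1 if none was configured for the format.
    long limitBytesFor(String format) {
        return limitBytesByFormat.getOrDefault(format, -1L);
    }

    public static void main(String[] args) {
        CompressorLimits limits = new CompressorLimits()
                .limitKb("lzma", 100L * 1024)
                .limitBytes("z", 1L << 20);
        System.out.println("lzma limit in bytes: " + limits.limitBytesFor("lzma"));
        System.out.println("z limit in bytes: " + limits.limitBytesFor("z"));
    }
}
```

A factory could consult such a holder after detecting the stream type, so the caller never needs to know the type in advance.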

As for service loading...yes, that makes parameterization challenging.

In short, this PR is a proposal...a way to advance the conversation, not something that I
feel strongly about. :)

Recommendations?


was (Author: tallison@mitre.org):
This works, but I don't like breaking the simplicity of CompressorStreamFactory.  So, now
we're going to have a bunch of different settable parameters that require knowledge of the
individual child streams?  I don't like it.

bq.  The current thinking behind this is that if you know you want certain parameters, then
you know which format you want and don't need to go through the factory anyway.

In Tika land, this isn't quite right: we know we want to set limits, and we're willing to
figure out that lzma is in kb and Z is in bytes for the data table, etc., but we don't (currently)
know the stream type.  We could (soon) detect() and then call the relevant Compressor, but
it would be easier to tell the factory what we want for each file type up front.

As for service loading...yes, that makes parameterization challenging.

In short, this PR is a proposal...a way to advance the conversation, not something that I
feel strongly about. :)

Recommendations?

> OutOfMemoryError from CompressorStreamFactory
> ---------------------------------------------
>
>                 Key: COMPRESS-382
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-382
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Compressors
>    Affects Versions: 1.10, 1.11, 1.12
>         Environment: Windows7, jre1.8.0_101 x64
>            Reporter: Luis Filipe Nassif
>         Attachments: data.mui
>
>
> While using Tika-1.14 to detect file types, the attached 1KB file triggered an OOME with
> a 1GB heap. Tika calls CompressorStreamFactory.createCompressorInputStream(in) to detect if
> the file is a compressor stream, but CompressorStreamFactory erroneously detects it as an
> LZMACompressorInputStream, and when the LZMACompressorInputStream is instantiated the OOME
> is thrown. This error does not happen with commons-compress versions prior to 1.10, the
> release in which auto-detection of LZMA streams was added. OOME stacktrace below:
> {code}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> 	at org.tukaani.xz.lz.LZDecoder.<init>(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.initialize(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.initialize(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.<init>(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.<init>(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.apache.commons.compress.compressors.lzma.LZMACompressorInputStream.<init>(LZMACompressorInputStream.java:48) ~[commons-compress-1.10.jar:1.10]
> 	at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:251) ~[commons-compress-1.10.jar:1.10]
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:109) ~[tika-parsers-1.14.jar:1.14]
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:95) ~[tika-parsers-1.14.jar:1.14]
> 	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77) ~[tika-core-1.14.jar:1.14]
> 	at dpf.sp.gpinf.indexer.process.task.SignatureTask.process(SignatureTask.java:50) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processMonitorTimeout(AbstractTask.java:203) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:152) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.sendToNextTask(AbstractTask.java:190) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:160) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.sendToNextTask(AbstractTask.java:190) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:160) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.sendToNextTask(AbstractTask.java:190) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:160) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.Worker.process(Worker.java:174) ~[iped.jar:?]
> 	... 1 more
> {code}
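
The trace above fails inside LZDecoder.<init>, which is consistent with a decoder sizing its buffer from the dictionary size declared in the stream header. As a rough illustration of how a caller could guard against that, the sketch below reads the dictionary size from a 13-byte .lzma header (1 properties byte, a 4-byte little-endian dictionary size, an 8-byte little-endian uncompressed size) and compares it with a limit before any decoder is created. LzmaHeaderCheck and its methods are hypothetical helpers, not part of Commons Compress or XZ for Java.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: inspects an .lzma header before a decoder is built.
class LzmaHeaderCheck {
    private static final int HEADER_LENGTH = 13;

    // Reads the header and returns the declared dictionary size in bytes
    // as an unsigned 32-bit value. Throws EOFException on a short stream.
    static long readDictSize(InputStream in) throws IOException {
        byte[] header = new byte[HEADER_LENGTH];
        new DataInputStream(in).readFully(header);
        long dictSize = 0;
        // Bytes 1..4 hold the dictionary size, least significant byte first.
        for (int i = 0; i < 4; i++) {
            dictSize |= (header[1 + i] & 0xFFL) << (8 * i);
        }
        return dictSize;
    }

    // True if the declared dictionary fits within the caller's limit.
    static boolean withinLimit(InputStream in, long maxDictBytes) throws IOException {
        return readDictSize(in) <= maxDictBytes;
    }

    public static void main(String[] args) throws IOException {
        // A header claiming a dictionary of nearly 4 GiB, as random data might.
        byte[] suspicious = {0x5D, -1, -1, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0};
        long limit = 64L * 1024 * 1024;
        boolean ok = withinLimit(new ByteArrayInputStream(suspicious), limit);
        System.out.println(ok ? "dictionary size accepted" : "dictionary size exceeds limit");
    }
}
```

A real caller would need to run this on a copy of the leading bytes (or a mark/reset-capable stream) so that the header is still available to the decoder afterwards.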



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
