Date: Fri, 14 Apr 2017 16:03:42 +0000 (UTC)
From: "Stefan Bodewig (JIRA)"
To: issues@commons.apache.org
Reply-To: issues@commons.apache.org
Subject: [jira] [Commented] (COMPRESS-382) OutOfMemoryError from CompressorStreamFactory

    [ https://issues.apache.org/jira/browse/COMPRESS-382?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15969201#comment-15969201 ]

Stefan Bodewig commented on COMPRESS-382:
-----------------------------------------

Some of the streams support parametrizations, like the window size for Snappy, the compression level of gzip and so on. So far we pick something as a default and this is what you get when using the factory. The current thinking behind this is that if you know you want certain parameters, then you know which format you want and don't need to go through the factory anyway.

I'm not sure whether we can actually enforce thresholds for all of the formats. If we can, then it may make sense to add it as a parameter to the factory. Unfortunately the interface approach of ServiceLoader makes it way more difficult to evolve the factory's contract.
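
For LZMA, a threshold of the kind discussed above could in principle be checked against the stream header before any decoder is created. What follows is only a minimal sketch, assuming the legacy .lzma header layout (one properties byte, a 4-byte little-endian dictionary size, an 8-byte uncompressed size). `LzmaHeaderGuard`, its method names and the `maxDictionaryBytes` parameter are hypothetical and are not part of Commons Compress or XZ for Java. The declared dictionary size is also only a lower bound on what a real decoder would allocate.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical helper (not part of Commons Compress or XZ for Java) that
 * inspects a legacy .lzma header and rejects streams whose declared
 * dictionary size exceeds a caller-supplied threshold.
 */
final class LzmaHeaderGuard {

    /** 1 properties byte + 4 bytes dictionary size + 8 bytes uncompressed size. */
    static final int HEADER_LENGTH = 13;

    private LzmaHeaderGuard() {
    }

    /** Returns the unsigned little-endian dictionary size stored in header bytes 1..4. */
    static long declaredDictionarySize(byte[] header) {
        if (header.length < HEADER_LENGTH) {
            throw new IllegalArgumentException("need at least " + HEADER_LENGTH + " header bytes");
        }
        long size = 0;
        // Byte 4 is the most significant byte, byte 1 the least significant.
        for (int i = 4; i >= 1; i--) {
            size = (size << 8) | (header[i] & 0xFF);
        }
        return size;
    }

    /** True if the header declares a dictionary larger than the threshold. */
    static boolean exceedsThreshold(byte[] header, long maxDictionaryBytes) {
        return declaredDictionarySize(header) > maxDictionaryBytes;
    }

    /**
     * Peeks at the header without consuming it and throws if the declared
     * dictionary is too large. The stream must support mark/reset.
     */
    static void checkBeforeDecoding(InputStream in, long maxDictionaryBytes) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] header = new byte[HEADER_LENGTH];
        in.mark(HEADER_LENGTH);
        try {
            new DataInputStream(in).readFully(header);
        } finally {
            in.reset(); // leave the stream positioned at the start of the header
        }
        if (exceedsThreshold(header, maxDictionaryBytes)) {
            throw new IOException("declared LZMA dictionary size "
                    + declaredDictionarySize(header) + " exceeds limit " + maxDictionaryBytes);
        }
    }
}
```

A caller doing format detection could invoke `LzmaHeaderGuard.checkBeforeDecoding(in, limit)` on a buffered stream and treat the `IOException` as "not a plausible LZMA stream". Whether comparable checks are feasible for every other format is exactly the open question raised above.
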
> OutOfMemoryError from CompressorStreamFactory
> ---------------------------------------------
>
>                 Key: COMPRESS-382
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-382
>             Project: Commons Compress
>          Issue Type: Bug
>          Components: Compressors
>    Affects Versions: 1.10, 1.11, 1.12
>         Environment: Windows7, jre1.8.0_101 x64
>            Reporter: Luis Filipe Nassif
>         Attachments: data.mui
>
>
> While using Tika-1.14 to detect file types, the attached 1KB file triggered an OOME with 1GB heap. Tika calls CompressorStreamFactory.createCompressorInputStream(in) to detect if the file is a compressor stream, but CompressorStreamFactory erroneously detects it as an LZMACompressorInputStream, and when the LZMACompressorInputStream is instantiated the OOME is thrown. This error does not happen with commons-compress versions prior to 1.10, when auto detecting LZMA streams was added. OOME stacktrace below:
> {code}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> 	at org.tukaani.xz.lz.LZDecoder.<init>(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.initialize(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.initialize(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.<init>(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.tukaani.xz.LZMAInputStream.<init>(Unknown Source) ~[xz-1.5.jar:1.5]
> 	at org.apache.commons.compress.compressors.lzma.LZMACompressorInputStream.<init>(LZMACompressorInputStream.java:48) ~[commons-compress-1.10.jar:1.10]
> 	at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:251) ~[commons-compress-1.10.jar:1.10]
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:109) ~[tika-parsers-1.14.jar:1.14]
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:95) ~[tika-parsers-1.14.jar:1.14]
> 	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77) ~[tika-core-1.14.jar:1.14]
> 	at dpf.sp.gpinf.indexer.process.task.SignatureTask.process(SignatureTask.java:50) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processMonitorTimeout(AbstractTask.java:203) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:152) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.sendToNextTask(AbstractTask.java:190) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:160) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.sendToNextTask(AbstractTask.java:190) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:160) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.sendToNextTask(AbstractTask.java:190) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.task.AbstractTask.processAndSendToNextTask(AbstractTask.java:160) ~[iped.jar:?]
> 	at dpf.sp.gpinf.indexer.process.Worker.process(Worker.java:174) ~[iped.jar:?]
> 	... 1 more
> {code}