From: "Stefan Bodewig (JIRA)"
To: issues@commons.apache.org
Reply-To: issues@commons.apache.org
Date: Mon, 11 Aug 2014 05:48:12 +0000 (UTC)
Subject: [jira] [Commented] (COMPRESS-285) checking of availability of XZ compression is expensive - result should be reused
Mailing-List: contact issues-help@commons.apache.org; run by ezmlm

    [ https://issues.apache.org/jira/browse/COMPRESS-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092467#comment-14092467 ]

Stefan Bodewig commented on COMPRESS-285:
-----------------------------------------

Thanks Sebb, I think your two suggestions are good ideas and I will see to implementing them in the coming week. In particular, you will only pay for the failed XZ check if you are really trying to uncompress XZ streams.
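
To illustrate the "only pay when you really have XZ" idea, here is a minimal sketch. It assumes the format can be recognized from the six-byte XZ magic header alone, so that the expensive library probe runs only after a match. The class LazyXzDetection and its methods are hypothetical names for illustration, not the Commons Compress API; org.tukaani.xz.XZInputStream is the XZ for Java class the probe tries to load.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: probe for the XZ library only when the input looks like XZ. */
final class LazyXzDetection {

    /** The six magic bytes that open every .xz stream: FD '7' 'z' 'X' 'Z' 00. */
    private static final byte[] XZ_MAGIC = {(byte) 0xFD, '7', 'z', 'X', 'Z', 0};

    /** Cheap check: compares the leading bytes with the XZ magic, no class loading involved. */
    static boolean looksLikeXz(byte[] signature, int length) {
        if (length < XZ_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < XZ_MAGIC.length; i++) {
            if (signature[i] != XZ_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /** Expensive check: asks the class loader for the XZ for Java library. */
    static boolean probeXzLibrary() {
        try {
            Class.forName("org.tukaani.xz.XZInputStream");
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /**
     * Returns "xz" if the signature is XZ and the library is present,
     * "xz (library missing)" if the signature is XZ but the probe fails,
     * and "unknown" otherwise. The probe is invoked only after a signature match.
     */
    static String detect(byte[] signature, int length, BooleanSupplier xzProbe) {
        if (!looksLikeXz(signature, length)) {
            return "unknown"; // probe never runs for non-XZ input
        }
        return xzProbe.getAsBoolean() ? "xz" : "xz (library missing)";
    }

    public static void main(String[] args) {
        byte[] gzipHeader = {0x1f, (byte) 0x8b, 8, 0, 0, 0};
        System.out.println(detect(gzipHeader, gzipHeader.length, LazyXzDetection::probeXzLibrary));
    }
}
```

With this ordering, a detector that mostly sees non-XZ input never triggers the failing class lookup at all.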
The additional constructor won't help Wojciech since he's using Compress behind Tika. Tika would need to be adapted to the new constructor and in the end implement its own logic, which would also need to take OSGi contexts into account. I think it might be a good idea to add an explicit flag for whether the result is cacheable and make that flag default to true unless BundleEvent can be loaded. Wojciech would then need to set the flag explicitly.

> checking of availability of XZ compression is expensive - result should be reused
> ---------------------------------------------------------------------------------
>
>                 Key: COMPRESS-285
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-285
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Compressors
>    Affects Versions: 1.5, 1.6, 1.7, 1.8
>         Environment: linux 64-bit, java 7, glassfish, solr, tika
>            Reporter: Wojciech Łozowicki
>            Priority: Minor
>              Labels: performance
>
> I use solr with apache tika for indexing documents. Tika uses commons-compress to handle compressed files. Using sampler (jvisualvm) I have seen that quite a lot of time (5-7%) during my tests is spent in XZUtils.isXZCompressionAvailable because of unavailable XZ compression (I guess for each time classloaders spend some time looking for unavailable classes, then NoClassDefFoundError).
> I think the result of the first check should be stored and reused.
> Here is the stacktrace (just to show the way tika is using commons-compress):
> org.apache.commons.compress.compressors.xz.XZUtils.isXZCompressionAvailable(XZUtils.java:52)
> 	at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:140)
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:95)
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:81)
> 	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)

--
This message was sent by Atlassian JIRA
(v6.2#6252)
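
The cacheable-flag idea from the comment above could look roughly like the following. This is a sketch under stated assumptions, not the Commons Compress implementation: the class CachedXzAvailability, its methods, and the injected probe are hypothetical names. The only real identifiers are org.osgi.framework.BundleEvent (used to guess whether an OSGi framework is present) and org.tukaani.xz.XZInputStream (the XZ for Java class a real probe would try to load).

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch of a cached availability check with an explicit "cacheable" flag. */
final class CachedXzAvailability {

    private enum Cached { DONT_CACHE, AVAILABLE, UNAVAILABLE }

    private final BooleanSupplier probe;
    private volatile Cached state;

    /**
     * @param probe the expensive check, e.g. () -> canLoad("org.tukaani.xz.XZInputStream")
     */
    CachedXzAvailability(BooleanSupplier probe) {
        this.probe = probe;
        // Default: cache the result unless an OSGi framework seems to be present,
        // where bundles (and thus the XZ classes) may come and go at runtime.
        setCacheable(!canLoad("org.osgi.framework.BundleEvent"));
    }

    /** Returns true if the named class can be loaded by this class's loader. */
    static boolean canLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Turns caching on (running the probe once, right now) or off. */
    void setCacheable(boolean cacheable) {
        if (!cacheable) {
            state = Cached.DONT_CACHE;
        } else {
            state = probe.getAsBoolean() ? Cached.AVAILABLE : Cached.UNAVAILABLE;
        }
    }

    /** Answers from the cache when allowed, otherwise re-runs the probe on every call. */
    boolean isAvailable() {
        Cached current = state;
        if (current != Cached.DONT_CACHE) {
            return current == Cached.AVAILABLE;
        }
        return probe.getAsBoolean();
    }

    public static void main(String[] args) {
        CachedXzAvailability xz =
            new CachedXzAvailability(() -> canLoad("org.tukaani.xz.XZInputStream"));
        xz.setCacheable(true); // what a non-OSGi deployment would want explicitly
        System.out.println("XZ available: " + xz.isAvailable());
    }
}
```

In this sketch the probe runs eagerly when caching is switched on, which keeps isAvailable() free of locking; a lazy variant would defer the probe to the first call at the cost of some synchronization.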