commons-issues mailing list archives

From "Stefan Bodewig (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (COMPRESS-285) checking of availability of XZ compression is expensive - result should be reused
Date Mon, 11 Aug 2014 05:48:12 GMT

    [ https://issues.apache.org/jira/browse/COMPRESS-285?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14092467#comment-14092467 ]

Stefan Bodewig commented on COMPRESS-285:
-----------------------------------------

Thanks Sebb, I think your two suggestions are good ideas and I will see to implementing them
in the coming week. In particular, you will only pay for the failed XZ check if you are really
trying to uncompress XZ streams.
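
A minimal sketch of that ordering, assuming the stream signature is compared first and the
expensive availability check is only consulted once the signature matches. The class and
method names (LazyXzDetection, looksLikeXz, canDecodeAsXz) are hypothetical and are not the
Commons Compress API; only the six XZ magic bytes come from the XZ file format.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical sketch: run the costly XZ availability check only for XZ-looking input. */
final class LazyXzDetection {

    // Every XZ stream starts with these six bytes: FD 37 7A 58 5A 00.
    private static final byte[] XZ_MAGIC = {(byte) 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00};

    private LazyXzDetection() {
    }

    /** Cheap test: do the first bytes of the stream match the XZ magic? */
    static boolean looksLikeXz(byte[] signature, int length) {
        if (length < XZ_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < XZ_MAGIC.length; i++) {
            if (signature[i] != XZ_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    /**
     * The availability check (assumed to be expensive) is short-circuited away
     * unless the signature already looks like XZ.
     */
    static boolean canDecodeAsXz(byte[] signature, int length, BooleanSupplier availabilityCheck) {
        return looksLikeXz(signature, length) && availabilityCheck.getAsBoolean();
    }
}
```

With this ordering, a gzip or bzip2 stream never triggers the class lookup at all; only input
that really carries the XZ signature pays for it.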

The additional constructor won't help Wojciech, since he is using Compress behind Tika. Tika
would need to be adapted to the new constructor and would in the end have to implement its
own logic, which would also need to take OSGi contexts into account.  I think it might be a
good idea to add an explicit flag that says whether the result is cacheable, and to make that
flag default to true unless BundleEvent can be loaded. Wojciech would then need to set the
flag explicitly.
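
A rough sketch of how such a flag could look, under stated assumptions: the class
CachedAvailability and its methods are hypothetical, not the code that ended up in Commons
Compress, and org.osgi.framework.BundleEvent is taken to be the BundleEvent class meant above.
The probe simply tries to load a class by name without initializing it.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: cache an expensive class-availability probe behind a flag. */
final class CachedAvailability {

    private final String probeClassName;
    private volatile boolean cacheable;
    private volatile Boolean cached; // null means "not probed yet"

    /** Counts real probes; only here to make the caching effect observable. */
    final AtomicInteger probes = new AtomicInteger();

    CachedAvailability(String probeClassName) {
        this.probeClassName = probeClassName;
        // Assumed default: cache the result unless OSGi classes are visible,
        // because in an OSGi container bundles can appear and disappear at runtime.
        this.cacheable = !canLoad("org.osgi.framework.BundleEvent");
    }

    /** Lets callers override the default explicitly. */
    void setCacheable(boolean cacheable) {
        this.cacheable = cacheable;
        if (!cacheable) {
            cached = null; // forget any previously stored result
        }
    }

    boolean isCacheable() {
        return cacheable;
    }

    boolean isAvailable() {
        if (!cacheable) {
            return probe(); // always ask the class loader again
        }
        Boolean result = cached;
        if (result == null) {
            result = probe();
            cached = result;
        }
        return result;
    }

    private boolean probe() {
        probes.incrementAndGet();
        return canLoad(probeClassName);
    }

    private static boolean canLoad(String className) {
        try {
            Class.forName(className, false, CachedAvailability.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError
            return false;
        }
    }
}
```

For the XZ case the probe class would presumably be one from the XZ for Java library, for
example new CachedAvailability("org.tukaani.xz.XZInputStream"). With caching on, repeated
calls to isAvailable() cost a single class lookup; with caching off, every call goes back to
the class loader.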

> checking of availability of XZ compression is expensive - result should be reused
> ---------------------------------------------------------------------------------
>
>                 Key: COMPRESS-285
>                 URL: https://issues.apache.org/jira/browse/COMPRESS-285
>             Project: Commons Compress
>          Issue Type: Improvement
>          Components: Compressors
>    Affects Versions: 1.5, 1.6, 1.7, 1.8
>         Environment: linux 64-bit, java 7, glassfish, solr, tika
>            Reporter: Wojciech Łozowicki
>            Priority: Minor
>              Labels: performance
>
> I use solr with apache tika for indexing documents. Tika uses commons-compress to handle
> compressed files. Using sampler (jvisualvm) I have seen that quite a lot of time (5-7%) during
> my tests is spent in XZUtils.isXZCompressionAvailable because of unavailable XZ compression
> (I guess for each time classloaders spend some time looking for unavailable classes, then
> NoClassDefFoundError).
> I think the result of the first check should be stored and reused.
> Here is the stacktrace (just to show the way tika is using commons-compress):
> org.apache.commons.compress.compressors.xz.XZUtils.isXZCompressionAvailable(XZUtils.java:52)
> 	at org.apache.commons.compress.compressors.CompressorStreamFactory.createCompressorInputStream(CompressorStreamFactory.java:140)
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detectCompressorFormat(ZipContainerDetector.java:95)
> 	at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:81)
> 	at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:61)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
