commons-dev mailing list archives

From Stefan Bodewig <bode...@apache.org>
Subject [compress] Pack200
Date Sun, 04 Sep 2011 06:04:15 GMT
Hi,

I've just committed Compressor*Stream implementations for Pack200[1],
which are a bit unusual in several ways.

First of all, they will (by design of the format) only work on valid
jar files.  Also, the result isn't likely to be compressed (in the
sense of "smaller than the original") at all; in most cases the format
expects another step of GZip compression.

The second difference from the other compressors is that the API
provided by the Java classlib doesn't lend itself to streaming at all.
There is a Packer and an Unpacker, each of which expects an input
stream and an output stream and converts from one to the other in a
single blocking operation (even closing the input side when done).

I have experimented with Piped*Streams as well as Ant/commons-exec-like
stream pumping in order to provide a streaming experience but always ran
into some edge cases where things broke down.  I'll give one example
below.

The current implementation of Pack200CompressorInputStream will
pass the wrapped input and an OutputStream writing to a cache to the
Unpacker synchronously inside the constructor, consuming the input
completely.  It will then defer all read-operations to the cache.
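
To illustrate the control flow described above, here is a minimal
sketch.  It is not the committed code: BlockingConverter is a
hypothetical stand-in for the Unpacker, the class names are made up,
and the toy converter merely upper-cases ASCII so that the example runs
without any Pack200 data.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a one-shot, blocking API such as the Unpacker. */
interface BlockingConverter {
    void convert(InputStream in, OutputStream out) throws IOException;
}

/** Runs the conversion eagerly in the constructor, then serves all reads from the cache. */
class EagerConvertingInputStream extends InputStream {
    private final InputStream cache;

    EagerConvertingInputStream(InputStream wrapped, BlockingConverter converter)
            throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // The wrapped input is consumed completely and closed right here.
        try (InputStream in = wrapped) {
            converter.convert(in, buffer);
        }
        cache = new ByteArrayInputStream(buffer.toByteArray());
    }

    @Override
    public int read() throws IOException {
        return cache.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        return cache.read(b, off, len);
    }

    @Override
    public int available() throws IOException {
        return cache.available();
    }
}

class EagerConvertingInputStreamDemo {
    /** Toy converter that upper-cases ASCII bytes; it only illustrates the flow. */
    static String demo() {
        BlockingConverter upper = (in, out) -> {
            int b;
            while ((b = in.read()) != -1) {
                out.write(Character.toUpperCase(b));
            }
        };
        InputStream source =
                new ByteArrayInputStream("pack200".getBytes(StandardCharsets.US_ASCII));
        try (InputStream in = new EagerConvertingInputStream(source, upper)) {
            ByteArrayOutputStream result = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                result.write(b);
            }
            return new String(result.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The point of the sketch is that by the time the constructor returns,
the wrapped stream is already exhausted, so the caller may read as much
or as little of the result as it likes without blocking the converter.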

Likewise the Pack200CompressorOutputStream will buffer up all write
operations in a cache and once finish() or close() is called the cache
is converted to an InputStream that is then passed together with the
originally wrapped output to the Packer and written synchronously.
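
The output side can be sketched the same way.  Again this is only an
illustration under assumptions: OneShotPacker is a hypothetical
stand-in for the Packer, the class names are invented, and the toy
packer reverses the bytes (an operation that, like packing, needs the
whole input before it can produce output).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a one-shot, blocking API such as the Packer. */
interface OneShotPacker {
    void pack(InputStream in, OutputStream out) throws IOException;
}

/** Buffers every write; the conversion only runs once finish() or close() is called. */
class DeferredConvertingOutputStream extends OutputStream {
    private final ByteArrayOutputStream cache = new ByteArrayOutputStream();
    private final OutputStream wrapped;
    private final OneShotPacker packer;
    private boolean finished;

    DeferredConvertingOutputStream(OutputStream wrapped, OneShotPacker packer) {
        this.wrapped = wrapped;
        this.packer = packer;
    }

    @Override
    public void write(int b) throws IOException {
        if (finished) {
            throw new IOException("stream has already been finished");
        }
        cache.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        if (finished) {
            throw new IOException("stream has already been finished");
        }
        cache.write(b, off, len);
    }

    /** Turns the cache into an InputStream and runs the packer synchronously. */
    public void finish() throws IOException {
        if (finished) {
            return;
        }
        finished = true;
        packer.pack(new ByteArrayInputStream(cache.toByteArray()), wrapped);
    }

    @Override
    public void close() throws IOException {
        try {
            finish();
        } finally {
            wrapped.close();
        }
    }
}

class DeferredConvertingOutputStreamDemo {
    /** Returns "<bytes in sink before close>:<sink content after close>". */
    static String demo() {
        OneShotPacker reverse = (in, out) -> {
            ByteArrayOutputStream all = new ByteArrayOutputStream();
            int b;
            while ((b = in.read()) != -1) {
                all.write(b);
            }
            byte[] bytes = all.toByteArray();
            for (int i = bytes.length - 1; i >= 0; i--) {
                out.write(bytes[i]);
            }
        };
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            DeferredConvertingOutputStream out =
                    new DeferredConvertingOutputStream(sink, reverse);
            out.write("abc".getBytes(StandardCharsets.US_ASCII));
            int before = sink.size(); // nothing has reached the sink yet
            out.close();
            return before + ":" + new String(sink.toByteArray(), StandardCharsets.US_ASCII);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Nothing reaches the wrapped output until finish() or close(), which is
exactly why a byte count taken during write() would say little about
the packed result.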

Caches can be in-memory (using ByteArray*Stream) or temporary files
controlled by a constructor option with in-memory as the default and
temp-files for cases where the archives are expected to be big.
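
A rough sketch of such a switchable cache follows.  The names CacheMode
and ConversionCache are hypothetical and are not the class names used
in the commit; the sketch only shows one way a constructor option could
select between a ByteArrayOutputStream and a temporary file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical constructor option. */
enum CacheMode { IN_MEMORY, TEMP_FILE }

/** Write-then-read cache backed either by memory or by a temporary file. */
class ConversionCache implements Closeable {
    private final ByteArrayOutputStream memory; // null in TEMP_FILE mode
    private final Path file;                    // null in IN_MEMORY mode
    private final OutputStream sink;

    ConversionCache(CacheMode mode) throws IOException {
        if (mode == CacheMode.TEMP_FILE) {
            memory = null;
            file = Files.createTempFile("conversion-cache", ".tmp");
            sink = Files.newOutputStream(file);
        } else {
            memory = new ByteArrayOutputStream();
            file = null;
            sink = memory;
        }
    }

    /** Stream to write the data to be cached. */
    OutputStream output() {
        return sink;
    }

    /** Call once writing is complete; returns a stream over everything written. */
    InputStream input() throws IOException {
        sink.close();
        return file != null
                ? Files.newInputStream(file)
                : new ByteArrayInputStream(memory.toByteArray());
    }

    /** Releases the cache; in TEMP_FILE mode the temporary file is deleted. */
    @Override
    public void close() throws IOException {
        sink.close();
        if (file != null) {
            Files.deleteIfExists(file);
        }
    }
}

class ConversionCacheDemo {
    /** Writes a few bytes into the cache and checks that they can be read back. */
    static boolean roundTrips(CacheMode mode) {
        byte[] data = "jar bytes".getBytes(StandardCharsets.US_ASCII);
        try (ConversionCache cache = new ConversionCache(mode)) {
            cache.output().write(data);
            try (InputStream in = cache.input()) {
                ByteArrayOutputStream copy = new ByteArrayOutputStream();
                int b;
                while ((b = in.read()) != -1) {
                    copy.write(b);
                }
                return Arrays.equals(data, copy.toByteArray());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrips(CacheMode.IN_MEMORY));
        System.out.println(roundTrips(CacheMode.TEMP_FILE));
    }
}
```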

Because of this design the byte-count methods don't make any sense (do
we count when data is written-to/read-from the cache or while the
(Un)Packer is doing its work?) and haven't been implemented at all.

The class names StreamMode and StreamSwitcher result from my attempts at
using real streams and should be changed unless somebody else comes up
with a working streaming solution.

The biggest hurdle for any streaming solution is that there is always
going to be some sort of intermediate buffer.  Something picks up data
written to the output stream and makes it available to the input stream
side.  Once the buffer is full, nothing more can be written until
somebody reads from the input side in a timely manner.
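
The effect can be reproduced with the JDK's Piped*Streams alone.  The
following is a small sketch with arbitrarily chosen buffer and payload
sizes and an invented class name: a producer thread tries to write more
than the pipe can hold and stays blocked until somebody starts reading.

```java
import java.io.IOException;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;

class PipeBackpressureDemo {
    /**
     * Returns true if a producer writing more than the pipe can hold is
     * still blocked after the timeout because nobody has read anything.
     */
    static boolean producerBlocksWithoutReader() {
        try {
            PipedInputStream readSide = new PipedInputStream(1024); // 1 KiB buffer
            PipedOutputStream writeSide = new PipedOutputStream(readSide);

            Thread producer = new Thread(() -> {
                try (PipedOutputStream out = writeSide) {
                    out.write(new byte[4096]); // four times the buffer size
                } catch (IOException ignored) {
                    // not expected in this sketch
                }
            });
            producer.setDaemon(true);
            producer.start();

            // Give the producer time to fill the buffer; nobody reads yet.
            producer.join(500);
            boolean blocked = producer.isAlive();

            // Drain the pipe so that the producer can finish and close it.
            byte[] buf = new byte[512];
            while (readSide.read(buf) != -1) {
                // discard
            }
            producer.join();
            return blocked;
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("producer blocked: " + producerBlocksWithoutReader());
    }
}
```

If the reading side never drains the pipe, the producer (here standing
in for the blocking pack/unpack call) never returns.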

In the case of a Pack200CompressorInputStream you don't have any control
over when the user code is going to read the data, or whether it is
going to consume all of it.  For example, if the stream is wrapped
in a ZipArchiveInputStream (it represents a JAR, after all), it is never
going to be consumed completely because the archive contains ZIP data at
the end that is ignored by the input stream implementation.

There are more cases where the Pack/Unpack operation would end up
blocked so I decided to only code the more robust indirect solution for
now.

Stefan

[1] http://download.oracle.com/javase/1.5.0/docs/api/java/util/jar/Pack200.html
