harmony-dev mailing list archives

From "Alexei Fedotov" <alexei.fedo...@gmail.com>
Subject [classlib][archive] Limit a jar entry size to ~50 Mb Was: java.util.jar specialists/authors wanted to clarify manifest chunks
Date Thu, 28 Feb 2008 16:41:05 GMT
Hello, Pavel, Alexey, Alexei,
After some thinking I decided to limit a MANIFEST.MF entry size to 50
MB (about the maximum size of a Java array). This would lead to code
rewrites if we later wanted to change this limitation, and this is not
exactly the thing Pavel suggested. Here is the rationale.

* The current code imposes the same limitation on every entry size, and
much stronger limitations in related areas, such as the order in which
entries appear in a JarInputStream.
* Generally, a manifest should be kept in memory until signatures are read.
* Our GC can be improved to increase the maximum byte array size 30-fold
without Java code modifications.
* Someone could complete the implementation of the LbaInputStream object [1]
and straightforwardly replace my byte array with it if it became
necessary to support the maximum 4Gb size. I don't think complex code
should be added without a good reason.
* Once we have a reason, the next step will be easier to understand
than it is now. For example, it would make a difference whether the big
manifest comes from a jar file (and can be reread for verification
purposes later) or is fetched from a jar input stream once (... and
may be so big that nothing except caching on a tape would
help).

[1] http://issues.apache.org/jira/secure/attachment/12376739/LbaInputStream.java
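
To make the proposed limit concrete, here is a minimal sketch of reading an entry into a byte array with a cap. It is an illustration only, not the code from the patch: the class BoundedRead, the method readAtMost and the constant MAX_MANIFEST_BYTES are hypothetical names, and the cap simply mirrors the 50 MB figure above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class BoundedRead {
    // Hypothetical cap that mirrors the 50 MB figure discussed above.
    static final int MAX_MANIFEST_BYTES = 50 * 1024 * 1024;

    // Reads the whole stream into a byte array, or fails once more than
    // 'limit' bytes have been seen. Uses one big buffer per read call
    // rather than byte-by-byte writes.
    static byte[] readAtMost(InputStream in, int limit) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            // Written as a subtraction so the check cannot overflow int.
            if (n > limit - out.size()) {
                throw new IOException("entry exceeds " + limit + " bytes");
            }
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

A caller would pass the manifest stream and BoundedRead.MAX_MANIFEST_BYTES. Replacing the byte array with a larger backing store later would only touch this one method.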

Please speak up if you think the rationale is not sufficient.

On Thu, Feb 21, 2008 at 10:06 AM, Alexey Varlamov
<alexey.v.varlamov@gmail.com> wrote:
> 2008/2/20, Alexei Zakharov <alexei.zakharov@gmail.com>:
>
> > Pavel,
>  >
>  > Have you ever seen a jar of such size? Or even close to it?
>  > Well, I also agree we should keep them in mind. But if we can really
>  > speed up processing of small jars, let's do it.
>
>  Just for the record, I had to move a BTI installation to another host a
>  few months ago and it took a few Gigs zipped. Anyway there's nothing
>  unusual in huge files nowadays. So I second Pavel here: while
>  optimizing for everyday use cases we should still keep a path for
>  handling valid corner cases.
>
>  --
>  Alexey
>
>
>
>  >
>  > Regards,
>  > Alexei
>  >
>  > 2008/2/20, Pavel Pervov <pmcfirst@gmail.com>:
>  > > Alexei,
>  > >
>  > > I generally agree with Alexei Z, but large zip entries should be kept
>  > > in mind while implementing current optimizations to java.util.jar, so
>  > > it wouldn't lead to rewriting the code again when faced with large
>  > > entries.
>  > >
>  > > WBR,
>  > >     Pavel.
>  > >
>  > > On 2/20/08, Alexei Fedotov <alexei.fedotov@gmail.com> wrote:
>  > > > Alexei,
>  > > > Thanks for sharing your opinion! Let me note that I mistakenly
>  > > > mentioned 4GB. Actually the maximum size of an uncompressed entry is
>  > > > limited to 2GB (Integer.MAX_VALUE).
>  > > >
>  > > > Any other votes?
>  > > >
>  > > > On Feb 20, 2008 12:19 PM, Alexei Zakharov <alexei.zakharov@gmail.com> wrote:
>  > > > > Hi Alexei,
>  > > > >
>  > > > > I don't think we should really care about such huge zip files now,
>  > > > > especially if the assumption that our zip file is smaller than
>  > > > > 4Gb can give us performance benefits. IMO it is enough just to file a
>  > > > > low-priority JIRA (something like "Harmony can't deal with 16Gb zip
>  > > > > files") and continue optimizations, keeping in mind that we will never
>  > > > > meet zip files larger than 4Gb.
>  > > > >
>  > > > > Regards,
>  > > > > Alexei
>  > > > >
>  > > > > 2008/2/19, Alexei Fedotov <alexei.fedotov@gmail.com>:
>  > > > >
>  > > > > > Hello folks,
>  > > > > >
>  > > > > > Let me continue with my questions about our archive implementation.
>  > > > > > I have noticed that our zip input stream is constructed as follows:
>  > > > > >
>  > > > > >         byte[] buf = inflateEntryImpl2(descriptor, entry.getName());
>  > > > > >         return new ByteArrayInputStream(buf);
>  > > > > >
>  > > > > > Does it mean that we strategically want to work with zip entries
>  > > > > > smaller than 4Gb? This would allow specific optimizations using the
>  > > > > > underlying byte buffer array. Or is it just a bug, and strategically
>  > > > > > we want to handle entries as big as the zip file format allows?
>  > > > > >
>  > > > > > Thank you for sharing your opinion.
>  > > > > > Alexei
>  > > > > >
>  > > > > >
>  > > > > >
>  > > > > > On Feb 17, 2008 4:46 PM, Alexei Fedotov <alexei.fedotov@gmail.com> wrote:
>  > > > > > > Thanks Tim for taking care of the patch! I have another question
>  > > > > > > about this module. According to the specification, attributes of
>  > > > > > > individual entry sections for the same entry name should be merged.
>  > > > > > > Which bytes should be checked against the digest of this merged
>  > > > > > > entry?
>  > > > > > >
>  > > > > > > Thanks!
>  > > > > > >
>  > > > > > >
>  > > > > > > On Feb 15, 2008 3:52 PM, Alexei Fedotov <alexei.fedotov@gmail.com> wrote:
>  > > > > > > > Hello folks,
>  > > > > > > >
>  > > > > > > > Alexey Zakharov kindly shared a hint with me that shorter letters
>  > > > > > > > have a better chance of being read. That is why I prepared a
>  > > > > > > > shorter letter asking again about manifest encodings in the form
>  > > > > > > > of a patch, see HARMONY-5517.
>  > > > > > > >
>  > > > > > > > I would really appreciate it if the people who touched the code
>  > > > > > > > before me (Nathan, Tim, or Evgeniya) would take a look.
>  > > > > > > > Thank you in advance.
>  > > > > > > >
>  > > > > > > > [1] http://issues.apache.org/jira/browse/HARMONY-5517
>  > > > > > > >
>  > > > > > > >
>  > > > > > > > On Thu, Feb 14, 2008 at 2:15 PM, Alexei Fedotov <alexei.fedotov@gmail.com> wrote:
>  > > > > > > > > Hello, Nathan,
>  > > > > > > > >  Thanks for your interest. I'm trying to resolve a performance
>  > > > > > > > >  problem described at HARMONY-4569. Gregory mentions that the
>  > > > > > > > >  write() methods are called from nextChunk() too many times, see
>  > > > > > > > >  lines 187, 201 of
>  > > > > > > > >  working_classlib/modules/archive/src/main/java/java/util/jar/InitManifest.java
>  > > > > > > > >  This slows down Harmony VM in debug and interpreter modes and
>  > > > > > > > >  may affect overall Eclipse startup since many jars are read in
>  > > > > > > > >  the process. I'm trying to collect more data.
>  > > > > > > > >
>  > > > > > > > >  As far as I was able to advance in reviewing the complex code,
>  > > > > > > > >  it seemed that either the code or my understanding may be
>  > > > > > > > >  improved.
>  > > > > > > > >   * The "chunks" hash table is used only for jar verification.
>  > > > > > > > >  Do we need to initialize it for every manifest when this costs
>  > > > > > > > >  us so many invocations? Instead of using write() methods for
>  > > > > > > > >  creating chunks, one may think of remembering chunk positions
>  > > > > > > > >  in the stream, which should be read into a byte array using big
>  > > > > > > > >  buffers instead of individual writes.
>  > > > > > > > >   * It seems that manifests longer than 1024 characters may
>  > > > > > > > >  result in a "string too long" exception: the buffer they are
>  > > > > > > > >  read into just gets as many characters from the stream as
>  > > > > > > > >  possible, and an error is reported if the stream is not read
>  > > > > > > > >  fully.
>  > > > > > > > >   * I don't know the reason why manifests are read in different
>  > > > > > > > >  encodings. The spec [1] mentions UTF-8 only. It would be nice
>  > > > > > > > >  to know.
>  > > > > > > > >   * The similar functionality of readLines and nextChunk, which
>  > > > > > > > >  contain long conditional sequences, may be rewritten in a more
>  > > > > > > > >  transparent and documented way. Generally, the idea behind
>  > > > > > > > >  "rewriting" of chunks is beyond my understanding: I have not
>  > > > > > > > >  noticed in the specification that line breaks or anything else
>  > > > > > > > >  should be "rewritten" using an eight-if algorithm instead of
>  > > > > > > > >  being taken as is. BTW, I have noticed that Tim was behind
>  > > > > > > > >  readability improvements of the code. I wonder what was there
>  > > > > > > > >  before and will check it after lunch.
>  > > > > > > > >   * The whole class InitManifest seems to be redundant and may
>  > > > > > > > >  be replaced with a set of static methods. It seems that the
>  > > > > > > > >  specific functionality for the two calls to InitManifest should
>  > > > > > > > >  be kept in the place where InitManifest is called rather than
>  > > > > > > > >  passed to InitManifest as a parameter for an internal check.
>  > > > > > > > >
>  > > > > > > > >  I appreciate your comments and help.
>  > > > > > > > >
>  > > > > > > > >  [1] http://java.sun.com/j2se/1.5.0/docs/guide/jar/jar.html
>  > > > > > > > >
>  > > > > > > > >
>  > > > > > > > >
>  > > > > > > > >  On Feb 14, 2008 6:00 AM, Nathan Beyer <ndbeyer@apache.org> wrote:
>  > > > > > > > >  > Can you point out the painful bits (line numbers, etc)?
>  > > > > > > > >  >
>  > > > > > > > >  >
>  > > > > > > > >  > On Feb 13, 2008 11:01 AM, Alexei Fedotov <alexei.fedotov@gmail.com> wrote:
>  > > > > > > > >  > > Hello folks,
>  > > > > > > > >  > >
>  > > > > > > > >  > > Do we have the original
>  > > > > > > > >  > > working_classlib/modules/archive/src/main/java/java/util/jar/
>  > > > > > > > >  > > module contributors on board? Could anyone clarify the reasons
>  > > > > > > > >  > > behind the heavy solution of copying manifest chunks into a
>  > > > > > > > >  > > separate hash table, described at HARMONY-4569? Isn't the
>  > > > > > > > >  > > entity hash table the only object which should be populated?
>  > > > > > > > >  > >
>  > > > > > > > >  > > --
>  > > > > > > > >  > > With best regards,
>  > > > > > > > >  > > Alexei
>  > > > > > > > >  > >
>  > > > > > > > >  > > [1] http://issues.apache.org/jira/browse/HARMONY-4569
>  > > > > > > > >  > >
>  > > > > > > > >  >
>  > > > > > > > >
>  > > > > > > > >
>  > > > > > > > >
>  > > > > > > > >  --
>  > > > > > > > >  With best regards,
>  > > > > > > > >  Alexei
>  > > > > > > > >
>  > > > > > > >
>  > > > > > > >
>  > > > > > > >
>  > > > > > > > --
>  > > > > > > > With best regards,
>  > > > > > > > Alexei
>  > > > > > > >
>  > > > > > >
>  > > > > > >
>  > > > > > >
>  > > > > > > --
>  > > > > > > With best regards,
>  > > > > > > Alexei
>  > > > > > >
>  > > > > >
>  > > > > >
>  > > > > >
>  > > > > > --
>  > > > > > With best regards,
>  > > > > > Alexei
>  > > > > >
>  > > > >
>  > > >
>  > > >
>  > > >
>  > > > --
>  > > > With best regards,
>  > > > Alexei
>  > > >
>  > >
>  > >
>  > > --
>  > > Pavel Pervov,
>  > > Intel Enterprise Solutions Software Division
>  > >
>  >
>
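
The idea from the quoted thread of remembering chunk positions instead of copying each chunk with write() could look roughly like the sketch below. It is only an illustration under stated assumptions: ChunkIndex and Span are hypothetical names, sections are assumed to be separated by an empty line, and only LF and CRLF line ends are handled (a bare CR is not).

```java
import java.util.ArrayList;
import java.util.List;

final class ChunkIndex {
    // Offset and length of one manifest section inside the backing array.
    static final class Span {
        final int offset;
        final int length;

        Span(int offset, int length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Scans the manifest bytes once and records where each section starts
    // and how long it is. No section bytes are copied.
    static List<Span> index(byte[] buf) {
        List<Span> spans = new ArrayList<>();
        int sectionStart = 0; // first byte of the current section
        int lineStart = 0;    // first byte of the current line
        for (int i = 0; i < buf.length; i++) {
            if (buf[i] != '\n') {
                continue;
            }
            int lineEnd = i;
            if (lineEnd > lineStart && buf[lineEnd - 1] == '\r') {
                lineEnd--; // treat CRLF as a single line end
            }
            if (lineEnd == lineStart) { // an empty line closes the section
                if (lineStart > sectionStart) {
                    spans.add(new Span(sectionStart, lineStart - sectionStart));
                }
                sectionStart = i + 1;
            }
            lineStart = i + 1;
        }
        if (buf.length > sectionStart) { // last section without a blank line
            spans.add(new Span(sectionStart, buf.length - sectionStart));
        }
        return spans;
    }
}
```

A verifier could then digest buf from span.offset for span.length bytes directly, so the manifest is read once into a single array and no per-chunk copies are made.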



-- 
With best regards,
Alexei
