harmony-dev mailing list archives

From "Sian January" <sianjanu...@googlemail.com>
Subject Re: [classlib][pack200] Decoupling I/O and processing for unpacking scenario
Date Fri, 18 Jul 2008 13:11:19 GMT
I think when there are multiple segments in a pack200 archive it is fairly
easy to get some parallelisation in the way that Aleksey has shown, and
there's some control over this because you can specify on the command line
the maximum size for a segment, and so control how many segments are
created.  The downside of this (and of using multiple jar files) is that the
.pack file(s) will be larger on disk, but it's up to users how much of this
cost they want to pay to get improved performance and decreased memory usage
during the unpack process.
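
To make the segment-level idea concrete, here is a minimal sketch of handing
segments to a few worker threads using only java.lang.Thread and wait/notify
(no dependency on the concurrent module). The names SegmentPipeline, Channel
and process are hypothetical, the "segments" are plain strings, and the
processing step is a placeholder; this is not the code from the patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

public class SegmentPipeline {
    /** Minimal blocking queue built on wait/notify (hypothetical helper). */
    static final class Channel<T> {
        private final LinkedList<T> items = new LinkedList<>();
        private boolean closed;

        synchronized void put(T item) { items.add(item); notifyAll(); }

        synchronized void close() { closed = true; notifyAll(); }

        /** Returns the next item, or null once the channel is closed and empty. */
        synchronized T take() throws InterruptedException {
            while (items.isEmpty() && !closed) { wait(); }
            return items.isEmpty() ? null : items.removeFirst();
        }
    }

    /** Placeholder for the CPU-heavy per-segment work; real unpacking is not shown. */
    static String process(String segment) { return segment.toUpperCase(); }

    /** Feeds segments to `workers` threads and collects results (order not preserved). */
    static List<String> run(List<String> segments, int workers) {
        final Channel<String> pending = new Channel<>();
        final Channel<String> finished = new Channel<>();
        List<Thread> pool = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            Thread t = new Thread(() -> {
                try {
                    String s;
                    while ((s = pending.take()) != null) {
                        finished.put(process(s));
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            pool.add(t);
            t.start();
        }
        List<String> out = new ArrayList<>();
        try {
            // The calling thread acts as the reader, then as the writer.
            for (String s : segments) { pending.put(s); }
            pending.close();
            for (Thread t : pool) { t.join(); }
            finished.close();
            String s;
            while ((s = finished.take()) != null) { out.add(s); }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while unpacking", e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("seg0", "seg1", "seg2"), 4));
    }
}
```

The sketch ignores two things a real unpacker would need: segments must be
written out in their original order, and the queues should be bounded so that
a fast reader cannot fill memory with unprocessed segments.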

If people want to use multiple jar files that's certainly an option, and
would probably provide the best performance boost, but we shouldn't assume
everyone will do this.  Since we're behind Sun's performance at the moment
anyway I do think it's worth doing this work.  In fact even with multiple
jars unpacked in parallel it could be somewhat beneficial unless all the
jars were < 10MB.

Just my 2p worth...


On 18/07/2008, Alex Blewitt <alex.blewitt@gmail.com> wrote:
> Right, the pack200 structure is in no way fixed size, so you have to read
> through it to determine what the sizes are. And given everything uses a
> "current pointer" mentality, it would be difficult to parallelise generally.
> The best way to parallelise a pack200 process is to generate multiple
> packed Jars, and then extract each one in an independent thread. Most large
> systems that would benefit from this would probably do that in any case.
> Alex
> Sent from my (new) iPhone
> On 18 Jul 2008, at 10:16, "Sian January" <sianjanuary@googlemail.com>
> wrote:
>> Hi Aleksey,
>> That's a really interesting idea.  It doesn't sound like a very complicated
>> threading scenario, so I would think you could just use java.lang.Thread
>> rather than adding a dependency on the concurrent module.
>> I think there needs to be some processing on the read stage, because the
>> length of some bands depends on the contents of previous bands so you won't
>> know how much to read unless you do some processing.  But some things can be
>> done afterwards, like sorting the constant pool, so there's definitely an
>> opportunity for parallelism there.
>> I haven't looked at your patch yet, but I will try to review it soon.
>> Thanks,
>> Sian
>> On 17/07/2008, Aleksey Shipilev <aleksey.shipilev@gmail.com> wrote:
>>> Hi, Sian, Andrew,
>>> I have decoupled the I/O and processing for the unpacking scenario [1].
>>> The bottom line is to get rid of essentially serial I/O operations as
>>> far as possible, thus decreasing the amount of serial code in pack200
>>> and opening the way for parallelism.
>>> The stage measurements for the first prototype are (msecs):
>>> read=6737 process=26724 write=2537
>>> That is, 6.7 secs is spent on reading, 2.5 secs on writing, 26.7 secs
>>> on processing. Keeping in mind that each segment goes through all three
>>> stages exactly once, we can see that processing an average segment is
>>> about 4x slower than reading it. That means you could spawn 1 reader
>>> thread, 1 writer thread and 4 processing threads and have an equilibrium
>>> in a producer-consumer scheme. In the case of ideal scaling, it would
>>> decrease the scenario timing to (6.7 + 2.5 + 26.7/4) = 15.9 secs,
>>> giving +70% boost.
>>> The exact mechanism for such parallelisation is not yet clear to me.
>>> Can we take j.u.concurrent as a dependency?
>>> Another issue: there is still processing in the read stage, because of
>>> mind-boggling dependencies I can't eliminate in this version. If we
>>> manage to at least halve the read time, the unpacking scenario timing
>>> will drop to (6.7/2 + 2.5 + [26.7+6.7/2]/4) = 13.4 secs, giving +100%
>>> boost. Amdahl's Law, eh :)
>>> Thanks,
>>> Aleksey.
>>> [1] https://issues.apache.org/jira/browse/HARMONY-5916
>> --
>> Unless stated otherwise above:
>> IBM United Kingdom Limited - Registered in England and Wales with number
>> 741598.
>> Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
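
For anyone who wants to replay the timing estimate from the quoted message,
the arithmetic can be sketched as below. It assumes, as the quoted formula
does, that reading and writing stay serial and that only the processing time
is divided across the threads. The class and method names are hypothetical;
the timings are the rounded figures from the message.

```java
public class UnpackEstimate {
    // Stage timings in seconds, rounded from the measurements quoted above.
    static final double READ = 6.7;
    static final double PROCESS = 26.7;
    static final double WRITE = 2.5;

    /** Serial read and write, plus processing split evenly over `threads`. */
    static double estimate(double read, double process, double write, int threads) {
        return read + write + process / threads;
    }

    public static void main(String[] args) {
        // Fully serial baseline: every stage runs one after another.
        System.out.printf("serial:               %.1f s%n", READ + PROCESS + WRITE);

        // First scenario: 4 processing threads, I/O unchanged.
        System.out.printf("4 processing threads: %.1f s%n",
                estimate(READ, PROCESS, WRITE, 4));

        // Second scenario: half of the read time moves into the processing stage.
        System.out.printf("read time halved:     %.1f s%n",
                estimate(READ / 2, PROCESS + READ / 2, WRITE, 4));
    }
}
```

With these inputs the two scenarios come out at roughly 16 and 13 seconds
against a serial baseline of about 36 seconds. The serial read and write terms
put a floor under the total no matter how many processing threads are added.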

