harmony-dev mailing list archives

From "Sian January" <sianjanu...@googlemail.com>
Subject Re: [classlib][pack200] Interested in Pack200
Date Fri, 31 Aug 2007 09:15:25 GMT
Thanks Alex - that's really helpful.  It looks like there's a lot to get my
head around for now, but I may take you up on the offer of picking your
brains when I get a bit further with understanding it.

Sian


On 31/08/2007, Alex Blewitt <alex.blewitt@gmail.com> wrote:
>
> > Thanks very much for your reply.  I hadn't found HARMONY-3290, so I will
> > have a look at that.  Do you happen to remember what the main changes
> > were that you made for EclipseCon?
> >
> > I would definitely be interested in talking about the current
> > implementation and getting up to speed with it.  I've read most of your
> > past e-mails and had a look at some of the code, so that's where I am at
> > the moment.
>
> The state of play in the Harmony codebase at the moment is that I
> haven't got around to decoding the bytecode stored in the pack200 file,
> so at this point it can extract interfaces and fully abstract
> classes (e.g. those with native parts) but nothing that has any code
> or initialisation (e.g. constant expressions or method calls). I
> started putting together something to represent the bytecode/class
> structure to handle it in the .bytecode. package, but in the dump in
> 3290 I added something which helped to decode some of the bytecode
> instructions themselves.
>
> IIRC the bytecode fields are stored as a variable-sized byte array at
> the end of the segment, and you essentially iterate over them (with
> 0x0 terminators? or was it 0xff?), one for each non-abstract member in
> the code. The difference with them is that the bytecode sequence
> doesn't have any argument values; instead, they're references into the
> appropriate constant pool. So 'ldc 5' would actually mean load
> constant pool reference 5, which might turn out to be a string or
> something. Secondly, whilst Java bytecode instructions are weakly
> typed, they're strongly typed in the packed bytecode, so a load of an
> int is different from load as a double, because they come from
> different locations in the segment's constant pools. Thus there's a
> mapping such that (say) 486,586 and 686 all map to the instruction
> '86' but with different arg types. (The numbers are different and are
> in the pack200 spec; I forget exactly what they are, but that's the
> idea).
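
The typed-opcode mapping described above could be sketched roughly as
follows. This is only an illustration: the packed opcode numbers
(1001 to 1003), the PoolKind enum and the Target class are all made up
for the example, and the real values and pool names have to come from
the pack200 spec. The only number taken from the JVM itself is 0x12,
the real opcode for ldc.

```java
import java.util.HashMap;
import java.util.Map;

class PackedOpcodeTable {
    /** Which constant pool an operand index refers to (hypothetical names). */
    enum PoolKind { INT, FLOAT, STRING }

    /** A real JVM opcode plus the pool its operand must be resolved against. */
    static final class Target {
        final int realOpcode;
        final PoolKind pool;

        Target(int realOpcode, PoolKind pool) {
            this.realOpcode = realOpcode;
            this.pool = pool;
        }
    }

    /** Real JVM opcode for ldc. */
    static final int LDC = 0x12;

    // ASSUMPTION: placeholder packed opcode numbers, not the values in the spec.
    static final int PACKED_LDC_INT = 1001;
    static final int PACKED_LDC_FLOAT = 1002;
    static final int PACKED_LDC_STRING = 1003;

    private static final Map<Integer, Target> TABLE = new HashMap<>();
    static {
        // Several typed packed opcodes collapse onto the same real instruction.
        TABLE.put(PACKED_LDC_INT, new Target(LDC, PoolKind.INT));
        TABLE.put(PACKED_LDC_FLOAT, new Target(LDC, PoolKind.FLOAT));
        TABLE.put(PACKED_LDC_STRING, new Target(LDC, PoolKind.STRING));
    }

    /** Looks up the real opcode and operand pool for a packed opcode. */
    static Target lookup(int packedOpcode) {
        Target target = TABLE.get(packedOpcode);
        if (target == null) {
            throw new IllegalArgumentException("Unknown packed opcode: " + packedOpcode);
        }
        return target;
    }
}
```

With a table like this, the decoder emits target.realOpcode into the
class file and uses target.pool to decide which pool the packed operand
index should be looked up in.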
>
> In addition, some common constructs are condensed into a single byte.
> So the default constructor's super() call is usually a call to <init>(),
> which is usually represented as 'aload_0, invokespecial n' where n is the
> entry java.lang.Object#<init> or some such. That gets boiled down to a
> single code (231?) and so when decoding, you not only replace '231'
> with the codes for aload_0/invokespecial, but you also potentially
> have to infer the method/object reference for the superclass'
> constructor as well.
>
> In the EclipseCon demo, I bodged the ability to put the <init> in
> place whilst the bytecode was being extracted:
>
> --- 8< ---
>        protected ClassFileEntry[] getNestedClassFileEntries() {
>                if (opcode == 231) // TODO HACK
>                        return new ClassFileEntry[] {
>                                        new CPMethodRef("java/lang/Object", "<init>:()V") };
>                else
>                        return nested;
>        }
> --- 8< ---
>
> Clearly, that wouldn't work when the class wasn't a direct subtype of
> java.lang.Object or had different arguments.
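
A less hard-wired version of that hack might look something like the
sketch below, which takes the owner of <init> from the superclass of
the class being unpacked instead of assuming java/lang/Object. The
SuperInitExpander and MethodRef names are invented for the example and
are not the Harmony classes, the value 231 is simply the one used in
the snippet above, and how the constructor descriptor is actually
chosen is left to the caller because that needs checking against the
spec. The opcodes 0x2a (aload_0) and 0xb7 (invokespecial) are the real
JVM values.

```java
class SuperInitExpander {
    static final int ALOAD_0 = 0x2a;        // real JVM opcode
    static final int INVOKESPECIAL = 0xb7;  // real JVM opcode

    // ASSUMPTION: condensed code as used in the hack above; verify against the spec.
    static final int PACKED_SUPER_INIT = 231;

    /** Hypothetical stand-in for a constant pool method reference. */
    static final class MethodRef {
        final String owner;
        final String nameAndType;

        MethodRef(String owner, String nameAndType) {
            this.owner = owner;
            this.nameAndType = nameAndType;
        }
    }

    /** Returns the real opcodes that a packed opcode stands for. */
    static int[] expand(int packedOpcode) {
        if (packedOpcode == PACKED_SUPER_INIT) {
            return new int[] { ALOAD_0, INVOKESPECIAL };
        }
        // Anything else is passed through unchanged in this sketch.
        return new int[] { packedOpcode };
    }

    /** Builds the implied constructor reference from the actual superclass. */
    static MethodRef impliedSuperInit(String superClassName, String descriptor) {
        return new MethodRef(superClassName, "<init>:" + descriptor);
    }
}
```

For a class extending java/util/ArrayList and calling super(int), the
caller would pass "java/util/ArrayList" and "(I)V", which covers both
cases the Object-only version gets wrong.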
>
> Once the bytecode extraction is done, then looking at the exception
> handlers is probably the next thing that would make it slightly
> useful. The debug symbols are also not handled, and nor is any of the
> annotation code that's used by the Java 5 stuff.
>
> I seem to recall that when I was working out the parsing of a simple
> class, I had an off-by-one error in the number of bytes that the
> packed file contained versus what I was expecting. I didn't get that
> when I had interfaces. I never really found out what the solution was
> for that one :-(
>
> I don't know if this gives you any more of an idea where the state of
> play is, but if you were to compile/pack the following:
>
> --- 8< ---
> public interface Foo {
>     public abstract void foo();
> }
> --- 8< ---
> it should be possible to extract the contents with the current
> implementation. That would be a start on finding out where the code
> paths lie and what's going on. You'll need to compile with debug
> symbols disabled (i.e. javac -g:none) and I can't remember whether the
> current simple implementation assumes the pack file isn't gzipped, or
> whether I'd fixed that. (By default, the Sun pack200 tool will
> auto-gzip the pack200 output.) The next stage would be to get:
>
> --- 8< ---
> public abstract class Foo {
>     public Foo() {
>         super();
>     }
>     public abstract void foo();
> }
> --- 8< ---
>
> working, since that will contain the implicit call to the constructor.
> The remainder of the bytecodes are either going to have no args (e.g.
> 'return') or some args (e.g. 'getstatic') and the ones with args will
> need to be mapped to the appropriate pool entry. If I recall, the arg
> values are specific to the per-class pool, rather than the global
> pool, but you'd have to re-read the spec to know for sure. Once that's
> done, you might be able to start decoding more interesting classes
> and/or have ones with 'try/catch' in place.
>
> BTW the code in Segment is ugly and could certainly use a good dose of
> refactoring; and I'm not sure that the flyweight pattern in the
> ByteCode was doing much good. To be honest, the biggest problem I had
> when decoding the bytecode packed values was how much size to allocate
> for the resulting stream, and where to fill the values from. I suspect
> rather than attempting to do it in one pass (like I did) it might be
> better to do a multi-pass, first extracting the real bytecodes (and
> any extra additions to the constant pool) and then afterwards post
> filling the argument values in. There's also the knotty problem that
> the constant pool that should get written to the output .class should
> be sorted using some fairly weird sorting rules (see cp.resolve() in
> buildClassFile of Segment.java) that will affect how the values get
> written to the final .class file. It doesn't make any difference from
> an execution perspective, but the pack200 spec is clear that they need
> to be sorted to a canonical order such that any signatures of the
> files will result in the same binary structure of the class file.
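
The multi-pass idea suggested above might be sketched like this: a
first pass works out the real opcodes and how wide each operand will
be, so the output size is known up front, and a second pass fills in
the operand values once the final constant pool indices are available.
Everything here is illustrative. TwoPassDecoder, Insn and the
resolvedIndex array (packed pool reference to final class-file pool
index, after sorting) are invented names, and real bytecode has
operand shapes this ignores.

```java
import java.util.List;

class TwoPassDecoder {
    /** One decoded instruction: a real opcode and an optional unresolved operand. */
    static final class Insn {
        final int opcode;
        final int operandWidth; // 0, 1 or 2 bytes in the output
        final int poolRef;      // packed pool reference; ignored if operandWidth == 0

        Insn(int opcode, int operandWidth, int poolRef) {
            this.opcode = opcode;
            this.operandWidth = operandWidth;
            this.poolRef = poolRef;
        }
    }

    /** Pass 1: total output size, known once opcodes and operand widths are decoded. */
    static int measure(List<Insn> insns) {
        int size = 0;
        for (Insn insn : insns) {
            size += 1 + insn.operandWidth;
        }
        return size;
    }

    /** Pass 2: write opcodes and resolved operands into an exactly sized array. */
    static byte[] fill(List<Insn> insns, int[] resolvedIndex) {
        byte[] out = new byte[measure(insns)];
        int pos = 0;
        for (Insn insn : insns) {
            out[pos++] = (byte) insn.opcode;
            if (insn.operandWidth == 0) {
                continue;
            }
            int value = resolvedIndex[insn.poolRef];
            if (insn.operandWidth == 2) {
                out[pos++] = (byte) (value >> 8); // high byte first, as in class files
            }
            out[pos++] = (byte) value;
        }
        return out;
    }
}
```

Because operands are only written in the second pass, the constant
pool can be sorted into its final order in between without having to
patch bytes that were already emitted.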
>
> That's your starter ... you might want to download the snapshot I made
> for the bug and/or commit some of it; it was ugly, and had some hacks,
> but it never really got worked on post EclipseCon so it might be a
> better place to start from.
>
> By the way, the pack200 spec is mind bending enough the first few
> hundred times you read it. If you want to pick my brains on how
> something works, feel free to drop me a line and I'll see if I can
> help out.
>
> Alex.
>



-- 
Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU
