db-derby-dev mailing list archives

From Kathey Marsden <kmarsdende...@sbcglobal.net>
Subject Re: Any ideas for debugging a byte code generation problem
Date Sun, 20 Mar 2005 08:39:34 GMT
Jeremy Boynes wrote:

> Kathey Marsden wrote:
>
>>
>> I could post my REALLY rough notes on my crash byte code generation
>> course with Dan if anyone wants to see them,  filled with holes and
>> probably as many untruths in the translation, but it is probably better
>> for Dan to post when he gets back.
>>
>
> Please post them, especially if Dan is going to be away for a few weeks.
>
> -- 
> Jeremy
>

OK. Here they are, kind of a stream of consciousness thing with notes
cut and pasted.  A  few issues.  

    1) As I mentioned, I am actually working on an older release of
Cloudscape, so I don't have the changes I talk about in the notes ported
to Derby and can't post them here right now.  I wanted to get as much
community input as possible, since some sort of solution will need to
come to Derby.
    2) I also don't have a reproduction on Derby, but make yourself a
super duper query with lots of subqueries and unions and you too
could get a Java linkage error.
    3) My current outstanding issue is with fillResultSet being too
large.   Ideally it would be fixed centrally like Dan was able to do
with the constructor,  but it seems the state of the stack would prevent
that.

I am thinking that my fix for fillResultSet will be to make
statementNumHitLimit smarter by having it use the code size, possibly
use it in other places, and make continuation methods like Dan did for
the constructor.  It would be much better to have it centralized, but I
don't see how to do that right now for fillResultSet.

Essentially this goes at the top of statementNumHitLimit:

        int currentCodeSize = myCode.getRelativePC();

        if (currentCodeSize > 50000)
            return true;

        if (currentCodeSize < 5000 || statementNum < 5)
        {
            statementNum += noStatementsAdded;
            return false;
        }
   

-----------------------------------------------------------------------------------------------------------------------
Notes:     Please forgive the loosie goosieness of it all...

VM Spec - The Java Virtual Machine Specification - Tim Lindholm, Frank
Yellin
    ISBN 0-201-63452-X

Constant Pool has
    max 65535 entries
    12 types - the primitives + UTF8 string
    long and double take up 2 slots.  Not clear why
    entries are given an index starting with 1 (0 reserved/not used?)
    class name is represented by a UTF8 constant
    index entry has pointers to constant pool entries
    this.class -> index class_info[] -> UTF8
    each entry is a tag with data
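
To see these numbers for a real class, here is a minimal sketch that walks the constant pool of a class file and counts the entries.  The class name ConstantPoolCounter is made up for this example and is not part of Derby.  The tag values and entry sizes come from the class file format in the VM spec, including the tags added in later Java versions so it can read current JDK classes.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of Derby: counts constant pool entries.
final class ConstantPoolCounter {

    /** Returns {constant_pool_count, number of entries actually present}. */
    static int[] count(byte[] classFile) {
        try {
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(classFile));
            if (in.readInt() != 0xCAFEBABE) {
                throw new IOException("not a class file");
            }
            in.readUnsignedShort();                 // minor_version
            in.readUnsignedShort();                 // major_version
            int poolCount = in.readUnsignedShort(); // u2, so at most 65535
            int entries = 0;
            // Valid indexes run from 1 to poolCount - 1; index 0 is not used.
            for (int index = 1; index < poolCount; index++) {
                int tag = in.readUnsignedByte();    // each entry is a tag + data
                entries++;
                switch (tag) {
                    case 1:  // Utf8: u2 length followed by the bytes
                        skip(in, in.readUnsignedShort());
                        break;
                    case 3: case 4:  // Integer, Float
                        skip(in, 4);
                        break;
                    case 5: case 6:  // Long, Double: occupy two indexes
                        skip(in, 8);
                        index++;
                        break;
                    case 7: case 8: case 16: case 19: case 20:
                        skip(in, 2); // Class, String, MethodType, Module, Package
                        break;
                    case 9: case 10: case 11: case 12: case 17: case 18:
                        skip(in, 4); // Fieldref, Methodref, InterfaceMethodref,
                        break;       // NameAndType, Dynamic, InvokeDynamic
                    case 15:         // MethodHandle
                        skip(in, 3);
                        break;
                    default:
                        throw new IOException("unknown constant pool tag " + tag);
                }
            }
            return new int[] { poolCount, entries };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static void skip(DataInputStream in, int n) throws IOException {
        in.readFully(new byte[n]);
    }

    public static void main(String[] args) throws IOException {
        // Reads the JDK's own String.class as a convenient sample input.
        try (InputStream in = String.class.getResourceAsStream("String.class")) {
            if (in == null) {
                System.out.println("String.class is not readable here");
                return;
            }
            int[] result = count(in.readAllBytes());
            System.out.println("constant_pool_count=" + result[0]
                + ", entries=" + result[1]);
        }
    }
}
```

The gap between the two numbers is the number of long and double constants, since each of those uses up two indexes.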

If you have field  int x, you have
    UTF8 - "x"    // name, let's say at index 77
    UTF8 - "I"    // type (integer) at index 83

You have
field_info
    access_flags  // e.g private/public
    name_index
    descriptor_index
    attributes

so there are 8 - 12 bytes for every field


Byte code for using variable
    getfield indexbyte1 indexbyte2
    the two index bytes together form an index to a field_ref entry
    field_ref class_index         -> class info entry for the class
              name_and_type_index -> Name and Type constant entry

    so the Name and Type entry for x above points to 77 for the name
    and 83 for the type

There are at least 3 constant pool entries per variable
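
A toy model makes the "at least 3" count concrete.  This is only a sketch: ToyConstantPool and the class name "Activation" are invented for the example, and the pool is modelled as a map that reuses identical entries, which is how I understand a class file writer would normally behave.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical toy model of a constant pool that reuses identical entries.
final class ToyConstantPool {
    private final Map<String, Integer> indexByKey = new LinkedHashMap<>();
    private int nextIndex = 1; // index 0 is not used

    /** Returns the index of the entry, adding it only if it is new. */
    private int add(String key) {
        Integer existing = indexByKey.get(key);
        if (existing != null) {
            return existing;
        }
        int index = nextIndex++;
        indexByKey.put(key, index);
        return index;
    }

    int utf8(String text) {
        return add("Utf8:" + text);
    }

    int classInfo(String name) {
        return add("Class:#" + utf8(name));
    }

    int nameAndType(String name, String descriptor) {
        return add("NameAndType:#" + utf8(name) + ":#" + utf8(descriptor));
    }

    /** The entry that a getfield or putfield instruction points at. */
    int fieldRef(String owner, String name, String descriptor) {
        return add("Fieldref:#" + classInfo(owner)
            + ".#" + nameAndType(name, descriptor));
    }

    int size() {
        return indexByKey.size();
    }

    public static void main(String[] args) {
        ToyConstantPool pool = new ToyConstantPool();
        pool.classInfo("Activation");
        int before = pool.size();
        pool.fieldRef("Activation", "x", "I");
        System.out.println("first int field added " + (pool.size() - before));
        before = pool.size();
        pool.fieldRef("Activation", "y", "I");
        System.out.println("second int field added " + (pool.size() - before));
    }
}
```

In this model the first int field costs four entries (name, "I", Name and Type, field ref) and every later int field costs three, because the "I" descriptor is shared.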
   
String constants point to a string info entry, which creates 2 entries.
Methods are referenced by concatenated string names, so they take up a
lot of space (everybody gets an org/apache/derby).

---------------------------------
ClassHolder is the entry point to an active class
addEntry shows where constant entries are added.
You can do a new Exception().printStackTrace() to see where constant
entries are coming from.

we don't create byte code for local variables so everything gets added
as a field.

java compiler ClassBuilder  ??? Don't know what I was writing here

----------------------------------------------
On first pass Dan saw a lot of constant pool entries getting
created in this optimization, where we call DataValueFactory to reuse
the DVD holder.
    x = DataValueFactory.getValue(expr, x)   // reuse holder

By changing the second argument x to null, a DataValueDescriptor for x
would get newed up each time, but he would get past the constant pool
problem.  One issue is that even though x is really just a local variable
never to be used elsewhere, it is defined as a field and just takes up
space.  Derby doesn't create local variables in its byte code.


The solution was to add a new method to ExpressionClassBuilder,
newFieldDeclarationOptional.  It is like newFieldDeclaration, but instead
of always returning a LocalField it returns null if we have passed some
reasonable limit (2000 entries).  See reusableBoolean in
BinaryLogicalOperatorNode for sample usage.  Dan ran nist and found that
none of the queries exceeded this limit.
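
Roughly, the idea looks like the sketch below.  This is not the Derby code: ToyClassBuilder, the generated field names, and the use of strings in place of LocalField and generated byte code are all assumptions made to keep the example self-contained.  The notes do not say exactly what the 2000 counts, so the toy simply counts declared fields.

```java
// Hypothetical stand-in for the class builder; strings replace byte code.
final class ToyClassBuilder {
    private final int optionalFieldLimit;
    private int fieldCount;

    ToyClassBuilder(int optionalFieldLimit) {
        this.optionalFieldLimit = optionalFieldLimit;
    }

    /** Always declares a new field and returns its generated name. */
    String newFieldDeclaration(String type) {
        return "field" + fieldCount++;
    }

    /** Like newFieldDeclaration, but returns null once the limit is passed. */
    String newFieldDeclarationOptional(String type) {
        if (fieldCount >= optionalFieldLimit) {
            return null;
        }
        return newFieldDeclaration(type);
    }

    /** Reuses a holder field when one is available, else passes null. */
    String generateGetValue(String expr) {
        String holder = newFieldDeclarationOptional("DataValueDescriptor");
        if (holder == null) {
            // No field available: a new holder gets created on each call.
            return "DataValueFactory.getValue(" + expr + ", null)";
        }
        return holder + " = DataValueFactory.getValue(" + expr + ", " + holder + ")";
    }

    public static void main(String[] args) {
        ToyClassBuilder builder = new ToyClassBuilder(2000); // limit from the notes
        System.out.println(builder.generateGetValue("expr"));
    }
}
```

The caller has to cope with the null, which is why this is a separate method rather than a change to newFieldDeclaration.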

-----------------------------------------------------------------

VM is stack based.
    words on stack
    operations on the stack
    pushes values on stack
    pops them off the stack.
    push
        creates constant pool entry
        byte, index into pool  ???

So if we have
    this.x(foo(), bar())
we essentially create temporary variables for the foo() and bar() return
values that end up as constant pool entries.
Replace temp variables with swaps.  My notes make no sense here, need to
complete

    we would push this
    call foo
    push this
    call bar
    <finish>


Dan's comment was this

"Remove a use of a generated local field where manipulating values on
the stack is sufficient.
    Reduces number of constant pool entries, a local field requires at
least three constant pool entries.
   
    Generated code used to be equivalent to
    // instance field
    <type> right;
   
    right = <right expr>;
    right.method(<left expr>, right);
   
    Now generated code uses the stack to store two copies of right
    rather than an instance field
"
----------------------------------------------------
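
Dan's comment above about keeping two copies of right on the stack can be modelled with a toy operand stack.  This is my own illustration, and the exact instruction sequence Derby generates is an assumption: evaluate right, dup it, evaluate left, then swap so the stack reads receiver, left, right.  ToyOperandStack is a made-up class that pushes strings instead of real values.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy operand stack; strings stand in for real values.
final class ToyOperandStack {
    private final Deque<String> stack = new ArrayDeque<>();

    void push(String value) {
        stack.push(value);
    }

    /** Duplicates the top word, like the dup instruction. */
    void dup() {
        stack.push(stack.peek());
    }

    /** Exchanges the top two words, like the swap instruction. */
    void swap() {
        String top = stack.pop();
        String below = stack.pop();
        stack.push(top);
        stack.push(below);
    }

    /** Pops the arguments and the receiver, pushes the call's result. */
    String invoke(String method, int argCount) {
        String[] args = new String[argCount];
        for (int i = argCount - 1; i >= 0; i--) {
            args[i] = stack.pop();
        }
        String receiver = stack.pop();
        String call = receiver + "." + method + "(" + String.join(", ", args) + ")";
        stack.push(call);
        return call;
    }

    int depth() {
        return stack.size();
    }

    public static void main(String[] args) {
        ToyOperandStack s = new ToyOperandStack();
        s.push("right");  // <right expr>
        s.dup();          // second copy replaces the instance field
        s.push("left");   // <left expr>
        s.swap();         // stack is now: right, left, right
        System.out.println(s.invoke("method", 2));
    }
}
```

No field is declared anywhere, so none of the three or more constant pool entries for it are needed.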

Field and method names can safely overlap in Java so Dan's next step was
to share the namespace for expressions and fields.
We used to have
Expressions - take no args. return Object
    e#
    e0 - e9 are preset expressions
    e10 - en - other expressions
Fields
    f0 - fn
   
Other methods (argument methods)
    g0 - gn

Now the fields go in the e# name space.  The f# namespace is gone.

-----------------------------------------

Ultimately it would be good to throw a StandardException when we reach
one of these limits.  There is not really a mechanism to do that during
the byte code generation.
    For now generate an IOException in the ClassFormatOutput stream with
    a terse JVM Spec description of the limit problem.
    Ultimately catch it in language and throw a StandardException
    with query too complex and chain this one.
    Checks are in writeXXX methods of ClassFormatOutput.java
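
A minimal sketch of that two-step idea, under several assumptions: LimitCheckedOutput, putU2, QueryTooComplexException and LimitDemo are names invented here, and a RuntimeException stands in for StandardException so the example compiles on its own.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical output stream that refuses values too big for a u2 slot.
final class LimitCheckedOutput extends DataOutputStream {
    static final int U2_MAX = 65535;

    LimitCheckedOutput(OutputStream out) {
        super(out);
    }

    /** Writes a u2, or fails with a terse description of the limit. */
    void putU2(String limitName, int value) throws IOException {
        if (value < 0 || value > U2_MAX) {
            throw new IOException(limitName + " limit exceeded: " + value
                + " (max " + U2_MAX + ")");
        }
        writeShort(value);
    }
}

// Stand-in for StandardException, unchecked to keep the sketch short.
final class QueryTooComplexException extends RuntimeException {
    QueryTooComplexException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class LimitDemo {
    /** Catches the low-level IOException and chains it to the user error. */
    static byte[] writePoolCount(int count) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            LimitCheckedOutput out = new LimitCheckedOutput(bytes);
            out.putU2("constant_pool_count", count);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new QueryTooComplexException("query too complex", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(writePoolCount(3).length + " bytes written");
        try {
            writePoolCount(70000);
        } catch (QueryTooComplexException e) {
            System.out.println(e.getMessage() + ": " + e.getCause().getMessage());
        }
    }
}
```

Keeping the check in the write method means every caller that writes a count or an index gets it for free.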
   

---------------------------------------

Each class has execute and a constructor.  The constructor is called once
when the Activation is created.  The actual work is done by
postConstructor.

We want to create continuation constructors, so we go from
public void postConstructor()
{
    // 238k worth of code
}

to

public void postConstructor()
{
    // 50k worth of code
   
    subConstructor0();
}

private void subConstructor0()
{
    // 50k worth of code
   
    subConstructor1();
}

private void subConstructor1()
{
    // 50k worth of code
   
    subConstructor2();
}


endStatement is a good time to check if the stack is empty and call
our continuation constructor if we need it.

About endStatement
    if we have x=3
    putField leaves a value on the stack.
    endStatement will pop it off


Note: There seemed to be an initial attempt to automatically break up the
constructors with the method statementNumHitLimit().  It needs to be
smarter and look at the code size instead of the number of statements,
and then make continuation constructors based on that.

Tried to make a continuation constructor.
    Code in ExpressionClassBuilder
    if (big) create another MethodBuilder with same signature.
    Make existing method call the new method
    complete old constructor
    set it up so acb.getConstructor returns our new constructor from now
    on.  So from now on callers get the new method and start writing
    to it.
    Seems safe for the constructor.


This didn't work!  Because there were still some callers with references
to the constructor.  It was too risky to try to find them all.   Dan's
next solution was this...
"I think I have the generic split working (under the covers of
MethodBuilder)
for methods that take zero arguments and whose stack depth drops to zero
sometime after they reach 55000 code size.

This works for the constructor (postConstructor) but fillResultSet doesn't
trigger it, most likely as the stack is never 0.
Two changes

1) overflowMethodCheck method & calls in BCMethod.
     Checks to see if the method is getting too big and creates
a sub method as we did earlier today. But rather than being specific
it bases the method on the current method, modifiers and return type.

This method is called when the stack depth is 0, and only when the
stack depth can be zero, e.g. not called when pushing values onto the
stack as the depth cannot be zero, so only called when the stack depth
is being reduced.

It looks like what we did today, but once the current method has been
completed
the actual BCMethod takes on the identity of the new submethod. Thus leaving
the callers unaware of any change (same reference will now add code into
the sub method).

1a) putFieldPop() in MethodBuilder/BCMethod. Does not leave the value of
the field on the stack. Reduces code size by not duplicating the value
of the field on the stack only to pop it later with an endStatement by
the caller.
Was about 8% of the code in postConstructor.


2) Calls to putFieldPop(). basically replacing

putField();
endStatement();

sequences with

putFieldPop();
// endStatement();
"


--------------------------------------------------------------------
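
The generic split Dan describes above can be modelled in a few lines.  This is only a sketch of the idea and not BCMethod itself: ToySplittingMethod, the sub method naming, and the byte sizes passed to emit are all made up, and it only handles a no-argument method that returns nothing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a method builder that splits itself.
final class ToySplittingMethod {

    /** One generated method body. */
    static final class Body {
        final String name;
        final List<String> code = new ArrayList<>();
        int size;

        Body(String name) {
            this.name = name;
        }
    }

    private final String baseName;
    private final int splitThreshold;
    private final List<Body> bodies = new ArrayList<>();
    private Body current;
    private int stackDepth;

    ToySplittingMethod(String name, int splitThreshold) {
        this.baseName = name;
        this.splitThreshold = splitThreshold;
        this.current = new Body(name);
        bodies.add(current);
    }

    /** Appends one instruction with its byte size and net stack effect. */
    void emit(String instruction, int bytes, int stackDelta) {
        current.code.add(instruction);
        current.size += bytes;
        stackDepth += stackDelta;
        // Only a reduction can bring the depth to zero, so only check then.
        if (stackDelta < 0 && stackDepth == 0) {
            overflowMethodCheck();
        }
    }

    private void overflowMethodCheck() {
        if (current.size < splitThreshold) {
            return;
        }
        Body sub = new Body(baseName + "_s" + (bodies.size() - 1));
        current.code.add("call " + sub.name); // hand off to the sub method
        current.code.add("return");           // complete the old method
        bodies.add(sub);
        current = sub; // this builder takes on the identity of the sub method
    }

    List<Body> bodies() {
        return bodies;
    }

    String currentName() {
        return current.name;
    }

    public static void main(String[] args) {
        ToySplittingMethod m = new ToySplittingMethod("postConstructor", 55000);
        for (int i = 0; i < 20000; i++) {
            m.emit("push value", 3, 1);
            m.emit("putFieldPop", 3, -1);
        }
        System.out.println(m.bodies().size() + " methods, writing to "
            + m.currentName());
    }
}
```

Callers keep the same ToySplittingMethod reference the whole time, which is the point: nobody has to be told that the code is now going into a different method.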
There is one 236K method left.

It comes from the fillResultSet method.
It returns a value so it might be tricky.

If you have 0 or 1 arg on the stack it is somewhat easy: 0 args as
before, 1 arg means you have to create
a new method with a param of that type
e.g.
ResultSet fillResultSet()
{
    // much code
    // escape to avoid limit
    return FRS_0(arg);
}
private ResultSet FRS_0(<type that happened to be on stack> arg)
{
    // etc.
}
FRS_0 would return a ResultSet


---- One more change and note from Dan
"Has the postConstructor() split working automatically, actually
any method that gets big (55,000 byte code size), has no parameters
and whose stack depth drops to zero after reaching 55,000 bytes.

fillResultSet does not fall into this category.

It might be worth changing the stackDepth check on the calls to the
overflow method in BCMethod to be <= 1 rather than == 0. Then in the
method check that stack depth is 0 after the 55000 check.

Then if the stack depth is not zero and size > 55000 then print the
stack depth to see if fillResultSet ever gets to a 1 stack depth,
i.e. while running the query.

If it does then you need to modify the overflow method to:

get the declared type of the top (only) stack word, say <stype>

make the sub method have one parameter of <stype>

then just call the sub method as is currently done, BUT
perform a swap after the pushing of this, and then set 1 arg passed, not 0.
"

I performed the indicated checks and fillResultSet doesn't meet the
prerequisites.



       









