drill-dev mailing list archives

From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5071) CodeGenerator class unnecessarily keeps two copies of generated code
Date Sat, 26 Nov 2016 06:23:58 GMT
Paul Rogers created DRILL-5071:

             Summary: CodeGenerator class unnecessarily keeps two copies of generated code

                 Key: DRILL-5071
                 URL: https://issues.apache.org/jira/browse/DRILL-5071
             Project: Apache Drill
          Issue Type: Improvement
    Affects Versions: 1.8.0
            Reporter: Paul Rogers
            Priority: Minor

Drill uses a code cache to avoid recompiling the same code multiple times. The cache is keyed
on the generated code itself.

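The generated code contains an ever-increasing name suffix of the form {{ProjectorGen123}}, so
two otherwise identical pieces of generated code never match exactly. To get a usable cache key,
{{CodeGenerator}} does a search and replace on the class name and keeps the resulting
"generified" copy alongside the actual code.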

The unique name would be necessary if all generated code shared a single name space. But, as currently
implemented, each piece of generated code resides in its own private class loader, so the code
generated for one operator (say) can never clash with that for another.

As a result, we can reduce the size and cost of the code cache as follows:

1. Eliminate the numeric suffix on the class name.
2. Eliminate the {{generifiedCode}} member variable in {{CodeGenerator}}.
3. Eliminate the search and replace that produces the "generified" code.
4. Use the actual generated code as the cache key instead of the "generified" version.
5. Rely on the distinct class loaders to keep generated class names separate.
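
A minimal sketch of steps 1 and 4, to make the idea concrete. Everything here is hypothetical
and is not Drill's actual code: the class {{CodeCacheSketch}}, the {{generate}} stand-in, the
{{SourceKeyedCache}} type and the fake compiler are all assumptions made for illustration. It
only shows that once the class name is fixed, identical generated source yields an identical
cache key, with no second "generified" copy.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical illustration only; none of these names are Drill's real classes.
class CodeCacheSketch {

    // Step 1: one fixed class name, no numeric suffix.
    static final String CLASS_NAME = "ProjectorGen";

    // Stand-in for code generation: the same expression always yields the same source text.
    static String generate(String expression) {
        return "public class " + CLASS_NAME
            + " { Object eval() { return " + expression + "; } }";
    }

    // Step 4: an LRU cache keyed directly on the generated source text.
    static final class SourceKeyedCache<V> {
        private final Map<String, V> entries;
        int compileCount = 0;

        SourceKeyedCache(final int maxEntries) {
            // accessOrder = true makes the map evict the least recently used entry.
            this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                    return size() > maxEntries;
                }
            };
        }

        // Returns the cached value for this source, compiling only on a miss.
        V get(String source, Function<String, V> compiler) {
            V compiled = entries.get(source);
            if (compiled == null) {
                compiled = compiler.apply(source);
                compileCount++;
                entries.put(source, compiled);
            }
            return compiled;
        }

        int size() {
            return entries.size();
        }
    }

    public static void main(String[] args) {
        SourceKeyedCache<String> cache = new SourceKeyedCache<>(1000);
        // Fake compiler: a real one would return a loaded class, not a string.
        Function<String, String> fakeCompiler = src -> "compiled:" + src.hashCode();

        cache.get(generate("a + b"), fakeCompiler); // miss
        cache.get(generate("a + b"), fakeCompiler); // hit: same source, same key
        cache.get(generate("a * b"), fakeCompiler); // miss: different source

        System.out.println("compilations = " + cache.compileCount); // prints "compilations = 2"
    }
}
```

Step 5 is not modeled here: the sketch assumes, as described above, that each compiled result
would be loaded by its own class loader, so reusing the name {{ProjectorGen}} is safe.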

The code cache holds up to 1000 classes. Classes can range from a few K to hundreds of K.
By eliminating the second code copy, and assuming an average of about 50K per class, we may
reduce heap memory pressure by on the order of 50K * 1000 = 50 MB.
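
The estimate can be reproduced with a small calculation. This is only a sketch: the class and
method names are hypothetical, the 50K average is the assumption used above rather than a
measurement, and K is taken as 1000 bytes.

```java
// Hypothetical helper for the back-of-the-envelope estimate; not part of Drill.
class CacheFootprintEstimate {

    // Bytes held by the duplicate ("generified") copies across a full cache.
    static long duplicateCopyBytes(int cachedClasses, long avgSourceBytes) {
        return (long) cachedClasses * avgSourceBytes;
    }

    public static void main(String[] args) {
        // Assumed inputs: 1000 cached classes, about 50K of generated code each.
        long bytes = duplicateCopyBytes(1000, 50_000L);
        System.out.println(bytes / 1_000_000 + " MB"); // prints "50 MB"
    }
}
```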
