Subject: svn commit: r517663 [5/14] - in /harmony/standard/site: docs/ docs/documentation/ docs/subcomponents/classlibrary/ docs/subcomponents/drlvm/ xdocs/ xdocs/documentation/ xdocs/stylesheets/ xdocs/subcomponents/classlibrary/ xdocs/subcomponents/drlvm/
From: nadinem@apache.org
To: commits@harmony.apache.org
Date: Tue, 13 Mar 2007 12:17:45 -0000

Modified: harmony/standard/site/docs/subcomponents/drlvm/JIT.html
URL: http://svn.apache.org/viewvc/harmony/standard/site/docs/subcomponents/drlvm/JIT.html?view=diff&rev=517663&r1=517662&r2=517663
==============================================================================
--- harmony/standard/site/docs/subcomponents/drlvm/JIT.html (original)
+++ harmony/standard/site/docs/subcomponents/drlvm/JIT.html Tue Mar 13 05:17:43 2007
@@ -1,2778 +1,2778 @@

Apache Harmony - Jitrino Just-in-Time Compiler

- DRLVM Jitrino Just-in-time Compiler -

-

- Revision History -

-

- 1. About this Document -

-

- 1.1 Purpose -

-

- 1.2 Intended Audience -

-

- 1.3 Using This Document -

-

- 1.4 Conventions and Symbols -

-

- 2. Overview -

-

- 2.1 Key Features -

-

- 2.2 About the Compilation Process -

-

- 3. Jitrino.OPT -

-

- 3.1 Architecture -

-
-

- 3.1.1 Pipeline Management Framework -

-

- 3.1.2 Logical Components -

-

- 3.1.3 Intermediate Representations -

-
-

- 3.2 Processes -

-
-

- 3.2.1 Bytecode Translation -

-

- 3.2.2 High-level Optimizations -

-

- 3.2.3 Code Selection -

-

- 3.2.4 Code generation -

-
-

- 4. Jitrino.JET -

-

- 4.1 Architecture -

-
-

- 4.1.1 Run-time Support -

-
-

- 4.2 Processes -

-
-

- 4.2.1 Baseline Compilation -

-
-

- 5. Utilities -

-

- 5.1 Memory Manager -

-

- 5.2 Counters and Timers -

-

- 5.3 Logging -

-

- 5.4 Control Flow Graph -

-
-

- 5.4.1 CFG structures -

-

- 5.4.2 Graph algorithms -

-

- 5.4.3 Dominator Tree -

-

- 5.4.4 Loop Tree -

-
-

- 6. Public Interfaces -

-

- 6.1 JIT_VM Interface -

-

- 6.2 JIT_EM Interface -

-

- 7. References -

-

- Revision History -

- - - - - - - - - - - -
- Version          | Version Information                        | Date
- Initial version  | Intel, Nadya Morozova: document created.   | September 4, 2006
-

- 1. About - this document -

-

- 1.1 Purpose -

-

- This document describes the internal structure of the Jitrino - just-in-time compiler deployed with the virtual machine as part of the - DRL (Dynamic Runtime Layer) initiative. The description covers the - internal design of this JIT compiler and its interaction with other - DRLVM components. In this document, you can find - implementation-specific details of the Jitrino compiler. General - information on the JIT role in overall virtual machine design and - VM-level requirements are out of scope of this document and are - covered in the DRLVM Developer's - Guide supplied with the VM source code package. -

-

- 1.2 Intended - Audience -

-

- The document is targeted at DRLVM developers with special interest in - code compilation algorithms. The information can be helpful for future - development of DRL compilation techniques and can serve as an example - for those implementing a JIT compiler from scratch. The document - assumes that readers understand the concepts of just-in-time - compilation and optimization algorithms. -

-

- 1.3 Using - This Document -

-

- The DRLVM just-in-time compiler description has the following major - sections: -

-
    -
  • - Overview: a definition of the JIT compiler - component and its key features -
  • -
  • - Jitrino.OPT: -
      -
    • - Architecture: a description - of the Jitrino.OPT internal architecture, its subcomponents - and the interfaces it uses, as well as other - implementation-specific data -
    • -
    • - Processes: an overview and a - step-by-step description of compilation, including - optimizations and code generation -
    • -
    -
  • -
  • - Jitrino.JET: -
      -
    • - Architecture: a description of the Jitrino.JET internal architecture, its subcomponents and the interfaces it uses, as well as other implementation-specific data
    • -
    • - Processes: an overview and a - step-by-step description of baseline compilation -
    • -
    -
  • -
  • - Utilities: a description of - JIT-specific utilities, such as the control flow graph, the timers, - and the memory manager -
  • -
  • - Public interfaces: a definition of major - functional groups that the JIT compiler exports for interaction - with other components -
  • -
-

- 1.4 - Conventions and Symbols -

-

- This document uses the unified - conventions for the DRL documentation kit. -

-

- Back to Top -

-

- 2. Overview -

-

- Jitrino is the code name for the just-in-time (JIT) compiler [2] currently shipped with DRLVM. Jitrino - comprises two distinct JIT compilers that share source code and are - packaged in a single library: -

-
    -
  • - Jitrino.JET baseline compiler translates Java* bytecode into native code with practically no - optimizations. The compiler emulates operations of the stack-based - machine using a combination of the native stack and registers. -
  • -
  • - The Jitrino.OPT optimizing compiler performs bytecode compilation with a variety of optimizations. The compiler features two types of intermediate representation (IR): the platform-independent high-level IR (HIR) and the platform-dependent low-level IR (LIR). Jitrino.OPT incorporates an extensive set of code optimizations for each IR type. This JIT compiler has a distinct internal interface between the bytecode translator operating on HIR and the code generator operating on LIR, which enables easy re-targeting of Jitrino to different processors while preserving all the optimizations done at the HIR level.
  • -
-

- This document describes both compilers and their operation. All - references to Jitrino with no subtitle (JET or OPT) specified equally - apply to both compilers. -

-

- Back to Top -

-

- 2.1 Key features -

-

- Key features of the JIT compiler include: -

-
    -
  • - A clear interface to plug in different front-end components -
  • -
  • - A clear interface to plug in code generators for different - platforms -
  • -
  • - High configurability via command-line options and a configuration - file -
  • -
-

- Jitrino.OPT also features the following capabilities: -

-
    -
  • - A two-level intermediate representation with most optimizations run - at the platform-independent high level -
  • -
  • - The ability to plug in new optimization passes at both intermediate - representation levels -
  • -
  • - A flexible logging system that enables tracing of major Jitrino activities, including detailed IR dumps during compilation and run-time interaction with other DRL components on a per-thread basis, as well as gathering execution-time statistics of compiler code at a very fine granularity level
  • -
  • - Configurable self-check capabilities to facilitate bug detection -
  • -
-

- Back to Top -

-

- 2.2 About - the Compilation Process -

-

- Jitrino compilers provide means to compile and optimize code - distributed for Java* run-time environments and to - adapt it to various hardware architectures. Figure 1 demonstrates the - architecture of the compilers and their interaction with the virtual machine. -

-

- Both Jitrino.JET and Jitrino.OPT compilers have a platform-independent - Java front-end and a platform-dependent back-end. Compilation connects - these and propagates type information extracted by the front-end from - the original bytecode to the platform-specific back-ends. Supporting a - new hardware platform requires implementation of a new - platform-dependent back-end. -

-

- Jitrino can follow different code compilation strategies. The - compilation process can be optimized for the smallest compilation - time, for the best performance or for a compromise of affordable - compilation time with reasonable performance. The compilation process - can involve the Jitrino.JET baseline compiler, Jitrino.OPT optimizing - compiler or both. In most applications, only a few methods consume the - majority of time at run time, so that overall performance benefits - when Jitrino aggressively optimizes these methods. The Execution Manager defines the actual compilation - strategy. -

-

- The Jitrino.JET baseline compiler provides the fastest compilation - time by translating Java* bytecode directly to native - code. This compiler performs a very fast and simple compilation and - applies almost no optimizations. -

-

- Jitrino.OPT, the main Jitrino compilation engine, provides the most optimized native code at the cost of greater compilation time. The compilation process of Jitrino.OPT is also shown in Figure 1, with focus on the following:

-
    -
  1. -

    - The run-time environment bytecode is translated into the - high-level intermediate representation (HIR) by the Java* bytecode translator and then optimized by the - high-level optimizer. HIR and the optimizer make up the - language- and platform-independent part of the Jitrino.OPT. -

    -

    - Note -

    -

    - The Jitrino architecture is modular, which facilitates - implementation of more front-ends, such as the Common Language - Infrastructure (CLI) bytecode front-end. -

    -
  2. -
  3. - After optimization, a platform-specific code generator translates - HIR into the platform-specific low-level intermediate - representation (LIR). The code generator then performs - platform-specific optimizations and register allocation over LIR, - and finally emits native code. -
  4. -
-

- This document describes the internal structure of the Jitrino.JET and - Jitrino.OPT compilers and the processes running inside them. -

-

- JIT Architecture -

-

- Figure 1. Jitrino Compiler Architecture -

-

- Back to Top -

-

- 3. Jitrino.OPT -

-

- 3.1 Architecture -

-

- This part of the document describes the internals of the optimizing - compiler Jitrino.OPT. -

-

- 3.1.1 Pipeline Management Framework -

-

- The pipeline management framework (PMF) defines how the compilation process proceeds inside Jitrino.OPT. With PMF, the compilation process is represented as a pipeline, which is a linear sequence of steps. Each step stores a reference to an action object, its parameters and other information. Actions represent independent transformations of code, such as optimization passes. Different steps in a pipeline can reference the same action, for example, to run the same transformation several times. Sequences of steps can vary between pipelines.
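To make the step/action relationship concrete, the following C++ sketch models a pipeline as a linear sequence of steps, each referencing an action object. The class names (Action, Step, Pipeline, IRManager) are illustrative placeholders for this document only, not the actual PMF types.

#include <string>
#include <vector>

class IRManager;  // placeholder for the compilation context handed from step to step

// An action is an independent transformation of the code, for example one
// optimization pass. The same action instance may be referenced by several steps.
class Action {
public:
    virtual ~Action() {}
    virtual void run(IRManager& irm) = 0;
};

// A step stores a reference to an action plus the parameters for this use of it.
struct Step {
    Action* action;
    std::string parameters;   // per-step arguments, e.g. a hypothetical "threshold=100"
};

// A pipeline is a linear sequence of steps executed in order.
class Pipeline {
    std::vector<Step> steps;
public:
    void append(Action* action, const std::string& parameters) {
        steps.push_back(Step{action, parameters});
    }
    void compile(IRManager& irm) {
        for (Step& step : steps) {
            step.action->run(irm);   // the same action may run more than once per pipeline
        }
    }
};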

-

- To select a pipeline for compiling a given Java* method, the system uses method filters consisting of class and method names and method signatures as the selection criteria. Each JIT instance has one common pipeline with an empty method filter that accepts all methods for compilation. Additionally, optional pipelines with unique, non-empty filter expressions can be created for compiling specific sets of Java* methods.

-

- Pipelines in Jitrino.OPT are configured using the VM properties - mechanism. PMF parses properties, constructs pipelines and passes - parameters to actions. The OPT compiler has no hard-coded pipeline, so - you need to configure pipelines in EM configuration files or through - VM properties. Understanding pipeline configuration rules is required - for using the Jitrino command-line interface and effectively - exercising the Jitrino logging system. For details on PMF internals, - refer to the PMF Detailed Description. -

-

- Back to Top -

-

- 3.1.2 Logical - Components -

-

- This section defines the key parts of the compiler. This is only an - abstract, logical division matching the key compilation stages. Each - logical component includes action(s) that are used consecutively in - compilation pipelines. -

-
-
- Translator -
-
-

- The bytecode translator is responsible for converting incoming - bytecode instructions into a high-level intermediate - representation. This IR is of a lower level than the bytecode - and breaks complex bytecode operations into several simple - instructions to expose more opportunities to later high-level - optimization phases. For example, loading an object field is - broken up into operations that perform a null check of the - object reference, load the base address of the object, compute - the address of the field, and load the value at that computed - address. -
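As an illustration of this expansion, the sketch below shows how a translator might break a single getfield bytecode into four simple HIR operations. The HirBuilder interface and its method names are hypothetical and exist only for this illustration; they are not the actual Jitrino translator API.

struct Opnd;        // placeholder for an HIR operand
struct FieldDesc;   // placeholder for a resolved field descriptor

// Hypothetical HIR-building interface used only for this sketch.
class HirBuilder {
public:
    virtual ~HirBuilder() {}
    virtual Opnd* genNullCheck(Opnd* objRef) = 0;                  // explicit null check of the object reference
    virtual Opnd* genLoadBase(Opnd* objRef) = 0;                   // load the base address of the object
    virtual Opnd* genAddOffset(Opnd* base, FieldDesc* field) = 0;  // compute the address of the field
    virtual Opnd* genLoadInd(Opnd* fieldAddr) = 0;                 // load the value at the computed address
};

// A field load is broken into simple instructions so that later high-level
// passes can, for example, remove a null check they prove redundant.
Opnd* translateGetField(HirBuilder& hir, Opnd* objRef, FieldDesc* field) {
    hir.genNullCheck(objRef);
    Opnd* base = hir.genLoadBase(objRef);
    Opnd* addr = hir.genAddOffset(base, field);
    return hir.genLoadInd(addr);
}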

-

- For details on the conversion process, see section 3.2.1 Bytecode Translation. -

-
-
- Optimizer -
-
-

- The optimizer includes a set of optimizations independent of the - original Java* bytecode and the hardware - architecture. A single optimization framework for Java* and CLI programs is used. The optimizer performs - a series of transformation passes to optimize the incoming - high-level intermediate representation. For a description of - applied transformations, see section 3.2.2 High-level Optimizations. -

-

- After the - high-level optimizations (HLO) are applied, the code - selector translates the high-level intermediate - representation to a low-level intermediate representation. The - component is designed so that code generators for different - architectures can be plugged into the compiler. To be pluggable, - a code generator must implement code selector callback - interfaces for each structural entity of a method, such as the - whole method, basic blocks, and instructions. During code - selection, the selector uses the callback interfaces to - translate these entities from HIR to LIR. See section 3.2.3 Code Selection for details on - the process. -

-
-
- Code Generator -
-
-

- The code generator (CG) is responsible for generation of machine - code out of the input high-level intermediate representation. CG - accepts the HIR information via the code - selector callback interfaces. For details on how the - resulting code is produced, see section 3.2.4 Code Generation. -

-

- The code generator also performs several auxiliary operations, - such as: -

-
    -
  • - Creation of a data area with constants used in code -
  • -
  • - Generation of auxiliary structures necessary for run-time - support, such as the stack layout description, the GC map and - registers -
  • -
  • - Registration of exception handlers in VM -
  • -
  • - Registration of direct calls for patching -
  • -
-
-
-

- Back to Top -

-

- 3.1.3 Intermediate Representations -

-

- An intermediate representation (IR) is an internal compiler - representation for code being compiled. Jitrino.JET has no - intermediate representation of code and directly compiles bytecode - into the native code. Jitrino.OPT uses two IR forms: the high-level - intermediate representation (HIR) and the low-level intermediate - representation (LIR). To compile a method's code, the Jitrino.OPT - compiler translates Java* bytecode into a graph-based - structure with nodes, edges and instructions. The nodes and edges in - the graph denote the control flow of the program. Every node in the - graph is populated with instructions that denote the primitive - operations. -

-

- Example -

-

- Here is an example of corresponding Java* code, - Java* bytecode and the low-level intermediate - representations used in Jitrino.OPT: -

-

- Java* code: -

-
-    public static int max(int x, int y) {
-        if (x > y) {
-            return x;
-        }
-        return y;
-    }
-
-

- Java* bytecode: -

-
-public static int max(int, int);
-  Code:
-   0:   iload_0
-   1:   iload_1
-   2:   if_icmple       7
-   5:   iload_0
-   6:   ireturn
-   7:   iload_1
-   8:   ireturn
-
-

- Jitrino high-level intermediate representation of code: -

-

- HIR representation of code - example -

-

- Jitrino low-level intermediate representation of code: -

-

- LIR representation of code - example -

-

- Both HIR and LIR use the common control flow graph structure and its algorithms; see section 5.4 Control Flow Graph for details. This section describes the two intermediate representations currently used in Jitrino in greater detail.

-

- Back to Top -

-
-
- High-Level IR -
-
-

- The Jitrino high-level intermediate representation (HIR) is a platform-independent representation of the code being compiled. In HIR, each basic block node consists of a list of instructions, and each instruction includes an operator and a set of operands. HIR supports a static single assignment (SSA) form, where each operand has exactly one assignment. The SSA form provides explicit use-def links between operands and their defining instructions, which simplifies and speeds up high-level optimizations. Each HIR instruction and operand has detailed type information that is propagated to the back-end at further compilation stages.
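A minimal C++ sketch of the use-def property described above: because every operand has exactly one defining instruction in SSA form, the link to that definition can be stored directly on the operand. The Opnd/Inst/Type structures are illustrative, not the real Jitrino HIR classes.

#include <vector>

struct Type;   // detailed type information carried by every HIR operand
struct Inst;

// In SSA form each operand is assigned exactly once, so its use-def link
// is simply a pointer to the single defining instruction.
struct Opnd {
    Type* type;
    Inst* def;   // the one instruction that assigns this operand
};

struct Inst {
    int opcode;
    Opnd* dst;                 // result operand; dst->def points back at this instruction
    std::vector<Opnd*> srcs;   // used operands; srcs[i]->def is their definition
};

// With explicit use-def links an optimization can jump from a use to its
// definition in constant time, e.g. to check whether a source is a constant.
inline bool isDefinedBy(const Opnd* use, int opcode) {
    return use->def != nullptr && use->def->opcode == opcode;
}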

-

- The compiler also maintains dominator and loop structure information on HIR for use - in optimization and code generation. -

-
-
-
-
- Low-level IR -
-
-

- Jitrino low-level intermediate representations (LIR) are specific to the code generators implementing them. Unlike HIR, the Jitrino IA-32/Intel® 64 CG LIR does not support the SSA form and is designed to be very close to the IA-32 and Intel® 64 architectures.

-
-
-

- Back to Top -

-

- 3.2 Processes -

-

- This part of the document describes the key processes that go inside - the Jitrino optimizing compiler. -

-

- 3.2.1 - Bytecode Translation -

-

- The initial compilation step is the translation of bytecode into HIR, - which goes in the following phases: -

-
    -
  1. - The bytecode translator establishes the basic block boundaries and - exception handling regions, and infers type information for - variables and operators. At this phase, the translator generates - type information for variables and virtual Java* - stack locations, similarly to the bytecode verification algorithm - described in the JVM specification [1]. -
  2. -
  3. - The bytecode translator generates HIR and performs simple optimizations, including constant and copy propagation, folding, devirtualization and inlining of method calls, elimination of redundant class initialization checks, and value numbering-based redundancy elimination [3].
  4. -
-

- 3.2.2 - High-level Optimizations -

-

- High-level optimizations are platform-independent transformations - performed by the optimizer. The optimizer applies a set of classical - object-oriented optimizations balancing the effectiveness of - optimizations with their compilation time. Every high-level - optimization is represented as a separate transformation pass over - HIR. Each Jitrino.OPT optimization aims at one or more goals, as - follows: -

- -

- Optimization Modes -

-

- The Jitrino high-level optimizer supports various optimization modes, - which differ by the optimization path and profile used to optimize the - code. Different optimization modes are customized for different - application types: client applications usually require fast startup - time and reasonable response time, whereas server applications require - top-level performance in the long run. A particular optimization mode - is defined by the following: -

-
    -
  • - The optimization path - a part of the compilation pipeline - representing the set of optimization passes performed during - compilation of a method. The default optimization path set in the - configuration file is based on Java* properties. - You can override the default settings in the configuration file or - on the command line. -
  • -
  • - The profile - a static or a dynamic profile that a JIT compiler can - generate and use. Currently, Jitrino.JET provides the method - hotness/backedge profile, and Jitrino.OPT provides the edge - profile. For a description of profiles and profile-related - processes, see the EM component description. -
  • -
-

- Several pre-defined Jitrino optimization modes are stored in the execution manager configuration files, as - follows: -

-
    -
  • - The client mode uses the dynamic method hotness/backedge - profile and is tuned for fast application startup and reasonable - performance. This is the default optimization mode. -
  • -
  • - The server mode is tuned for decent performance of - long-running server applications. This mode is represented in two - variants to play with: the dynamic server mode using the - dynamic edge profile and the static server mode using the - profile based on heuristics. -
  • -
-

- You can define the profile to use on the command line. For example, to - set JIT to use the server dynamic mode, specify the following option: -

-
--Xem:server
-
-

- This section defines all optimizations that are currently available in - the Jitrino.OPT compiler. Related optimizations are gathered in - groups, as follows: -

-
-
- Scope Enhancement Passes -
-
-

- The high-level optimization begins with a set of transformations - to enhance the scope of further optimizations, as follows: -

-
    -
  • - Guarded devirtualization (devirt) of virtual method calls reduces their run-time cost and enables the compiler to inline their targets.
    - Provided exact type information, this optimization can convert a virtual call into a more efficient direct call. When no type information is available, the most probable target of the virtual method can be predicted, and the optimization devirtualizes the call by guarding it with a cheap run-time class test that verifies that the predicted method is in fact the target. A sketch of this transformation is given after this list.
  • -
  • - Inlining (inline) removes the - overhead of a direct call and builds the code of the called - method into the code of the caller in place of its call site. - Inlining is an iterative process involving other - optimizations. Inlining goes as follows: -
      -
    • - The inliner selects candidates for inlining in the - following sequence: -
        -
      • - Examines each direct call site in the IR form, - including those exposed by guarded - devirtualization. -
      • -
      • - Heuristically estimates the potential benefit of - inlining. -
      • -
      • - Checks whether the benefit exceeds a certain - threshold, and, if it does, registers the call in - a priority queue. -
      • -
      -
    • -
    • - The inliner selects the top candidate, if any, for - inlining. -
    • -
    • - The translator generates an intermediate representation - for the method selected for inlining. -
    • -
    • - The optimizer runs over HIR of the method using the - inliner pipeline. -
    • -
    • -   - The inliner finds further inline candidates, if any, in - the analyzed representation and replicates it in the - representation of the caller. -
    • -
    • - The inliner selects a new inline candidate from the - queue and repeats the cycle.
      - The inliner stops its work when the queue is empty or - after code IR reaches a certain size limit. -
    • -
    -

    - The example below illustrates the inlining algorithm. -

    -
    -Inline(HIR_of_compiled_method) {
    -    current_bytecode_size = HIR_of_compiled_method.get_method().bytecode_size()
    -    find_inline_candidates(HIR_of_compiled_method)
    -    while (true) {
    -        callee = NULL
    -        while (!inline_candidates.empty()) {
    -            callee = inline_candidates.pop()   
    -            callee_bytecode_size = callee.bytecode_size()
    -            if ((current_bytecode_size + callee_bytecode_size) < SIZE_THRESHOLD) {
    -                current_bytecode_size += callee_bytecode_size
    -                break;
    -            }
    -        }
    -        if (callee == NULL) {
    -            break;
    -        }
    -        HIR_of_callee = Translator.translate(callee)
    -        Optimizer.optimize(HIR_of_callee, inliner_pipeline)
    -        find_inline_candidates(HIR_of_callee)
    -        HIR_of_compiled_method.integrate(HIR_of_callee)
    -    }
    -}
    -
    -find_inline_candidates(method_HIR) {
    -    foreach direct_call in method_HIR {
    -        inline_benefit = compute_inline_benefit(direct_call)
    -        if (inline_benefit > BENEFIT_THRESHOLD) {
    -            inline_candidates.push(direct_call)
    -        }
    -    }
    -}
    -
    -
  • -
  • - Lowering (lower) performs basic instruction-level transformations to replace common helper calls with the corresponding HIR code. A helper call is generally expensive, so inlining the operation performed by a helper method can improve performance. This is especially true for operations that later prove to be redundant.
  • -
-
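The sketch referenced in the devirt item above shows the code shape produced by guarded devirtualization, written as ordinary C++ for illustration only; in Jitrino the guard and the calls are HIR instructions, and the class test is typically a cheap vtable comparison rather than a dynamic_cast.

struct Shape {
    virtual ~Shape() {}
    virtual double area() const = 0;
};

struct Circle : Shape {
    double r;
    explicit Circle(double radius) : r(radius) {}
    double area() const override { return 3.14159265358979 * r * r; }
};

// Original code: an indirect virtual call that cannot easily be inlined.
double areaVirtual(const Shape* s) {
    return s->area();
}

// After guarded devirtualization, assuming the predicted target is Circle:
// a run-time class test guards a direct (and therefore inlinable) call,
// with the original virtual call kept as the fallback.
double areaDevirtualized(const Shape* s) {
    if (const Circle* c = dynamic_cast<const Circle*>(s)) {
        return c->Circle::area();   // direct call to the predicted target
    }
    return s->area();               // guard failed: fall back to the virtual call
}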
-
- Redundancy Elimination - Passes -
-
-

- This set of optimizations aims at eliminating redundant and - partially redundant operations. If JIT can prove that some - operations are redundant and have no side effects, they might be - removed from the code. This way, time for execution of the - redundant operations is saved and the resulting code executes - faster. This optimization group consists of the following - passes: -

-
    -
  • - Memory optimization (memopt) - reduces the number of operations with memory by removing - redundant loading and storing instructions.
    - Firstly, memopt works on the SSA form to - combine all locations of an object into one alias. After - that, the optimization updates use-def dependencies with the - alias instead of locations. According to these new - dependencies, memopt deletes redundant stores. - Finally, it performs scoped hash-value numbering on the - resulting control flow graph to eliminate redundant load - operations. -
  • -
  • - Lazy exceptions optimization - (lazyexc) eliminates redundant creation of - exception objects. In cases when an exception object is not - used in the exception handler, time spent on creating the - exception object and creating and recording the stack trace - in the exception object is wasted. If the constructor of the - exception object has no side effects and the exception object - is not used before it is thrown, then the creation of the - exception object is delayed until the exception object is - really used. -
  • -
  • - Loop-oriented optimizations are the following: -
      -
    • - Loop peeling moves one or more - iterations to the loop header to reduce the looping - overhead for a small loop count and to enable - optimizations in peeled iterations. -
    • -
    • - Load invariant hoisting moves - operations that are invariant across loop iterations - outside the loop. -
    • -
    • - Loop unrolling expands the loop body - by combining several iterations into one to reduce the - loop overhead and to expand the scope for optimizations - in the loop body. -
    • -
    -
  • -
  • - Array-bounds check elimination - (abcd) analyzes method code and removes - redundant checks of array bounds. Normally, these checks - identify situations when a program tries to access an element - beyond the array bounds, and throw - ArrayIndexOutOfBoundsException. The JIT compiler - inserts such checks before every access to an array element - and some of these checks are redundant. [5]. -
  • -
  • - Global code motion (gcm) moves - computational instructions between basic blocks. The goal is - to move each movable instruction to the basic block with - minimal probability of execution. Probabilities are provided - by a profile based on static heuristics or on run-time - execution. To preserve semantics, only instructions without - side effects are considered movable. Instructions can be - moved up and down the dominator tree. -
  • -
-
-
-
-
- HIR Simplification Passes -
-
-

- HIR simplification passes are a set of fast optimizations that - the Jitrino optimizer performs several times over the - intermediate representation to reduce its size and complexity. - Simplification passes improve code quality and efficiency of - more expensive optimizations. HIR simplifications are often - grouped in a series of simplification passes to be performed at - various points in the optimization path. -

-
    -
  • - Unreachable code elimination - (uce) detects and removes unreachable code by - traversing the control flow graph. -
  • -
  • - Dead code elimination detects and removes - dead code by using a sparse liveness traversal over use-def - links of the SSA form [3]. -
  • -
  • - Simplification (simplify) - includes the following: -
      -
    • - Simplification of arithmetic expressions -
    • -
    • - Copy and constant propagation -
    • -
    • - Constant folding -
    • -
    • - Subexpression re-association -
    • -
    • - Simplification of trivial branches and calls [3] -
    • -
    -
  • -
  • - Hash value numbering (hvn) - eliminates common sub-expressions [4]. This pass uses an in-order - depth-first traversal of the dominator tree instead of the - more expensive iterative data flow analysis. High-level value - numbering effectively eliminates redundant address - computation and check instructions. For example, - chkzero(), chknull(), and - chkcast() HIR instructions are redundant if - guarded by explicit conditional branches. -
  • -
-
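A minimal sketch of two of the local rewrites performed by the simplify pass listed above, constant folding and a simplified form of copy propagation, over a toy three-address IR. The Inst representation and the opcodes are illustrative only.

#include <algorithm>
#include <vector>

enum Opcode { LDC, COPY, ADD };

struct Inst {
    Opcode op;
    int dst;          // SSA operand ids
    int src1, src2;   // src2 is used only by ADD
    long constValue;  // valid when op == LDC
};

void simplify(std::vector<Inst>& code) {
    // def[i] points at the instruction defining SSA operand i.
    int maxId = 0;
    for (const Inst& i : code) maxId = std::max({maxId, i.dst, i.src1, i.src2});
    std::vector<const Inst*> def(maxId + 1, nullptr);
    for (const Inst& i : code) def[i.dst] = &i;

    for (Inst& i : code) {
        if (i.op == ADD) {
            const Inst* a = def[i.src1];
            const Inst* b = def[i.src2];
            if (a && b && a->op == LDC && b->op == LDC) {
                // Constant folding: the add of two constants becomes a constant load.
                i.op = LDC;
                i.constValue = a->constValue + b->constValue;
            }
        } else if (i.op == COPY) {
            const Inst* a = def[i.src1];
            if (a && a->op == LDC) {
                // Simplified copy/constant propagation: a copy of a constant
                // becomes a constant load, exposing further folding.
                i.op = LDC;
                i.constValue = a->constValue;
            }
        }
    }
}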
-
- Static profile estimator (statprof) -
-
-

- Many optimizations can use the edge profile information for - greater efficiency. When the execution manager is configured to - use a dynamic profiling mode, the profile is gathered by the - JIT. But even in static mode, when a dynamic profile is not - available, Jitrino.OPT can use the statprof - optimization pass to update HIR with a profile based on - heuristics. In the dynamic profiling mode, some optimizations - may break profile information by changing the CFG structure. In - this case, statprof can be used to fix the profile - information and keep it consistent. -

-
-
-

- Back to Top -

-

- 3.2.3 Code Selection -

-

- After the optimization passes, HIR is translated to LIR. This code selection (CS) is based on the hierarchical structure of the compiled method's HIR, as shown in Figure 2.

-

- Code Selector work flow -

-

- Figure 2. Code Selector Framework -

-

- Where: -

-
    -
  • - White boxes indicate parts of the code selector. Nesting of boxes - reflects the hierarchy of elements, see details below. -
  • -
  • - Yellow boxes with dashed borders indicate IR entities analyzed or created by the corresponding code selector parts.
  • -
  • - Arrows indicate IR element conversion via callback interface calls. -
  • -
-

- For each of these HIR elements (the method, the set of operands with multiple definitions, the control flow graph, and the set of CFG basic block nodes), the code selector framework defines the following:

-
    -
  • - An abstract class representing a code selector of the HIR element, - and the default implementation of the abstract class on the - optimizer side. This part of the implementation is responsible for - HIR traversal. -
  • -
  • - A callback interface to transform direct sub-elements of the - analyzed entity from HIR to LIR. The callback is implemented on the - code generator side. This part is responsible for LIR creation -
  • -
-

- Thus, the CS framework establishes a well-defined boundary between the - optimizer and a pluggable code generator. The code selector framework - also enables a structural approach to IR conversion, which CG can - override at several levels. -
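The hypothetical C++ sketch below illustrates that boundary for one HIR element, the basic block: the selector class on the optimizer side only traverses HIR, while the callback interface implemented by the code generator is the only place where LIR is created. The names are illustrative, not the actual Jitrino headers.

class HirBasicBlock;   // optimizer-side HIR entities (placeholders)
class HirInst;
class LirNode;         // code-generator-side LIR entity (placeholder)

// Implemented by the pluggable code generator: one callback per HIR entity kind.
class BlockCodeSelectorCallback {
public:
    virtual ~BlockCodeSelectorCallback() {}
    virtual LirNode* genBlock(const HirBasicBlock& block) = 0;   // create the LIR node for the block
    virtual void genInst(const HirInst& inst) = 0;               // select LIR for one HIR instruction
};

// The default implementation lives on the optimizer side: it is responsible
// only for HIR traversal and never builds LIR itself.
class BlockCodeSelector {
public:
    explicit BlockCodeSelector(BlockCodeSelectorCallback& cb) : callback(cb) {}
    void select(const HirBasicBlock& block);   // walks the block's instructions, calling callback.genInst()
private:
    BlockCodeSelectorCallback& callback;
};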

-

- Figure 3 shows the process of code selection, with loops highlighted in yellow.

-

- Sequence of code selection with objects and method calls shown -

-

- Figure 3. The Code Selection Sequence Diagram -

-

- Figure 3 illustrates specifics of the conversion process, as follows: -

-
    -
  • - All instances of the code selector (CS) classes responsible for HIR - traversal are created on the optimizer side. -
  • -
  • - All instances of the callback interface implementations responsible - for building LIR are created on the CG side. -
  • -
  • - The top-down IR conversion is performed as follows: -
      -
    • - Variable CS traverses all HIR operands with multiple - definitions and calls the Variable CS callback for each such - operand to create the corresponding LIR operand. -
    • -
    • - CFG CS traverses all HIR nodes and edges and calls methods - from the CFG CS callback to create the corresponding LIR - nodes and edges. -
    • -
    • - For each basic block node, CFG CS creates an instance of - Basic Block CS, which then calls methods of the Basic Block - CS callback to translate HIR instructions from the basic - block to LIR. Basic Block CS is also known as the Instruction - Selector. -
    • -
    -
  • -
-

- Back to Top -

-

- 3.2.4 Code - generation -

-

- The code generation process is specific to the pluggable code - generator implementing it. This section briefly describes the current - implementation of Jitrino IA-32/Intel® 64 code generator, as well - as measures taken to ensure that it is - thread-safe. -

-

- To generate code for a method, the code generator performs a number of - steps that are roughly divided into the following stages: -

-
-
- LIR Creation -
-
-

- At this stage, the code generator creates the LIR corresponding - to the input HIR in its implementation of the code selector - callback interfaces. The resulting LIR is quite compact and - possesses the following properties: -

-
    -
  1. - Most 2-operand instructions are generated in the extended - 3-operand form with a separate destination operand. -
  2. -
  3. - Each call site is represented as a single instruction without - explicit stack creation for callee arguments. -
  4. -
  5. - 64-bit integer arithmetic is represented by - pseudo-instructions (macros) -
  6. -
  7. - Address arithmetic is mostly explicit without usage of the - complex address forms -
  8. -
  9. - Most operand copy instructions are represented by pseudo-instructions that do not impose any constraints on their operands.
  10. -
-
-
- LIR Transformations -
-
-

- At this stage, the code generator performs a number of - transformations and optimizations over LIR, as follows: -

-
    -
  1. - Inserts yield points at back branches of certain kinds of - loops to enable safe thread suspension. -
  2. -
  3. - Performs the first pass of GC safe-point analysis, which - transforms code to ensure correct GC map creation at the end - of the code generation process regardless of in-process - transformations. -
  4. -
  5. - Folds address arithmetic into complex address forms as - needed. -
  6. -
  7. - Expands pseudo-instructions for 64-bit integer arithmetic to - real native instruction sequences with some optimizations - applied. -
  8. -
  11. - Translates LIR instructions from the extended 3-address form to the native 2-address form (a sketch of this lowering is given after this list).
  10. -
  11. - Analyses instruction and calling convention constraints. - Based on analysis results, the code generator splits operands - so that each operand satisfies the constraints of the - instructions where it is used. -
  12. -
  13. - Performs global register allocation to assign most frequently - used operands to general-purpose or XMM registers as needed. -
  14. -
  15. - Performs local register allocation and spill-code generation - on each basic block taking into account instruction - constraints. This pass ensures that all operands are assigned - to physical locations, in a register or on the stack. This - pass can produce correct code with no prior global register - allocation. -
  16. -
  17. - Linearizes CFG basic blocks according to profile information, - if any. -
  18. -
  19. - Expands copy pseudo-instructions to real native instruction - sequences. Copies of stack operands with non-overlapping live - ranges are coalesced. -
  20. -
  21. - Goes over the stack layout to assign offsets to stack - operands and to create the stack layout description. -
  22. -
-
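The sketch referenced in step 11 above shows the idea of the 3-address to 2-address lowering on a hypothetical Inst structure: an extended form such as dst = src1 ADD src2 is rewritten as MOV dst, src1 followed by ADD dst, src2. Operand aliasing and instruction constraints, which the real pass must respect, are ignored here.

#include <vector>

struct Opnd;   // placeholder LIR operand

struct Inst {
    const char* mnemonic;
    Opnd* dst;
    Opnd* src1;
    Opnd* src2;   // null for instructions already in 2-address form
};

// Lower one extended 3-address instruction to native 2-address instructions.
// Note: a real lowering pass must also handle the case where dst aliases src2.
std::vector<Inst> lowerToTwoAddress(const Inst& i) {
    if (i.src2 == nullptr) {
        return { i };   // already 2-address, nothing to do
    }
    return {
        Inst{ "MOV", i.dst, i.src1, nullptr },        // dst = src1
        Inst{ i.mnemonic, i.dst, i.src2, nullptr },   // dst = dst <op> src2
    };
}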

- The actual code generation process can also include different - optimization passes, such as constant and copy propagation, dead - code elimination, and redundant comparison elimination. - Optimizations are enabled via EM configuration files and the - command-line interface. -

-
-
- Code and Data Emission -
-
-

- At this stage, the code generator does the necessary - preparations and translates LIR into machine code, as follows: -

-
    -
  1. - Generates all required binary chunks from LIR and links the - generated code to VM for further run-time support. - Specifically, the code generator does the following: -
      -
    1. - Creates a constant data area with switch tables, - floating point constants, and other data that might be - needed for CG debugging features. -
    2. -
    3. - Links LIR to VM data structures and the constant data - area. -
    4. -
    5. - Translates LIR into machine code. -
    6. -
    7. - Registers direct calls to other managed code to enable - patching in case the target of a direct call is - recompiled later. -
    8. -
    9. - Registers try blocks and corresponding exception - handlers with VM. -
    10. -
    11. - Registers information about inlined methods with VM. -
    12. -
    -
  2. -
  3. - Updates the stack layout description with additional stack - information, such as stack depth bound to offsets of - CALL instructions. -
  4. -
  5. - Creates the GC map to describe root set locations for each GC - safe point. -

    - Note -

    -

    - Only call sites are considered GC safe points in the - current implementation -

    -
  6. -
  7. - Writes the stack layout description, the GC map, and the - bytecode map into the memory chunk associated with the - compiled method. These data are further used at run time for - the following: -
      -
    • - The stack layout description is used in stack unwinding - for exception handling, GC root set enumeration, and - other stack iteration operations. -
    • -
    • - The GC map is used for root set enumeration by a - precise garbage collector. -
    • -
    • - The bytecode map is used for mapping between native - code and Java* bytecode. -
    • -
    -
  8. -
-
-
- Global Lock -
-
-

- Because memory allocation routines are not thread-safe in the - current VM implementation, Jitrino sets a global lock for the - code generation stage to ensure correct allocation of memory for - compiled method data. The global lock must be taken into account - when working in a multi-threaded environment, for example, when - compilation of a method starts simultaneously in several - threads. The global lock is shared between Jitrino.JET and - Jitrino.OPT and ensures that only a single thread tries to - allocate memory for a method at once. The lock is taken in the - lock_method Action object and released in the - unlock_method Action object. -

-

- The lock_method action also checks whether a code - block is already allocated by the current JIT instance for the - method being compiled. If the code block is already allocated, - the method has already been compiled in another thread. In this - case, the lock_method action does not place the - lock, but stops compilation with the - COMPILATION_FINISHED status. The action - unlock_method releases the lock taken by the - lock_method action. -
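A heavily simplified sketch of the behavior described above; the mutex, the context type, and the helper are placeholders, and the real lock_method/unlock_method actions are more involved (for example, the check and the lock acquisition must be ordered so that two threads cannot both miss the already-compiled case).

#include <mutex>

enum CompilationStatus { COMPILATION_CONTINUE, COMPILATION_FINISHED };

std::mutex g_codeAllocLock;   // a single lock shared by Jitrino.JET and Jitrino.OPT

struct MethodContext {        // hypothetical per-compilation context
    bool codeBlockAllocated() const;   // true if this JIT already produced code for the method
};

CompilationStatus lockMethod(MethodContext& ctx) {
    if (ctx.codeBlockAllocated()) {
        // Another thread already compiled the method: do not take the lock,
        // stop this compilation instead.
        return COMPILATION_FINISHED;
    }
    g_codeAllocLock.lock();   // held across code generation until unlockMethod()
    return COMPILATION_CONTINUE;
}

void unlockMethod() {
    g_codeAllocLock.unlock();
}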

-

- The global lock imposes the following requirements: -

-
    -
  • - No Action object in the code generator can stop - compilation with the COMPILATION_FINISHED or - COMPILATION_FAILED condition. Otherwise, the - lock remains set and blocks method compilation in other - threads. -
  • -
  • - Resources with a lifetime equal to the method's lifetime must be allocated only in the code generator, not in the optimizer.
  • -
  • - Code generation actions must not invoke VM methods that might - lead to execution of Java code (for example, - resolve_static_method); otherwise, the action - might lead to a deadlock. -
  • -
-
-
-

- Back to Top -

-

- 4. Jitrino.JET -

-

- 4.1 Architecture -

-

- The Jitrino.JET baseline compiler is the Jitrino subcomponent used for translating Java* bytecode into native code with practically no optimizations. The compiler emulates the operations of the stack-based machine using a combination of the native stack and registers.

-

- 4.1.1 Run-time Support -

-

- During the code generation phase, the state of the method's operand stack is mimicked. This state helps to calculate the GC map, which is used later at run time to support GC operation.

-

- The GC map shows whether the local variables or the stack slots - contain an object. The GC map for local variables is updated on each - defining operation with a local slot, as follows: -

-
    -
  • - If an object is stored, the appropriate bit in the GC map is set - (code is generated to set the bit). -
  • -
  • - If a number is stored, the appropriate bit gets cleared. -
  • -
-

- The GC map for the stack is updated only at GC points, that is, before - an instruction that may lead to a GC event, for example, a VM helper - call. The stack depth and the stack state calculated during method - compilation get saved before invocation: code is generated to save the - state. The state is saved into the special fields that are - pre-allocated on the native stack of the method. These fields include - GC information, namely the depth of operand stack, the stack GC map, - and the locals GC map. -
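A minimal sketch of the per-slot GC map updates described above. Jitrino.JET emits native code that performs these bit operations at run time; the one-bit-per-slot layout in a single 32-bit word is an assumption made only for this illustration.

#include <cstdint>

// Set the slot's bit when an object reference is stored into the local slot.
inline void gcMapMarkObject(uint32_t& localsGcMap, unsigned slot) {
    localsGcMap |= (1u << slot);
}

// Clear the slot's bit when a plain number is stored into the local slot.
inline void gcMapMarkValue(uint32_t& localsGcMap, unsigned slot) {
    localsGcMap &= ~(1u << slot);
}

// At a GC point the collector reads the saved map to find which slots hold roots.
inline bool gcMapHoldsObject(uint32_t localsGcMap, unsigned slot) {
    return (localsGcMap >> slot) & 1u;
}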

-

- Additionally, Jitrino.JET prepares and stores a specific structure, - the method info block, for each method during compilation. - This structure is later used to support run-time operations, such as - stack unwinding and mapping between bytecode and native code. -

-

- Back to Top -

-

- 4.2 Processes -

-

- 4.2.1 - Baseline Compilation -

-

- Baseline compilation is the process of compiling code with minimal - optimization. The Jitrino.JET subcomponent performs - this operation as described below. -

-

- Jitrino.JET performs two passes over bytecode, as shown in Figure 4. - The compiler establishes basic block boundaries during the first pass, - and generates native code during the second. -

-

- Example of two-pass compilation process -

-

- Figure 4. Baseline Compilation Path -

-

- Subsequent sections provide a description of these passes. -

-
-
- Pass 1 -
-
-

- During the first pass over bytecode of a method, the compiler - finds basic block boundaries and counts references for these - blocks. -

-

- Note -

-

- The reference count is the number of ways a basic block (BB) can be reached.

-

- To find basic block boundaries, Jitrino.JET does a linear scan over the bytecode and analyses instructions by using the following rules:

-
    -
  • - Instructions athrow, return, - goto, conditional branches, - switches, ret, and jsr - end a basic block. -
  • -
  • - Basic block leader instructions immediately follow - the instructions ending a basic block or serve as targets for - branches. Exception handler entries are also among the basic - block leaders. -
  • -
-

- During the first pass, the compiler also finds the reference - count for each block. -
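The following sketch outlines the first pass as described above: a linear scan that records basic block leaders and their reference counts. The bytecode decoding helpers are hypothetical, and exception handler entries (which are also leaders) are omitted for brevity.

#include <cstdint>
#include <map>
#include <vector>

// Hypothetical bytecode decoding helpers.
bool   endsBasicBlock(uint8_t opcode);     // athrow, return, goto, conditional branches, switches, ret, jsr
bool   isBranch(uint8_t opcode);           // has an explicit branch target
bool   fallsThrough(uint8_t opcode);       // false for goto, return, athrow, ...
size_t branchTarget(const std::vector<uint8_t>& bc, size_t pc);
size_t instructionLength(const std::vector<uint8_t>& bc, size_t pc);

// Returns leader pc -> reference count (the number of ways the block is reached).
std::map<size_t, int> findBasicBlocks(const std::vector<uint8_t>& bc) {
    std::map<size_t, int> refCount;
    refCount[0] = 1;   // the method entry block
    for (size_t pc = 0; pc < bc.size(); ) {
        size_t next = pc + instructionLength(bc, pc);
        uint8_t op = bc[pc];
        if (isBranch(op)) {
            refCount[branchTarget(bc, pc)]++;   // reached as a branch target
        }
        if (endsBasicBlock(op) && next < bc.size()) {
            refCount.insert({next, 0});         // the following instruction is a leader...
            if (fallsThrough(op)) {
                refCount[next]++;               // ...also reached by fall-through after a conditional branch
            }
        }
        pc = next;
    }
    return refCount;
}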

-

- Example -

-

- Figure 5 illustrates an example with reference counts. The reference count ref_count for the second basic block (BB2) is equal to 1 because this block can only be reached from the first basic block (BB1). The reference count for the third basic block is equal to 2, because that block can be reached as a branch target from BB1 or by fall-through from BB2.

-

- Example of reference counters reached from different basic blocks. -

-

- Figure 5. Reference Count for Basic Blocks -

-

- Jitrino.JET uses the reference count during code generation to - reduce the number of memory transfers. -

-
-
- Pass 2 -
-
-

- During the second pass, Jitrino.JET performs the code - generation, as follows: -

-
    -
  1. - Walks over the basic blocks found at Pass 1 in the - depth-first search order -
  2. -
  3. - Generates code for each bytecode instruction and mimics the - Java* operand stack -
  4. -
  5. - Matches the native code layout and the bytecode layout -
  6. -
  7. - Updates relative addressing instructions, such as - CALL and JMP instructions. -
  8. -
-
-
-

- For details on the implementation of baseline compilation, generate - reference documentation from the source code by using Doxygen. -

-

- Back to Top -

-

- 5. Utilities -

-

- The JIT compiler relies on the following utilities: -

-
    -
  • - The memory manager minimizes the - number of calls to system heap allocations and automatically frees - all allocated memory in the destructor. -
  • -
  • - A set of Standard Template Library (STL) containers uses the memory manager as the allocator class.
  • -
  • - Timers gather compilation and execution - statistics. -
  • -
  • - The logging system supports diagnostics inside the compiler.
  • -
  • - The control flow graph represents the control flow of a method and provides the basis for the IR forms.
  • -
  • - The internal profiler of the code generator instruments code so that per-method counters of executed instructions can be dumped.
  • -
-

- Note -

-

- The JIT compiler utilities are similar to, but not identical with the - VM utilities. For example, the JIT compiler and the VM core use - different loggers. -

-

- Back to Top -

-

- 5.1 Memory Manager -

-

- In the Jitrino.OPT compiler, memory allocation is done using custom - memory manager routines. This mechanism ensures that all memory - allocated during a compilation process is freed after the compilation - is finished. In addition, the memory manager decreases the number of - system calls by using the fast thread-local memory allocation - algorithm. Memory manager code and operators for overloaded memory - allocation are in .h and .cpp files in the - jitrino/src/shared/ directory. -

-

- To start using the memory manager, a JIT compiler developer must - create an instance of it providing the initial heap size and the name - to be used for logging. -

-

- The memory manager allocates memory from the operating system in large - chunks called arenas. The minimal size of an arena used in - MemoryManager is 4096 bytes. When the JIT compiler - requests to allocate memory for an object, the memory is taken from - the current arena with no system calls. When the current arena does - not have enough free space, the memory manager allocates another - arena. -
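The following is a simplified illustration of the arena idea described above (a pointer bump inside a pre-allocated chunk, a new chunk when the current one is exhausted, and a single release of everything at destruction). It is not the actual MemoryManager implementation, and alignment handling is omitted.

#include <cstddef>
#include <cstdlib>

class ArenaAllocator {
    struct Arena { Arena* prev; size_t used; size_t size; char* data; };
    Arena* current = nullptr;
    static constexpr size_t MIN_ARENA_SIZE = 4096;   // minimal arena size used by MemoryManager

    void newArena(size_t atLeast) {
        size_t size = atLeast > MIN_ARENA_SIZE ? atLeast : MIN_ARENA_SIZE;
        Arena* a = static_cast<Arena*>(std::malloc(sizeof(Arena) + size));
        a->prev = current;
        a->used = 0;
        a->size = size;
        a->data = reinterpret_cast<char*>(a + 1);   // payload follows the header
        current = a;
    }

public:
    // Allocation is a pointer bump inside the current arena: no system call.
    void* alloc(size_t bytes) {
        if (current == nullptr || current->used + bytes > current->size) {
            newArena(bytes);   // current arena exhausted: grab another one
        }
        void* p = current->data + current->used;
        current->used += bytes;
        return p;
    }

    // Everything is released at once; no per-object free and no destructors
    // are run for the objects placed in the arenas.
    ~ArenaAllocator() {
        while (current != nullptr) {
            Arena* prev = current->prev;
            std::free(current);
            current = prev;
        }
    }
};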

-

- Here is a typical pattern for using the MemoryManager - class: -

-
-void optABC() {
-    //the temporary memory manager used for optABC optimization data
-    MemoryManager tmpMM(10000, "mm::optABC");
-
-    StlVector<int> myData1(tmpMM, 1000);
-    int* myData2 = new (tmpMM) int[1000];
-    //JIT compiler code follows
-}
-
-

- The memory allocated with the memory manager is de-allocated in its - destructor and no destructors are called for objects allocated with - the memory manager. This feature of the memory manager enforces the - following rules upon JIT compiler code: -

-
    -
  1. - Never allocate MemoryManager using another memory - manager. Otherwise, the memory of MemoryManager is - never freed. -
  2. -
  3. - Mix objects allocated with different memory managers carefully, because the lifetimes of such objects can differ.
  4. -
  5. - Destructors of objects allocated with MemoryManager - are never called. Leave the destructors empty. -
  6. -
  7. - To avoid out-of-memory errors, remember that the memory allocated - with MemoryManager is de-allocated only when - MemoryManager is destroyed. -
  8. -
-

- Jitrino.OPT has two dedicated memory managers: -

-
    -
  • - The global memory manager created when Jitrino is - initialized. This memory manager is used with objects having the - same lifetime as the JIT compiler. See - jitrino/src/main/Jitrino.cpp file and - global_mm static field for details. -
  • -
  • - The compilation time memory manager created every time the - compilation process starts. This memory manager allocates objects - with the lifetime equal to compilation time, such as instructions, - nodes, edges and other structures related to the compilation - context. -
  • -
-

- When using MemoryManager, you might not get system notifications about memory corruption.

-

- Example -

-

- Memory corruption can happen when a value is stored into an array at an index that is outside the array's bounds:

-
-    MemoryManager tmpMM(10000, "myMM");
-    int* myData2 = new (tmpMM) int[10];
-    myData2[10] = 1;    // out-of-bounds store: valid indices are 0..9
-
-

- This code is executed successfully because the default memory chunk - allocated by the memory manager is greater than the array size. -

-

- To enable the checking of memory corruption errors, define the - JIT_MEM_CHECK macro in the MemoryManager.cpp - file. After this macro is defined, the memory manager fills all the - arena's space with the predefined value and adds the padding space - between objects. Every time an object is allocated, the memory manager checks these predefined values in - the arena. If a write operation has been performed in the - restricted area, the memory manager reports an error. -

-

- Back to Top -

-

- 5.2 Counters and Timers -

-

- Jitrino maintains counters to collect statistics. A counter - can be used in any Jitrino action to count a particular event in all - pipelines and during the whole VM session. Each counter has a name to - distinguish it from other counters. -

-

- To sum up execution times of a Jitrino action, Jitrino also provides - timers, a specialized form of counters. To activate - counters and time measurement, use the following command syntax: -

-
--XDjit.<JIT>.arg.time=on
-
-

- Note -

-

- This option is off by default. -

-

- The execution time of all instances of each action is measured - independently and summed up at VM shutdown. Resulting data on action - execution times are printed into a table and sorted by the action - name. -

-

- Note -

-

- Currently, to print the action execution times and counter values - tables, you need to specify the following VM command-line option: -

-
--XcleanupOnExit
-
-

- Back to Top -

-

- 5.3 Logging System -

-

- The Jitrino logging system does the following: -

-
    -
  • - Provides diagnostic output of important Jitrino internal data - structures -
  • -
  • - Supports user-defined diagnostics
  • -
  • - Provides flexible control over the diagnostic process via command-line options.
  • -
-

- The logging system is an integral part of Jitrino PMF. Logging - consists of two interfaces: -

-
    -
  • - The program interface that provides stream output methods -
  • -
  • - The command-line interface that provides filtration of output - logging streams -
  • -
-

- The logging system is based on streams. Each stream has a - name used to address it in a program and command-line options. Jitrino - provides several frequently used streams with predefined names. These - streams produce specific output when enabled, as follows: -

- - - - - - - - - - -
- Name  | Output
- info  | The protocol of compilation: JIT and pipeline names, the method name and number, and so on
- rt [... 3435 lines stripped ...]