[ https://issues.apache.org/jira/browse/HARMONY-2171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mikhail Fursov updated HARMONY-2171:
------------------------------------
Priority: Minor (was: Major)
Estimated Complexity: Guru (was: Unknown)
Lowering priority.
This method will not benefit from loop unrolling optimization. It requires prefetch optimization.
> [drlvm][jit] Loop Unrolling for private method java.util.TreeMap.successor().
> -----------------------------------------------------------------------------
>
> Key: HARMONY-2171
> URL: https://issues.apache.org/jira/browse/HARMONY-2171
> Project: Harmony
> Issue Type: Improvement
> Components: DRLVM
> Environment: IA-32
> Reporter: Sergey Kuksenko
> Priority: Minor
>
> Method java.util.TreeMap::successor is hot method for iterations over TreeMap (and some
others operations).
> Generated code for the method may be improved.
> Java code of the method:
> static TreeMapEntry successor(TreeMapEntry x) {
> if (x.right != null)
> return minimum(x.right);
> TreeMapEntry y = x.parent;
> while (y != null && x == y.right) {
> x = y;
> y = y.parent;
> }
> return y;
> }
> Also method "minimum" is usualy inlined. minimum's code:
> static TreeMapEntry minimum(TreeMapEntry x) {
> while (x.left != null)
> x = x.left;
> return x;
> }
> The generated asm code the follwoing:
> line address code
> 1 B7EE60: push ebx
> 2 B7EE61: push ebp
> 3 B7EE62: mov eax,dword ptr [esp+0Ch]
> 4 B7EE66: mov ebp,eax
> 5 B7EE68: mov ebx,dword ptr [eax+14h]
> 6 B7EE6B: test ebx,ebx
> 7 B7EE6D: je 00B7EEA8
> 8 B7EE73: mov ebp,dword ptr fs:[00000014h]
> 9 B7EE7A: mov eax,ebx
> 10 B7EE7C: mov ebx,dword ptr [eax+18h]
> 11 B7EE7F: test ebx,ebx
> 12 B7EE81: je 00B7EEF2
> 13 B7EE87: mov ebp,dword ptr fs:[00000014h]
> 14 B7EE8E: cmp dword ptr [ebp+00000270h],0
> 15 B7EE98: je 00B7EE7A
> 16 B7EE9E: call 00A96850
> 17 B7EEA3: jmp 00B7EE7A
> 18 B7EEA8: mov ebx,dword ptr [eax+1Ch]
> 19 B7EEAB: mov eax,dword ptr fs:[00000014h]
> 20 B7EEB2: mov eax,ebx
> 21 B7EEB4: test eax,eax
> 22 B7EEB6: je 00B7EEED
> 23 B7EEBC: cmp ebp,dword ptr [eax+14h]
> 24 B7EEBF: jne 00B7EEED
> 25 B7EEC5: mov ebp,eax
> 26 B7EEC7: mov eax,dword ptr [eax+1Ch]
> 27 B7EECA: mov ebx,eax
> 28 B7EECC: mov eax,dword ptr fs:[00000014h]
> 29 B7EED3: cmp dword ptr [eax+00000270h],0
> 30 B7EEDD: je 00B7EEB2
> 31 B7EEE3: call 00A96850
> 32 B7EEE8: jmp 00B7EEB2
> 33 B7EEED: pop ebp
> 34 B7EEEE: pop ebx
> 35 B7EEEF: ret 4
> 36 B7EEF2: pop ebp
> 37 B7EEF3: pop ebx
> 38 B7EEF4: ret 4
> -------------------
> Unrolling of small cycles by factor 4 or 8 will decrease BBP impact.
> Unrolling this cycle by factor 2 allows making more effective register usage.
> The original loop in lines 9-17:
> loop: mov eax,ebx
> mov ebx,dword ptr [eax+18h]
> test ebx,ebx
> je exit
> mov ebp,dword ptr fs:[00000014h]
> cmp dword ptr [ebp+00000270h],0
> je loop
> call suspend
> jmp loop
> ....
> exit: ; result already in eax.
> pop ebp
> pop ebx
> ret 4
> At this cycle variable x is located at eax register. x.left is loaded into ebx
> register and for each iteration we should move ebx to eax. If this cycle will
> be unroll at least by factor 2 we may change register assignment. For the first
> iteration store x in ebx and load x.left to eax and for second iteration store
> x in eax and load x.left to ebx. After such transformation instruction mov eax,
> ebx at each cycle became useless.
> Let's move loading TLS address out of cycle and unroll this cycle by 4.
> mov ebp,dword ptr fs:[00000014h]
> ; at this point ebx contains x
> loop:
> mov eax,dword ptr [ebx+18h] ; eax = x.left
> test eax,eax
> je exit1
> mov ebx,dword ptr [eax+18h] ; ebx = x.left.left
> test ebx,ebx
> je exit
> mov eax,dword ptr [ebx+18h] ; eax = x.left.left.left
> test eax,eax
> je exit1
> mov ebx,dword ptr [eax+18h]
> test ebx,ebx
> je exit
> cmp dword ptr [ebp+00000270h],0
> je loop
> call suspend
> jmp loop
> ....
> exit1:
> mov eax, ebx ; here we should move result from ebx to eax.
> exit: ; here the result already eax.
> pop ebp
> pop ebx
> ret 4
> After such unrolling will have 3.5 instructions per iteration instead of 7
> instructions per iteration.
> Such unrolling will be very useful for cycles like:
> while (x!=null) {
> ...
> x = x.next;
> }
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
|