Aleksey,
Thank you for thorough reply. I like and support your suggestion about
commenting the changes. Let me address the most mysterious thing I
still don't understand: adding a fast path for zeroes in the shift
function.
If this optimization helps increasing performance, some code sends
zeroes to the shift function more frequently than other numbers. Do
you know where your shift is called with zeroes as a typical argument?
What if we place the check in the specific caller instead of the
shift?
Thanks!
On Thu, May 22, 2008 at 4:07 PM, Aleksey Shipilev
<aleksey.shipilev@gmail.com> wrote:
> Hi, Alexei!
>
> Thanks for the review! Here is some light on your questions:
>
> On Thu, May 22, 2008 at 11:27 AM, Alexei Fedotov
> <alexei.fedotov@gmail.com> wrote:
>> 1. I have noticed that you have replaced shiftLeft(1) with
>> shiftLeftOneBit() in the following checks which appear . Is it
>> performance critical part or just cleaning?
> This is performancewise change. "sign == 0" guarantees that shifting
> is not needed.
>
>> // Checking if: remainder * 2 >= scaledDivisor
>> remainder.abs().shiftLeftOneBit().compareTo(scaledDivisor.abs())
>>
>> If this is critical, then can one imagine a way of checking that a
>> number is twice bigger than another number without copying and
>> shifting the number? For cases when one number has more binary digits
>> than another this should be easy.
> IMO, this way is also fast, though we can think up about something
> with peak efficiency. I see we will need structural changes to
> accommodate another design which would allow us to use other checking
> schemes.
>
>> 2. What is the meaning of 2 in the new function name unsignedMultAdd2?
>> Why not unsignedMultAdd4?
> That mean "unsigned multiplication and addition of 2 arguments", hence
> the "2" in signature. We can change it to another if this is
> confusing.
>
>> 2a. What is the purpose of the following fix? How a function call
>> (even for a magic function), could be quicker than multiplication
>> inlined by JIT? Does it mean our JIT poorly inlines multiplication?
>>  long res = (aDigits[0] & 0xFFFFFFFFL) * (factor);
>> + long res = unsignedMultAdd2(aDigits[0], factor, 0, 0);
> Well, the original purpose of this was code cleanup (see how original
> version look like, bloated with 0xF...F constants), so we came up with
> idea of extracting unsigned operations of (a*b), (a*b + c), (a*b + c +
> d) as general method. Then you can imagine two situations:
> 1. JIT compiler inlines the method and eliminated unneeded additions.
> 2. JIT compiler generates the magic for this method and substituting
> it with optimal native code (HARMONY5826).
>
> *sigh* I guess I need to write the javadoc for this method.
>
>> 3. You added a check against shifting zero values. Does adding this
>> check on a common path add to performance? If zeroes come to shift
>> function often enough to give a benefit, which caller spams so much
>> zeroes? Does it worth to patch the caller instead?
> Nope, the benefit of 0path there is much more than penalty on non0path.
>
>> 4. I have an abstract concern with no idea how to address it. An
>> integer array in memory is very similar to a long array (if endings
>> are correct). Is there any way to use 64bit operations for speed up?
> We thought about such the thing. This would require severe changes to
> fit into existing code. More details would be later :)
>
> Thanks,
> Aleksey.
>

With best regards,
Alexei
