commons-dev mailing list archives

From Phil Steitz <>
Subject Re: [math] OpenGamma library
Date Sat, 15 Oct 2011 15:49:44 GMT
On 10/15/11 8:46 AM, Phil Steitz wrote:
> On 10/15/11 5:41 AM, Gilles Sadowski wrote:
>> Hi.
>>> first of all, I was the author of this very useful statement on
>>> factories... Very constructive indeed.
>> Liking something or not is an impression that could well be justified
>> afterwards. It also pushes one to look for arguments that support the
>> feeling. ;-)
>>>> However it also shows that the improvement is only ~13% instead of the ~30%
>>>> reported by the benchmark in the paper...
>>> could it be that their "naive" implementation as a 2D array is very
>>> naive indeed? I notice in the listings provided in the paper that they
>>> constantly refer to a[i][j]. I think the strength of having a row
>>> representation is to define a temporary variable ai = a[i], and access
>>> to a[i][j] as ai[j]. That's what is done in CM anyway, maybe that
>>> explains why the gain is not so big in the end.
>> You are right; the "naïve" code repeatedly accesses a[i][j].
>> But this alone doesn't make up for the difference (cf. table below).
>> operate (calls per timed block: 10000, timed blocks: 100, time unit: ms)
>>            name       time/call       std error  total time       ratio       difference
>>    Commons Math  1.19770542e-01  2.85011660e-04  1.1977e+05  1.0000e+00   0.00000000e+00
>> OpenGamma naive  1.23798907e-01  4.01495625e-04  1.2380e+05  1.0336e+00   4.02836495e+03
>>    OpenGamma 1D  1.04352827e-01  2.08970600e-04  1.0435e+05  8.7127e-01  -1.54177153e+04
>>    OpenGamma 2D  1.12666770e-01  3.50012912e-04  1.1267e+05  9.4069e-01  -7.10377213e+03
>>>> I don't think that CM development should be focused on performance
>>>> improvements that are so sensitive to the actual hardware (if it's indeed
>>>> the varying amount of CPU cache that is responsible for this discrepancy).
>>> That would apparently require fine tuning indeed, just like BLAS
>>> itself, which has -I believe- specific implementations for specific
>>> architectures. So it goes somewhat against the philosophy of Java. I
>>> wonder how a JNI interface to BLAS would perform? That would leave
>>> the architecture-specific issues out of the Java code (which could
>>> even provide a basic implementation of basic linear algebra operations
>>> if people do not want to use native code).
>> The author of the paper proposes to indeed clone the BLAS tuning
>> methodology.
>> However, I don't think that this should be a priority for CM (as a
>> general-purpose math toolbox).
>>>> If there are (human) resources inclined to rewrite CM algorithms in order
>>>> to boost performance, I'd suggest also exploring the multi-threading route;
>>>> I feel that the type of optimizations described in this paper are more in
>>>> the realm of the JVM itself.
>>> I would be very interested, but know nothing about multi-threading. I
>>> will need to explore multi-threading for work anyway, so maybe in the
>>> future?
> Any references to specific optimizations or algorithm improvements here?
>> Yes, 3.1, 3.2, ... , 4.0, ... whatever.
>>> In the meantime, may I bring to your attention the JTransforms
>>> library? (
>>> It's a multi-threaded library for various FFT calculations. I've used
>>> it a lot, and have been involved in the correction of some bugs. I've
>>> never benchmarked it against CM, but the site claims (if my memory
>>> does not fail me) greater performance.
>> Yes, I did not perform benchmarks; however, Luc already pointed out that he
>> had not paid particular attention to the speed of the code in CM.
> I don't think Luc meant to make a broad general statement there. 
> IIRC, he was talking about one matrix representation class.  Let's
> focus on specific problems and solutions.

Pls ignore.  I misread the comment above as applying to the original
subject, which was the linear package.  I agree that the FFT impl
needs work.

> Phil
>> Also, there are other problems, cf. issue
>>> Also it can handle
>>> non-power-of-two array dimensions. Plus, the author seems to no
>>> longer have time to spend on this library, and may be willing to share it
>>> with CM. That would be a first step in the multi-threading realm.
>> Unfortunately, no; he doesn't want to donate his code.
>>> Beware, though; the basic code is a direct translation of C code, and
>>> is sometimes difficult to read (thousands of lines, with loads of
>>> branching: code coverage analysis was simply a nightmare!).
>> So, the above information is only half bad news! ;-)
>> Best,
>> Gilles
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail:
>> For additional commands, e-mail:
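
For readers following the a[i][j] versus ai = a[i] point in the thread, here
is a minimal sketch of the three access patterns being compared for a
matrix-vector product ("operate"). It is illustrative only: the class and
method names (OperateSketch, operateNaive, operateRowCached, operateFlat) are
hypothetical and are not taken from Commons Math or OpenGamma, and the flat
variant simply assumes a row-major 1D layout. It says nothing about relative
timings, which depend on the JVM and hardware as discussed above.

```java
import java.util.Arrays;

public final class OperateSketch {
    private OperateSketch() {}

    /** Naive form: indexes a[i][j] directly inside the inner loop. */
    public static double[] operateNaive(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < x.length; j++) {
                y[i] += a[i][j] * x[j];
            }
        }
        return y;
    }

    /** Row-cached form: fetches the row reference ai = a[i] once per row. */
    public static double[] operateRowCached(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            final double[] ai = a[i];
            double sum = 0.0;
            for (int j = 0; j < x.length; j++) {
                sum += ai[j] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }

    /** Flat form: assumes row-major storage, element (i, j) at i * cols + j. */
    public static double[] operateFlat(double[] data, int rows, int cols, double[] x) {
        double[] y = new double[rows];
        int k = 0; // running index into the flat array
        for (int i = 0; i < rows; i++) {
            double sum = 0.0;
            for (int j = 0; j < cols; j++) {
                sum += data[k++] * x[j];
            }
            y[i] = sum;
        }
        return y;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};
        double[] flat = {1, 2, 3, 4, 5, 6};
        double[] x = {1, 0, -1};
        System.out.println(Arrays.toString(operateNaive(a, x)));
        System.out.println(Arrays.toString(operateRowCached(a, x)));
        System.out.println(Arrays.toString(operateFlat(flat, 2, 3, x)));
    }
}
```

All three methods compute the same product; they differ only in how the
matrix elements are reached, which is the aspect the benchmark table in the
quoted message varies.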
