commons-dev mailing list archives

From Ole Ersoy <ole.er...@gmail.com>
Subject Re: [math] RealMatrixFormat.parse()
Date Fri, 01 Jan 2016 00:37:32 GMT


On 12/31/2015 05:42 PM, Gilles wrote:
> On Thu, 31 Dec 2015 12:54:00 -0600, Ole Ersoy wrote:
>> On 12/31/2015 11:10 AM, Gilles wrote:
>>> On Wed, 30 Dec 2015 21:33:56 -0600, Ole Ersoy wrote:
>>>> Hi,
>>>>
>>>> In RealMatrixFormat.parse() MatrixUtils makes the decision on what
>>>> type of RealMatrix instance to return.
>>>
>>> Ideally, this is correct as the actual type is an "implementation detail".
>>>> Flexibility is gained if it
>>>> just returns double[][] letting the caller decide what type of
>>>> RealMatrix instance to create.
>>>
>>> That could become a problem e.g. for sparse matrices where the persistent
>>> format and the instance type could be optimized for space, but a "double[][]"
>>> cannot be.
>> RealMatrixFormat.parse() first creates a double[][] and then drops
>> it into the Matrix wrapper it thinks is best, per MatrixUtils. By
>> leaving out the last step, the caller can use MatrixUtils (or
>> hopefully MatrixFactory) to perform the next step. Or maybe there is
>> no next step.  Perhaps just having a double[][] is fine.
>
> My opinion is that this code should be in a separate IO module,
> where the external format can be made more flexible and more
> correct (such as not doing unnecessary allocation).
Totally with you on that.  Ideally something along the lines of MatrixPersist and MatrixParse
classes that support localized formatting.  Right now it's all bundled up into RealMatrixFormat,
probably due to time constraints.  I'll look at modularizing that part later.  Right now I'm
breaking up MatrixUtils into MatrixFactory and LinearExceptionFactory, and once the dust settles
I can look at the IO piece in more detail.
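
To make the idea concrete, here is a rough sketch of a parse step that stops at double[][]
and leaves the wrapper choice to the caller.  The MatrixParse class and its parse() method
are hypothetical (only the name comes from the paragraph above), the input is assumed to be
in the "{{1,2},{3,4}}" style, and localized number formatting is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parser sketch: text in, double[][] out, no RealMatrix involved. */
final class MatrixParse {

    private MatrixParse() {}

    /** Parses text such as "{{1,2},{3,4}}" into a rectangular double[][]. */
    static double[][] parse(String source) {
        String s = source.trim();
        if (s.length() < 2 || s.charAt(0) != '{' || s.charAt(s.length() - 1) != '}') {
            throw new IllegalArgumentException("expected outer braces: " + source);
        }
        // Drop the outer braces; what remains is a sequence of "{...}" rows.
        String body = s.substring(1, s.length() - 1);
        List<double[]> rows = new ArrayList<>();
        int pos = 0;
        while (true) {
            int open = body.indexOf('{', pos);
            if (open < 0) {
                break; // no more rows
            }
            int close = body.indexOf('}', open);
            if (close < 0) {
                throw new IllegalArgumentException("unclosed row: " + source);
            }
            String[] cells = body.substring(open + 1, close).split(",");
            double[] row = new double[cells.length];
            for (int i = 0; i < cells.length; i++) {
                // Assumption: plain Java number syntax, no locale handling.
                row[i] = Double.parseDouble(cells[i].trim());
            }
            if (!rows.isEmpty() && rows.get(0).length != row.length) {
                throw new IllegalArgumentException("rows differ in length: " + source);
            }
            rows.add(row);
            pos = close + 1;
        }
        return rows.toArray(new double[0][]);
    }
}
```

The caller can then decide what to do with the array, e.g. pass it to
new Array2DRowRealMatrix(data) or new BlockRealMatrix(data), or just keep the double[][].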
>
>>>> It's also better for modularity, as it
>>>> reduces RealMatrixFormat imports (MatrixUtils supports Field
>>>> matrices as well, and I'm attempting to separate real and field
>>>> matrices into two different modules).
>>>
>>> For modularity, IO should not be in the same module as the core
>>> algorithms.
>> I agree in general.  I'm sticking all the 'Real' (Excluding Field)
>> classes in one module (Vector and Matrix).  AbstractRealMatrix uses
>> RealMatrixFormat, so it's tightly coupled ATM and it seems like it
>> belongs with the real Vector and Matrix classes so...
>
> Given the major refactoring which you are attempting, why not drop
> everything that does not belong?
Good point.  I'll just strip out the formatting, etc. from AbstractRealMatrix and reintroduce
it in the IO module.

>
>>>
>>>> Also just curious if Array2DRowRealMatrix is worth keeping?  It seems
>>>> like the performance of BlockRealMatrix might be just as good or
>>>> better regardless of matrix size ... although my testing is limited.
>>>
>>> I recall having performed a benchmark years ago and IIRC, the
>>> "BlockRealMatrix" started to be faster only for very large matrix sizes
>>> (although I don't remember which).
>> That was what I was seeing as well.  Once matrix rows reach 100K - 10
>> million performance goes up between 2X and 5X, but I did not really
>> see any difference in performance (multiplication only) for small
>> data sets.  So I'm assuming, like Luc indicated, that the
>> Array2DRowRealMatrix is only better when attempting to reuse the
>> underlying double[][] matrix a lot...
>
> As I recall, for "small" matrices, the "Block" version was significantly
> slower. Depends what we call "large" and "small"...
Hmm - That probably makes sense since Block has to create the block structure.  I'll have
a second look once I get a good profiling setup added to the module.
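
For reference, a rough sketch of the kind of up-front work I mean: copying a double[][]
into square blocks, each stored as one flat array.  This is not the library's code.
BlockLayoutSketch and toBlocks() are made-up names, and the input is assumed to be
non-empty and rectangular.  (BlockRealMatrix uses a block size of 52.)

```java
/** Hypothetical illustration of converting row arrays into a blocked layout. */
final class BlockLayoutSketch {

    private BlockLayoutSketch() {}

    /**
     * Copies raw (assumed non-empty and rectangular) into blocks of at most
     * blockSize x blockSize entries, each block flattened row by row.
     */
    static double[][] toBlocks(double[][] raw, int blockSize) {
        int rows = raw.length;
        int cols = raw[0].length;
        int blockRows = (rows + blockSize - 1) / blockSize; // ceiling division
        int blockCols = (cols + blockSize - 1) / blockSize;
        double[][] blocks = new double[blockRows * blockCols][];
        int index = 0;
        for (int bi = 0; bi < blockRows; bi++) {
            int rStart = bi * blockSize;
            int rEnd = Math.min(rStart + blockSize, rows);
            for (int bj = 0; bj < blockCols; bj++) {
                int cStart = bj * blockSize;
                int cEnd = Math.min(cStart + blockSize, cols);
                int width = cEnd - cStart;
                // One allocation plus one copy per block: this is the setup cost.
                double[] block = new double[(rEnd - rStart) * width];
                int k = 0;
                for (int r = rStart; r < rEnd; r++) {
                    System.arraycopy(raw[r], cStart, block, k, width);
                    k += width;
                }
                blocks[index++] = block;
            }
        }
        return blocks;
    }
}
```

Every entry gets copied once before any arithmetic starts, which a wrapper that keeps the
caller's double[][] as-is can avoid.  Whether that accounts for the difference on small
matrices is something the profiling will have to show.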

HAPPY NEW YEAR!!

Ole

