commons-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Ismael Juma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MATH-230) Implement Sparse Matrix Support
Date Fri, 12 Dec 2008 10:17:44 GMT

    [ https://issues.apache.org/jira/browse/MATH-230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12655953#action_12655953
] 

Ismael Juma commented on MATH-230:
----------------------------------

Yes, we still need an open addressed map if we want to have good space characteristics. A
comparison of how much space each entry takes for each case follows (note that I ignore the
things that take constant space independent of the number of entries like size and such):

For HashMap<Integer,Double>, assuming a 32-bit JVM (would be much worse for the 64-bit
one) and that the map has reached the maximum load factor (in bytes):

- Map.Entry object = 4 (key reference) + 4 (value reference) + 4 (next reference) + 4 (cached
int hash) + 8 (JVM mark and klass)[1] = 24 bytes
- Integer object = 16 bytes[1]
- Double object = 16 bytes (the reason why Integer and Double objects take the same space
is due to alignment padding in Integer)
- Space in Map.Entry[] = 4 bytes / 0.75 = 5.33 bytes
- Total = 41.33 bytes

For OpentIntToDoubleHashMap, assuming a 32-bit JVM (would be similar in 64-bit JVM since it
doesn't rely on pointers in the non constant variables) that the map has reached the maximum
load factor (in bytes):

- Space in byte[] = 1 byte / 0.50 (load factor) = 2 bytes
- Space in double[] = 8 bytes / 0.50 (load factor) = 16 bytes
- Space in int[] = 4 bytes / 0.50 (load factor) = 8 bytes
- Total = 26 bytes

So, HashMap takes more than 50% more space for a 32-bit JVM and quite a bit more for the 64-bit
one (since it's pointer heavy and alignment issues make it worse) and it's harder on the GC
because all those Double, Integer and Map.Entry objects must be collected. Furthermore, the
additional space doesn't even necessarily help with cache locality since Map.Entry only holds
references to the actual objects (unfortunately, until we get value types, a.k.a. structs,
we can't implement the ideal HashMap).

[1] http://blogs.sun.com/jrose/entry/fixnums_in_the_vm

> Implement Sparse Matrix Support
> -------------------------------
>
>                 Key: MATH-230
>                 URL: https://issues.apache.org/jira/browse/MATH-230
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 2.0
>         Environment: N/A
>            Reporter: Sujit Pal
>            Assignee: Luc Maisonobe
>            Priority: Minor
>             Fix For: 2.0
>
>         Attachments: math-230.diff, patch.txt, RealMatrixImplPerformanceTest.java, SparseRealMatrixImpl.java,
SparseRealMatrixImplTest.java
>
>
> I needed a way to deal with large sparse matrices using commons-math RealMatrix, so I
implemented it. The SparseRealMatrixImpl is a subclass of RealMatrixImpl, and the backing
data structure is a Map<Point,Double>, where Point is a struct like inner-class which
exposes two int parameters row and column. I had to make some changes to the existing components
to keep the code for SparseRealMatrixImpl clean. Here are the details.
> 1) RealMatrix.java:
>    - added a new method setEntry(int, int, double) to set data into a matrix
> 2) RealMatrixImpl.java:
>    - changed all internal calls to data[i][j] to getEntry(i,j).
>    - for some methods such as add(), subtract(), premultiply(), etc, there
>      was code that checked for ClassCastException and had two versions,
>      one for a generic RealMatrix and one for a RealMatrixImpl. This has
>      been changed to have only one that operates on a RealMatrix. The 
>      result is something like auto-type casting. So if:
>        RealMatrixImpl.add(RealMatrix) returns a RealMatrixImpl
>        SparseRealMatrixImpl.add(RealMatrix) returns a SparseRealMatrixImpl
> 3) SparseRealMatrixImpl added as a subclass of RealMatrixImpl.
> 4) LUDecompositionImpl changed to use a clone of the passed in RealMatrix
>    instead of its data[][] block, and now it uses clone.getEntry(row,col)
>    calls instead of data[row][col] calls.
> 5) LUDecompositionImpl returned RealMatrixImpl for getL(), getU(), getP()
>    and solve(). It now returns the same RealMatrix impl that is passed 
>    in through its constructor for these methods.
> 6) New test for SparseRealMatrixImpl, mimics the tests in RealMatrixImplTest,
> 7) New static method to create SparseRealMatrixImpl out of a double[][] in
>    MatrixUtils.createSparseRealMatrix().
>    but using SparseRealMatrixImpl.
> 8) Verified that all JUnit tests pass.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message