incubator-hama-dev mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Difference between sparse* and dense*
Date Wed, 18 Mar 2009 13:21:31 GMT
Oh, good point.

HBase seems like a good fit for huge sparse matrices. It only needs to store:

- Non-zero value
- Index for row and column
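
To make the sparse layout concrete, here is a minimal in-memory sketch that keeps only non-zero values, keyed by row and column index. The class SparseCells and its methods are hypothetical names for illustration; this is not Hama's SparseMatrix and it does not touch HBase.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class, not part of Hama or HBase.
final class SparseCells {
    // Outer key: row index; inner key: column index; value: non-zero entry.
    private final Map<Integer, Map<Integer, Double>> rows = new HashMap<>();

    void set(int row, int col, double value) {
        if (value == 0.0) {
            // Zeros are never stored; drop the cell if it exists.
            Map<Integer, Double> r = rows.get(row);
            if (r != null) {
                r.remove(col);
                if (r.isEmpty()) {
                    rows.remove(row);
                }
            }
            return;
        }
        rows.computeIfAbsent(row, k -> new HashMap<>()).put(col, value);
    }

    // Missing cells read back as zero.
    double get(int row, int col) {
        Map<Integer, Double> r = rows.get(row);
        return r == null ? 0.0 : r.getOrDefault(col, 0.0);
    }

    // Number of cells actually held in memory.
    int storedCells() {
        int n = 0;
        for (Map<Integer, Double> r : rows.values()) {
            n += r.size();
        }
        return n;
    }

    public static void main(String[] args) {
        SparseCells m = new SparseCells();
        m.set(0, 4999, 1.5);
        m.set(42, 7, -2.0);
        m.set(42, 7, 0.0); // overwriting with zero removes the cell
        System.out.println(m.get(0, 4999) + " " + m.get(42, 7) + " " + m.storedCells());
    }
}
```

The cost grows with the number of non-zero cells rather than with rows * columns, which is why this layout suits very sparse data.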

However, it is overkill for dense matrices. IMO, we can't store a huge
dense matrix in HBase. When I stored a 5000 * 5000 double matrix with
row/column/time indexes in HBase, 15~16 GB was used on each node
(replica = 3). So I made two implementations. We should survey the
data structures.
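
As a rough back-of-the-envelope check, the sketch below compares the raw size of the values with a per-cell layout that also carries row, column and time indexes. The key sizes passed in main (4, 12 and 8 bytes) are assumptions picked for illustration, not measured HBase figures, and the class name is hypothetical. It does not try to reproduce the 15~16 GB number, which also depends on the cluster size and on HBase's own file and log overhead.

```java
// Hypothetical helper for a rough size estimate; not an HBase API.
final class DenseStorageEstimate {

    // Bytes needed for the values alone (8 bytes per double).
    static long rawBytes(long rows, long cols) {
        return rows * cols * Double.BYTES;
    }

    // Bytes when every cell also carries its row key, column key and
    // timestamp, and the data is replicated. All key sizes are assumptions.
    static long indexedBytes(long rows, long cols, int rowKeyBytes,
                             int colKeyBytes, int timestampBytes, int replicas) {
        long perCell = (long) rowKeyBytes + colKeyBytes + timestampBytes + Double.BYTES;
        return rows * cols * perCell * replicas;
    }

    public static void main(String[] args) {
        long raw = rawBytes(5000, 5000);
        // Assumed sizes: 4-byte row key, 12-byte column key, 8-byte timestamp.
        long indexed = indexedBytes(5000, 5000, 4, 12, 8, 3);
        System.out.println("raw values:          " + raw + " bytes");
        System.out.println("indexed, 3 replicas: " + indexed + " bytes");
    }
}
```

Even under these modest assumptions the indexes and replication multiply the 200 MB of raw values many times over, which is the overhead a dense layout should avoid.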

There is also a difference in algorithms and benefits between dense
and sparse.

- The blocking algorithm only works for dense matrices, which store
all entries.
- A sparse matrix stores only non-zero values (storage efficient), but
if sparsity is low, manipulations will have some overhead from
irregular access over the network.
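
To illustrate the first point, here is a minimal single-machine sketch of blocked (tiled) multiplication on dense arrays. It is a generic textbook version, not Hama's distributed blocking algorithm; the class name and the blockSize parameter are hypothetical. It relies on every entry being present, so each block is a regular, contiguous chunk of work.

```java
// Generic blocked (tiled) dense multiplication; illustration only.
final class BlockedMultiply {

    static double[][] multiply(double[][] a, double[][] b, int blockSize) {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        int n = a.length;     // rows of A
        int k = b.length;     // columns of A == rows of B
        int m = b[0].length;  // columns of B
        double[][] c = new double[n][m];

        // Walk the matrices one block at a time.
        for (int i0 = 0; i0 < n; i0 += blockSize) {
            int iMax = Math.min(i0 + blockSize, n);
            for (int p0 = 0; p0 < k; p0 += blockSize) {
                int pMax = Math.min(p0 + blockSize, k);
                for (int j0 = 0; j0 < m; j0 += blockSize) {
                    int jMax = Math.min(j0 + blockSize, m);
                    // Multiply one block of A by one block of B and
                    // accumulate into the matching block of C.
                    for (int i = i0; i < iMax; i++) {
                        for (int p = p0; p < pMax; p++) {
                            double aip = a[i][p];
                            for (int j = j0; j < jMax; j++) {
                                c[i][j] += aip * b[p][j];
                            }
                        }
                    }
                }
            }
        }
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2, 3}, {4, 5, 6}};
        double[][] b = {{7, 8}, {9, 10}, {11, 12}};
        double[][] c = multiply(a, b, 2);
        System.out.println(java.util.Arrays.deepToString(c));
    }
}
```

With a sparse layout the blocks would hold very different numbers of cells, so this regular partitioning no longer balances the work.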

I've started work on the documentation --
http://wiki.apache.org/hama/Architecture -- please review this as well.

On Wed, Mar 18, 2009 at 8:24 PM, Samuel Guo <guosijie@gmail.com> wrote:
> Hi all,
>
> It seems that DenseVector and SparseVector both use *MapWritable* as the
> container of vector data. And the method implementations of DenseVector &
> SparseVector are similar. So why do we need two copies of the code?
>
> The same issue exists in DenseMatrix and SparseMatrix.
>
> Regards,
> Samuel
>



-- 
Best Regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org
