> extracting a block is only efficient if it is full width or height. For
> very sparse matrix operations, the savings due to reuse of intermediate
> results are completely dominated by the I/O cost so block decompositions are
> much less helpful.
Hmm, yes. Thanks for your great comments. :)
> Of course, it depends on the size of the problem. I was only working with a
> matrix with a few tens of billions of nonzero values.
That seems like an extremely large sparse matrix. What kind of operation
is big enough for a 5-machine cluster?
/Edward
On Wed, Oct 29, 2008 at 2:12 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> On Tue, Oct 28, 2008 at 5:15 PM, Edward J. Yoon <edwardyoon@apache.org>wrote:
>
>> ...
>> In single machine, as far as we
>> know graph can be stored to linked list or matrix.
>>
>
> Since the matrix is normally very sparse for large graphs, these two
> approaches are pretty similar.
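The equivalence Ted describes can be sketched in a few lines of Python (a toy example; the variable names are purely illustrative): each node's adjacency list is exactly one row of the sparse matrix, so the two representations carry the same information.

```python
# A small directed graph, stored two ways.

# 1) Adjacency list: node -> list of neighbors.
adj_list = {
    0: [1, 2],
    1: [2],
    2: [0],
}

# 2) Sparse matrix in row form: node -> {neighbor: weight}.
# For an unweighted graph every stored weight is 1, so each row
# holds exactly the same information as the adjacency list entry.
sparse_rows = {u: {v: 1 for v in nbrs} for u, nbrs in adj_list.items()}

# Converting back recovers the original adjacency list.
recovered = {u: sorted(row) for u, row in sparse_rows.items()}
```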
>
>
>> ... So, I guess google's web graph will be stored as a matrix in a
>> bigTable.
>>
>
> I doubt that it is stored as an explicit matrix. Each page would probably
> have a big table (or file) entry and would have a list of links including
> link text.
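One plausible shape for such a per-page entry, sketched in Python (this is a guess at the general idea, not Google's actual schema): the page keys a row, and outgoing links are stored as (target, anchor text) pairs rather than as a row of an explicit adjacency matrix.

```python
# A hypothetical per-page record. The link text is kept alongside
# each target, which an explicit matrix row could not hold.
page_entry = {
    "url": "http://example.com/a",
    "links": [
        ("http://example.com/b", "next page"),
        ("http://example.org/c", "reference"),
    ],
}

# The adjacency information is still recoverable: the row of the
# implicit matrix is just the list of link targets.
targets = [t for t, _ in page_entry["links"]]
```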
>
>
>> Have you seen my 2D block algorithm post??
>> http://blog.udanax.org/2008/10/parallelmatrixmultiplyonhadoop.html
>>
>
> I have now. Block decomposition for multiplies almost always applies only
> to dense matrix operations. For most sparse matrix representations
> extracting a block is only efficient if it is full width or height. For
> very sparse matrix operations, the savings due to reuse of intermediate
> results are completely dominated by the I/O cost so block decompositions are
> much less helpful.
>
> In many cases, it isn't even very helpful to send around entire rows, and
> sending individual elements is about as efficient.
>
>> FYI, Hama (http://incubator.apache.org/hama/) will handle graph
>> algorithms, since they are related to adjacency matrices and topological
>> algebra. And I think a 2000-node hadoop/hbase cluster is big enough if
>> sequential/random read/write speed is improved 800%. :)
>>
>
> I think that a 5 node cluster is big enough without any improvement in
> read/write speed.
>
> Of course, it depends on the size of the problem. I was only working with a
> matrix with a few tens of billions of nonzero values.
>

Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org
