hadoop-common-user mailing list archives

From "Edward J. Yoon" <edwardy...@apache.org>
Subject Re: Ideal number of mappers and reducers; any physical limits?
Date Wed, 29 Oct 2008 05:58:18 GMT
> extracting a block is only efficient if it is full width or height.  For
> very sparse matrix operations, the savings due to reuse of intermediate
> results are completely dominated by the I/O cost so block decompositions are
> much less helpful.

Hmm, yes. Thanks for your great comments. :-) (Two small sketches of
these points appear below the quoted message.)

> Of course, it depends on the size of the problem.  I was only working with a
> matrix with a few tens of billions of non-zero values.

That sounds like an extremely large sparse matrix. What kind of
operation is big enough for 5 machines?

/Edward

On Wed, Oct 29, 2008 at 2:12 PM, Ted Dunning <ted.dunning@gmail.com> wrote:
> On Tue, Oct 28, 2008 at 5:15 PM, Edward J. Yoon <edwardyoon@apache.org> wrote:
>
>> ...
>> On a single machine, as far as we know, a graph can be stored as a
>> linked list or a matrix.
>>
>
> Since the matrix is normally very sparse for large graphs, these two
> approaches are pretty similar.
>
>
>> ... So, I guess Google's web graph is stored as a matrix in a
>> BigTable.
>>
>
> I doubt that it is stored as an explicit matrix.  Each page would probably
> have a big table (or file) entry and would have a list of links including
> link text.
>
>
>> Have you seen my 2D block algorithm post? --
>> http://blog.udanax.org/2008/10/parallel-matrix-multiply-on-hadoop.html
>>
>
> I have now.  Block decomposition for multiplies almost always applies only
> to dense matrix operations.  For most sparse matrix representations
> extracting a block is only efficient if it is full width or height.  For
> very sparse matrix operations, the savings due to reuse of intermediate
> results are completely dominated by the I/O cost so block decompositions are
> much less helpful.
>
> In many cases, it isn't even very helpful to send around entire rows and
> sending individual elements is about as efficient.
>
>> FYI, Hama (http://incubator.apache.org/hama/) will handle graph
>> algorithms, since they are related to adjacency matrices and topological
>> algebra. And I think a 2000-node Hadoop/HBase cluster is big enough if
>> sequential/random read/write speed is improved by 800%. :-)
>>
>
> I think that a 5 node cluster is big enough without any improvement in
> read/write speed.
>
> Of course, it depends on the size of the problem.  I was only working with a
> matrix with a few tens of billions of non-zero values.
>
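On the point above that, for large sparse graphs, an adjacency list and a
matrix are pretty similar: here is a minimal sketch in plain Java
(in-memory maps standing in for HBase tables; the vertices and weights are
just made up for illustration) showing that a row-keyed sparse adjacency
matrix carries the same information as an adjacency list.

import java.util.*;

public class GraphAsSparseMatrix {
    public static void main(String[] args) {
        // Adjacency list: vertex -> neighbours (a tiny made-up graph).
        Map<Integer, List<Integer>> adjacency = new HashMap<Integer, List<Integer>>();
        adjacency.put(0, Arrays.asList(1, 2));
        adjacency.put(1, Arrays.asList(2));
        adjacency.put(2, Collections.<Integer>emptyList());

        // Row-wise sparse adjacency matrix: row -> (column -> weight),
        // storing only the non-zero entries; an unweighted edge becomes 1.0.
        Map<Integer, Map<Integer, Double>> sparse = new HashMap<Integer, Map<Integer, Double>>();
        for (Map.Entry<Integer, List<Integer>> e : adjacency.entrySet()) {
            Map<Integer, Double> row = new HashMap<Integer, Double>();
            for (int neighbour : e.getValue()) {
                row.put(neighbour, 1.0);
            }
            sparse.put(e.getKey(), row);
        }
        // The two structures hold exactly the same edges, just keyed differently.
        System.out.println(sparse); // e.g. {0={1=1.0, 2=1.0}, 1={2=1.0}, 2={}}
    }
}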

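And on the point that, for very sparse multiplies, sending individual
elements around is about as efficient as sending rows: a minimal
single-machine sketch of that dataflow, assuming each matrix is just a bag
of (row, col, value) triples and using plain Java collections to stand in
for the map, shuffle, and reduce steps. On a real Hadoop cluster this would
typically be two MapReduce passes (a join on the shared index k, then a sum
over each output cell); here both are collapsed into in-memory loops, and
the class names and the tiny matrices are only illustrative.

import java.util.*;

public class ElementwiseSparseMultiply {
    // One non-zero element of a sparse matrix.
    static class Entry {
        final int row, col;
        final double value;
        Entry(int row, int col, double value) { this.row = row; this.col = col; this.value = value; }
    }

    public static void main(String[] args) {
        // A is 2x3 and B is 3x2, each stored as a bag of non-zero elements.
        List<Entry> a = Arrays.asList(new Entry(0, 0, 2.0), new Entry(0, 2, 1.0), new Entry(1, 1, 3.0));
        List<Entry> b = Arrays.asList(new Entry(0, 1, 4.0), new Entry(1, 0, 5.0), new Entry(2, 1, 6.0));

        // "Map" step: key A's elements by their column k and B's elements by
        // their row k, so the pairs that must be multiplied meet at the same key.
        Map<Integer, List<Entry>> aByK = new HashMap<Integer, List<Entry>>();
        Map<Integer, List<Entry>> bByK = new HashMap<Integer, List<Entry>>();
        for (Entry e : a) { group(aByK, e.col, e); }
        for (Entry e : b) { group(bByK, e.row, e); }

        // "Reduce" step: for every shared k, emit the partial products
        // A[i][k] * B[k][j] and sum them into C[i][j].
        Map<String, Double> c = new TreeMap<String, Double>();
        for (Map.Entry<Integer, List<Entry>> slot : aByK.entrySet()) {
            List<Entry> bs = bByK.get(slot.getKey());
            if (bs == null) continue;
            for (Entry ae : slot.getValue()) {
                for (Entry be : bs) {
                    String key = ae.row + "," + be.col;
                    Double sum = c.get(key);
                    c.put(key, (sum == null ? 0.0 : sum) + ae.value * be.value);
                }
            }
        }
        System.out.println(c); // {0,1=14.0, 1,0=15.0}
    }

    static void group(Map<Integer, List<Entry>> map, int key, Entry e) {
        List<Entry> list = map.get(key);
        if (list == null) { list = new ArrayList<Entry>(); map.put(key, list); }
        list.add(e);
    }
}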


-- 
Best regards, Edward J. Yoon
edwardyoon@apache.org
http://blog.udanax.org
