hadoop-common-user mailing list archives

From Tommaso Teofili <tommaso.teof...@gmail.com>
Subject Re: Matrix multiplication in Hadoop
Date Sat, 19 Nov 2011 17:28:38 GMT
I agree Hama (and the BSP model) could be a good option; Hama also
supports MR nextgen (YARN) now [1].
I know MM has been implemented with Hama in the past, so it may be worth
asking on that mailing list.

My 2 cents,
Tommaso

[1] : http://svn.apache.org/repos/asf/incubator/hama/trunk/yarn/


2011/11/19 He Chen <airbots@gmail.com>

> Did you try Hama?
>
> There are many methods.
>
> 1) Use Hadoop MPI, which allows you to use MPI MM code on top of Hadoop;
>
> 2) Use Hama, which is designed for MM;
>
> 3) Use pure Hadoop Java MapReduce.
>
> I did this before, but it may not be the optimal algorithm. Put your first
> matrix in the DistributedCache and use each line (row) of the second matrix
> as an input split. For each line, a mapper multiplies that row by the first
> matrix from the DistributedCache. A reducer then collects the rows of the
> result matrix. This algorithm is limited by the DistributedCache size, so it
> is best suited to multiplying a small matrix by a huge matrix.
>
> Chen
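
A minimal sketch of the DistributedCache approach Chen describes above. It
assumes (these details are not in the original mail) that the cached file
holds the first matrix as whitespace-separated rows, and that each input line
of the second matrix looks like "rowIndex<TAB>v1 v2 ...". The class name and
input format are illustrative only; the driver would ship the first matrix
with DistributedCache.addCacheFile(uri, conf), and an identity reducer (or no
reducer at all) is enough to collect the output rows.

// One mapper call handles one row of the second matrix and emits the
// corresponding row of the product (row-of-second-matrix x cached-first-matrix).
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class RowTimesCachedMatrixMapper
    extends Mapper<LongWritable, Text, LongWritable, Text> {

  private double[][] cached;  // first matrix, loaded once per mapper from the cache

  @Override
  protected void setup(Context context) throws IOException {
    // Read the first matrix from the first file in the DistributedCache.
    Path[] files = DistributedCache.getLocalCacheFiles(context.getConfiguration());
    List<double[]> rows = new ArrayList<double[]>();
    BufferedReader in = new BufferedReader(new FileReader(files[0].toString()));
    String line;
    while ((line = in.readLine()) != null) {
      String[] tok = line.trim().split("\\s+");
      double[] row = new double[tok.length];
      for (int j = 0; j < tok.length; j++) row[j] = Double.parseDouble(tok[j]);
      rows.add(row);
    }
    in.close();
    cached = rows.toArray(new double[rows.size()][]);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // One line = one row of the second matrix: "rowIndex \t v1 v2 ..."
    String[] parts = value.toString().split("\t");
    long rowIndex = Long.parseLong(parts[0]);
    String[] tok = parts[1].trim().split("\\s+");
    double[] row = new double[tok.length];
    for (int j = 0; j < tok.length; j++) row[j] = Double.parseDouble(tok[j]);

    // (1 x k) row times (k x n) cached matrix gives one (1 x n) row of the product.
    double[] out = new double[cached[0].length];
    for (int k = 0; k < row.length; k++)
      for (int j = 0; j < out.length; j++)
        out[j] += row[k] * cached[k][j];

    StringBuilder sb = new StringBuilder();
    for (double v : out) sb.append(v).append(' ');
    context.write(new LongWritable(rowIndex), new Text(sb.toString().trim()));
  }
}

As Chen notes, this only works while the first matrix fits in each mapper's
memory, which is what limits the approach to small-by-huge multiplications.
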
> On Sat, Nov 19, 2011 at 10:34 AM, Tim Broberg <Tim.Broberg@exar.com>
> wrote:
>
> > Perhaps this is a good candidate for a native library, then?
> >
> > ________________________________________
> > From: Mike Davis [xmikedavis@gmail.com]
> > Sent: Friday, November 18, 2011 7:39 PM
> > To: common-user@hadoop.apache.org
> > Subject: Re: Matrix multiplication in Hadoop
> >
> > On Friday, November 18, 2011, Mike Spreitzer <mspreitz@us.ibm.com> wrote:
> > >  Why is matrix multiplication ill-suited for Hadoop?
> >
> > IMHO, a huge issue here is the JVM's inability to fully support
> > CPU-vendor-specific SIMD instructions and, by extension, optimized BLAS
> > routines. Running a large MM task using Intel's MKL rather than relying
> > on generic compiler optimization is orders of magnitude faster on a
> > single multicore processor. I see almost no way that Hadoop could win
> > such a CPU-intensive task against an MPI cluster with even a tenth of
> > the nodes running a decently tuned BLAS library. Racing even against a
> > single CPU might be difficult, given the I/O overhead.
> >
> > Still, it's a reasonably common problem and we shouldn't murder the good
> > in favor of the best. I'm certain a MM/LinAlg Hadoop library with even
> > mediocre performance, w.r.t. C, would get used.
> >
> > --
> > Mike Davis
> >
>
