hadoop-general mailing list archives

From elton sky <eltonsky9...@gmail.com>
Subject questions about hadoop map reduce and compute intensive related applications
Date Sat, 30 Apr 2011 07:18:34 GMT
I have two questions:

1. I am wondering how Hadoop MR performs when it runs compute-intensive
applications, e.g. computing pi with the Monte Carlo method. There is an
example in 0.21, QuasiMonteCarlo, but that example doesn't use random
numbers, and it generates pseudo input up front. If we used distributed
random number generation instead, then I guess the performance of Hadoop
would be similar to that of a message-passing framework such as MPI. So my
guess is that, with the proper method, Hadoop would do well on
compute-intensive applications compared with MPI.

2. I am looking for applications that have large data sets and require
intensive computation. Such an application could be divided into a
workflow that includes both map reduce operations and message-passing-like
operations. For example, in step 1 I use Hadoop MR to process 10TB of data
and generate a small output, say 10GB. This 10GB can fit into memory and is
better processed with some interprocess communication, which would boost
performance. So in step 2 I would use MPI, etc.

Is there any application with this property, perhaps in some scientific
research area? Or is it fine to just use map reduce by itself?
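
To make question 1 concrete, here is a minimal sketch of what "distributed
random number generation" could look like for the Monte Carlo pi estimate.
It is plain Python, not Hadoop code: map_task, reduce_task, estimate_pi and
base_seed are hypothetical names made up for illustration, and the map
tasks run sequentially in one process. The seeding scheme (one distinct
seed per task) is an assumption; distinct seeds make the streams
reproducible and different, but do not by themselves guarantee
statistically independent streams.

```python
import random


def map_task(task_id, samples, base_seed=12345):
    """Hypothetical map task: sample points locally instead of reading input.

    Returns (points inside the quarter circle, points drawn).
    """
    # Assumption: each task derives its own seed from its task id, so no
    # input has to be generated up front and shipped to the task.
    rng = random.Random(base_seed * 1_000_003 + task_id)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return inside, samples


def reduce_task(partials):
    """Hypothetical reduce task: sum the partial counts and estimate pi."""
    inside = sum(i for i, _ in partials)
    total = sum(n for _, n in partials)
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / total


def estimate_pi(num_tasks=8, samples_per_task=50_000):
    """Run all map tasks (sequentially here) and reduce their outputs."""
    partials = [map_task(t, samples_per_task) for t in range(num_tasks)]
    return reduce_task(partials)


if __name__ == "__main__":
    print(estimate_pi())
```

The only data that crosses task boundaries is one pair of counts per map
task, which is why I would expect the framework overhead, rather than
communication, to dominate any comparison with MPI for this kind of job.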

