Mailing-List: contact general-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: general@hadoop.apache.org
Received-SPF: pass (athena.apache.org: domain of eltonsky9404@gmail.com
 designates 74.125.82.176 as permitted sender)
DomainKey-Signature: a=rsa-sha1; c=nofws;
        d=gmail.com; s=gamma;
        h=mime-version:date:message-id:subject:from:to:content-type;
        b=ZWd2vr5ezHFlHe7NLq9gUZNwBqpP3yUNMQ2bwYYKZDZHz7yox67JYqtTL18g54Zt/Q
         6OkZ94rxJ0TgeD84rCshqpXpIw8G22iRiIXoIpZdxAzT9Seg5lRuF3qx/Xuhuy5GYJjm
         09BZUdZbPJ9sbUgLeAivbjWNH5tsFvBIImSZk=
MIME-Version: 1.0
Date: Sat, 30 Apr 2011 17:18:34 +1000
Message-ID: <BANLkTinbKUikZGx_c9h6oN=_yveF8j2K6Q@mail.gmail.com>
Subject: questions about hadoop map reduce and compute intensive related
 applications
From: elton sky <eltonsky9404@gmail.com>
To: common-user <common-user@hadoop.apache.org>, general@hadoop.apache.org
Content-Type: multipart/alternative; boundary=000e0cd308ac270b9104a21d9917

--000e0cd308ac270b9104a21d9917
Content-Type: text/plain; charset=ISO-8859-1

I have two questions:

1. I am wondering how Hadoop MR performs when it runs compute-intensive
applications, e.g. using the Monte Carlo method to compute pi. There's an
example in 0.21, QuasiMonteCarlo, but that example doesn't use random numbers
and it generates pseudo input up front. If we use distributed random number
generation instead, then I guess the performance of Hadoop should be similar
to that of a message-passing framework such as MPI. So my guess is that, with
a proper method, Hadoop would do well on compute-intensive applications
compared with MPI.

2. I am looking for applications that have large data sets and require
intensive computation. Such an application can be divided into a workflow
that includes both map reduce operations and message-passing-like
operations. For example, in step 1 I use Hadoop MR to process 10TB of data
and generate a small output, say 10GB. This 10GB fits into memory and is
better processed with some interprocess communication, which will boost
performance. So in step 2 I would use MPI or something similar.
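
To make question 1 concrete, here is a minimal sketch of what I mean by
distributed random number generation. It is plain Python that only imitates
the map and reduce steps in one process, not Hadoop code. The function names
(map_task, reduce_counts, estimate_pi) and the seed formula are my own
assumptions for illustration; in particular, deriving seeds by simple
arithmetic does not guarantee statistically independent streams.

```python
import random

def map_task(task_id, samples, base_seed=12345):
    # Hypothetical "mapper": derives its own seed from the task id, so no
    # input file of points has to be generated up front.
    rng = random.Random(base_seed * 1_000_003 + task_id)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point falls inside the quarter circle
            inside += 1
    return inside, samples

def reduce_counts(partials):
    # Hypothetical "reducer": sums the per-task counts and scales by 4,
    # since the quarter circle covers pi/4 of the unit square.
    inside = sum(p[0] for p in partials)
    total = sum(p[1] for p in partials)
    return 4.0 * inside / total

def estimate_pi(num_tasks=8, samples_per_task=50_000):
    # Tasks run sequentially here; they share nothing, so they could run
    # in parallel without any communication until the final sum.
    partials = [map_task(t, samples_per_task) for t in range(num_tasks)]
    return reduce_counts(partials)

if __name__ == "__main__":
    print(estimate_pi())
```

The only data exchanged is one pair of counts per task, which is why I expect
the framework overhead to matter little for this kind of job.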

Is there any application that has this property, perhaps in some scientific
research area? Or is it fine to just use map reduce by itself?
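
The two-step workflow from question 2 could be sketched as follows. This is
a toy illustration in plain Python under my own assumptions: stage1_aggregate
stands in for the Hadoop MR step that shrinks the data, and stage2_smooth
stands in for an iterative step where every element needs its neighbours'
values each round, which is the kind of exchange I would expect to do with
MPI. Both function names and the smoothing rule are hypothetical.

```python
from collections import defaultdict

def stage1_aggregate(records):
    # Step 1 (map reduce style): sum values per key, turning many
    # (key, value) records into one number per key.
    totals = defaultdict(float)
    for key, value in records:
        totals[key] += value
    return [totals[k] for k in sorted(totals)]

def stage2_smooth(values, rounds=200):
    # Step 2 (message passing style): each element is repeatedly replaced
    # by the average of itself and its two ring neighbours. Every round
    # needs the neighbours' latest values, so the data must stay in memory.
    n = len(values)
    current = list(values)
    for _ in range(rounds):
        current = [
            (current[(i - 1) % n] + current[i] + current[(i + 1) % n]) / 3.0
            for i in range(n)
        ]
    return current

if __name__ == "__main__":
    records = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 6.0)]
    small = stage1_aggregate(records)
    print(small)
    print(stage2_smooth(small))
```

Step 1 touches each record once and needs no communication between records,
while step 2 makes many passes over a small data set with an exchange in every
pass.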

Regards,
Elton

--000e0cd308ac270b9104a21d9917--