From: Robert Evans <evans@yahoo-inc.com>
To: common-user@hadoop.apache.org
Date: Tue, 13 Sep 2011 12:22:12 -0700
Subject: Re: Is Hadoop the right platform for my HPC application?

Another option to think about is the Hamster project (MAPREDUCE-2911), which will allow OpenMPI to run on a Hadoop cluster. It is still very preliminary and will probably not be ready until Hadoop 0.23 or 0.24.

There are also other processing methodologies being developed to run on top of YARN (the resource scheduler put in as part of Hadoop 0.23): http://wiki.apache.org/hadoop/PoweredByYarn

So there are even more choices coming, depending on your problem.

--Bobby Evans

On 9/13/11 12:54 PM, "Parker Jones" wrote:

Thank you for the explanations, Bobby. That helps significantly.

I also read the article below, which gave me a better understanding of the relative merits of MapReduce/Hadoop vs MPI. Alberto, you might find it useful too.

http://grids.ucs.indiana.edu/ptliupages/publications/CloudsandMR.pdf

There is even a MapReduce API built on top of MPI, developed at Sandia.

So many options to choose from :-)

Cheers,
Parker

> From: evans@yahoo-inc.com
> To: common-user@hadoop.apache.org
> Date: Mon, 12 Sep 2011 14:02:44 -0700
> Subject: Re: Is Hadoop the right platform for my HPC application?
>
> Parker,
>
> The hadoop command itself is just a shell script that sets up your classpath and some environment variables for a JVM. Hadoop provides a Java API, and you should be able to use it to write your application without dealing with the command line. That being said, there is no Map/Reduce C/C++ API. There is libhdfs.so, which will allow you to read/write HDFS files from a C/C++ program, but it actually launches a JVM behind the scenes to handle the actual requests.
>
> As for a way to avoid writing your input data into files: the data has to be distributed to the compute nodes somehow. You could write a custom input format that does not use any input files, and then have it load the data a different way. I believe that some people do this to load data from MySQL or some other DB for processing. Similarly, you could do something with the output format to put the data someplace else.
>
> It is hard to say if Hadoop is the right platform without more information about what you are doing. Hadoop has been used for lots of embarrassingly parallel problems. The processing is easy; the real question is where is your data coming from, and where are the results going. Map/Reduce is fast in part because it tries to reduce data movement and move the computation to the data, not the other way round. Without knowing the expected size of your data or the amount of processing that it will do, it is hard to say.
>
> --Bobby Evans
>
> On 9/12/11 5:09 AM, "Parker Jones" wrote:
>
> Hello all,
>
> I have Hadoop up and running and an embarrassingly parallel problem, but I can't figure out how to arrange the problem. My apologies in advance if this is obvious and I'm not getting it.
>
> My HPC application isn't a batch program, but runs in a continuous loop (like a server) *outside* of the Hadoop machines, and it should occasionally farm out a large computation to Hadoop and use the results.
> However, all the examples I have come across interact with Hadoop via files and the command line. (Perhaps I am looking in the wrong places?)
>
> So,
> * is Hadoop the right platform for this kind of problem?
> * is it possible to use Hadoop without going through the command line and writing all input data to files?
>
> If so, could someone point me to some examples and documentation? I am coding in C/C++ in case that is relevant, but examples in any language should be helpful.
>
> Thanks for any suggestions,
> Parker
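A minimal sketch of the libhdfs route Bobby mentions, for writing a computation's input straight into HDFS from a C program instead of going through the `hadoop` command line. This is only an illustration: it needs a Hadoop installation (`hdfs.h` and `libhdfs.so`) and a running cluster to compile and run, the path and record contents are made up, and as noted above the library still launches a JVM behind the scenes.

```c
/* Sketch: write a job's input into HDFS from C via libhdfs.
 * Assumes a Hadoop install; compile roughly as:
 *   gcc write_input.c -I$HADOOP_HOME/src/c++/libhdfs -lhdfs -ljvm
 * (exact paths vary by Hadoop version). */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include "hdfs.h"

int main(void) {
    /* "default", 0 means: use the NameNode from the client's Hadoop conf. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (fs == NULL) {
        fprintf(stderr, "failed to connect to HDFS\n");
        return 1;
    }

    /* Hypothetical input path for the example. */
    const char *path = "/tmp/job-input/part-00000";
    hdfsFile out = hdfsOpenFile(fs, path, O_WRONLY | O_CREAT, 0, 0, 0);
    if (out == NULL) {
        fprintf(stderr, "failed to open %s for writing\n", path);
        hdfsDisconnect(fs);
        return 1;
    }

    const char *record = "some input record\n";
    hdfsWrite(fs, out, record, (tSize)strlen(record));
    hdfsFlush(fs, out);

    hdfsCloseFile(fs, out);
    hdfsDisconnect(fs);
    return 0;
}
```

A custom input format, as Bobby suggests, would avoid even this file-writing step by having the record readers pull data from your server directly.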