From: Robert Evans <evans@yahoo-inc.com>
To: common-user@hadoop.apache.org
Date: Tue, 13 Sep 2011 12:22:12 -0700
Subject: Re: Is Hadoop the right platform for my HPC application?

Another option to think about is the Hamster project (MAPREDUCE-2911), which will allow OpenMPI to run on a Hadoop cluster. It is still very preliminary and will probably not be ready until Hadoop 0.23 or 0.24.

There are also other processing methodologies being developed to run on top of YARN (the resource scheduler put in as part of Hadoop 0.23): http://wiki.apache.org/hadoop/PoweredByYarn

So there are even more choices coming, depending on your problem.

--Bobby Evans

On 9/13/11 12:54 PM, "Parker Jones" wrote:

Thank you for the explanations, Bobby. That helps significantly.

I also read the article below, which gave me a better understanding of the relative merits of MapReduce/Hadoop vs MPI. Alberto, you might find it useful too.

http://grids.ucs.indiana.edu/ptliupages/publications/CloudsandMR.pdf

There is even a MapReduce API built on top of MPI, developed at Sandia.

So many options to choose from :-)

Cheers,
Parker

> From: evans@yahoo-inc.com
> To: common-user@hadoop.apache.org
> Date: Mon, 12 Sep 2011 14:02:44 -0700
> Subject: Re: Is Hadoop the right platform for my HPC application?
>
> Parker,
>
> The hadoop command itself is just a shell script that sets up your classpath and some environment variables for a JVM. Hadoop provides a Java API, and you should be able to use it to write your application without dealing with the command line. That being said, there is no Map/Reduce C/C++ API. There is libhdfs.so, which will allow you to read/write HDFS files from a C/C++ program, but it actually launches a JVM behind the scenes to handle the actual requests.
>
> As for a way to avoid writing your input data into files: the data has to be distributed to the compute nodes somehow. You could write a custom input format that does not use any input files, and then have it load the data a different way. I believe that some people do this to load data from MySQL or some other DB for processing. Similarly, you could do something with the output format to put the data someplace else.
>
> It is hard to say if Hadoop is the right platform without more information about what you are doing. Hadoop has been used for lots of embarrassingly parallel problems. The processing is easy; the real question is where is your data coming from, and where are the results going. Map/Reduce is fast in part because it tries to reduce data movement and move the computation to the data, not the other way round. Without knowing the expected size of your data or the amount of processing that it will do, it is hard to say.
>
> --Bobby Evans
>
> On 9/12/11 5:09 AM, "Parker Jones" wrote:
>
> Hello all,
>
> I have Hadoop up and running and an embarrassingly parallel problem, but I can't figure out how to arrange the problem. My apologies in advance if this is obvious and I'm not getting it.
>
> My HPC application isn't a batch program, but runs in a continuous loop (like a server) *outside* of the Hadoop machines, and it should occasionally farm out a large computation to Hadoop and use the results.
> However, all the examples I have come across interact with Hadoop via files and the command line. (Perhaps I am looking in the wrong places?)
>
> So,
> * is Hadoop the right platform for this kind of problem?
> * is it possible to use Hadoop without going through the command line and writing all input data to files?
>
> If so, could someone point me to some examples and documentation? I am coding in C/C++ in case that is relevant, but examples in any language should be helpful.
>
> Thanks for any suggestions,
> Parker
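A minimal sketch of the libhdfs route Bobby mentions, for writing a computation's input straight into HDFS from a C program instead of going through the `hadoop` command line. This is only an illustration: it needs a Hadoop installation (`hdfs.h` and `libhdfs.so`) and a running cluster to compile and run, the path and record contents are made up, and as noted above the library still launches a JVM behind the scenes.

```c
/* Sketch: write a job's input into HDFS from C via libhdfs.
 * Assumes a Hadoop install; compile roughly as:
 *   gcc write_input.c -I$HADOOP_HOME/src/c++/libhdfs -lhdfs -ljvm
 * (exact paths vary by Hadoop version). */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include "hdfs.h"

int main(void) {
    /* "default", 0 means: use the NameNode from the client's Hadoop conf. */
    hdfsFS fs = hdfsConnect("default", 0);
    if (fs == NULL) {
        fprintf(stderr, "failed to connect to HDFS\n");
        return 1;
    }

    /* Hypothetical input path for the example. */
    const char *path = "/tmp/job-input/part-00000";
    hdfsFile out = hdfsOpenFile(fs, path, O_WRONLY | O_CREAT, 0, 0, 0);
    if (out == NULL) {
        fprintf(stderr, "failed to open %s for writing\n", path);
        hdfsDisconnect(fs);
        return 1;
    }

    const char *record = "some input record\n";
    hdfsWrite(fs, out, record, (tSize)strlen(record));
    hdfsFlush(fs, out);

    hdfsCloseFile(fs, out);
    hdfsDisconnect(fs);
    return 0;
}
```

A custom input format, as Bobby suggests, would avoid even this file-writing step by having the record readers pull data from your server directly.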