Date: Sat, 30 Apr 2011 17:18:34 +1000
Subject: questions about hadoop map reduce and compute intensive related applications
From: elton sky
To: common-user, general@hadoop.apache.org

I have two questions:

1. I am wondering how Hadoop MapReduce performs on compute-intensive applications, e.g. computing pi with the Monte Carlo method. There is an example in 0.21, QuasiMonteCarlo, but it does not use random numbers (it uses a quasi-random Halton sequence) and it generates pseudo-input upfront. If we used distributed random number generation instead, I would guess that Hadoop's performance should be similar to that of a message-passing framework such as MPI. So my guess is that, with the proper method, Hadoop would do well on compute-intensive applications compared with MPI.

2. I am looking for applications that have large data sets and also require intensive computation. Such an application could be divided into a workflow that combines MapReduce operations and message-passing operations. For example, in step 1 I use Hadoop MapReduce to process 10 TB of data and generate a small output, say 10 GB. That 10 GB fits in memory and is better processed with interprocess communication, which would boost performance, so in step 2 I would use MPI or something similar. Is there any application with this property, perhaps in some scientific research area? Or is it fine to just use MapReduce for the whole thing?

Regards,
Elton
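The distributed random-number approach described in question 1 can be sketched as follows. This is a plain-Python simulation of the map and reduce phases, not Hadoop API code: each "map task" draws its own points from a separately seeded generator and counts how many fall inside the unit quarter-circle, and the "reduce" step sums the counts. Seeding each task from the string `"{base_seed}-{task_id}"` is an assumption made for this sketch. It gives each task a distinct, reproducible stream, but it does not formally guarantee statistically independent streams the way a dedicated parallel RNG would. The names `map_task`, `reduce_counts`, and `estimate_pi` are hypothetical.

```python
import random
from functools import reduce


def map_task(task_id: int, samples: int, base_seed: int = 42) -> tuple[int, int]:
    """Hypothetical 'mapper': samples points with its own seeded RNG.

    Assumption: a per-task seed derived from (base_seed, task_id) stands in
    for distributed random number generation. Returns (inside, total).
    """
    rng = random.Random(f"{base_seed}-{task_id}")  # str seeds are deterministic
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point lies inside the quarter circle
            inside += 1
    return inside, samples


def reduce_counts(a: tuple[int, int], b: tuple[int, int]) -> tuple[int, int]:
    """Hypothetical 'reducer' step: add up partial (inside, total) counts."""
    return a[0] + b[0], a[1] + b[1]


def estimate_pi(num_tasks: int, samples_per_task: int) -> float:
    # Map phase: in a real job these would run in parallel on separate nodes.
    partials = [map_task(t, samples_per_task) for t in range(num_tasks)]
    # Reduce phase: only two integers per task cross the "network".
    inside, total = reduce(reduce_counts, partials)
    # Area ratio: quarter circle / unit square = pi / 4.
    return 4.0 * inside / total


if __name__ == "__main__":
    print(estimate_pi(num_tasks=10, samples_per_task=100_000))
```

The point of the sketch is that each task needs no communication until the final sum, which is why this kind of embarrassingly parallel computation maps naturally onto MapReduce; workloads that need frequent exchanges between workers are where a message-passing framework would differ.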