From: "Ted Dunning"
To: core-user@hadoop.apache.org
Subject: Re: In memory Map Reduce
Date: Sun, 1 Jun 2008 10:13:30 -0700

Hadoop goes to some lengths to make sure that things can stay in memory as
much as possible. There are still cases, however, where intermediate results
are normally written to disk. That means that implementors will have those
time scales in their head as they do things which will inevitably make the
trade-offs somewhat poor compared to a system that never envisions
intermediate data being written to disk.

But other than guessing like this, I couldn't actually say how it would turn
out except that for very short jobs, moving jar files around and other
startup costs can be the dominant cost.

On Sun, Jun 1, 2008 at 5:05 AM, Martin Jaggi wrote:

> So in the case that all intermediate pairs fit into the RAM of the cluster,
> does the InMemoryFileSystem already allow the intermediate phase to be done
> without much disk access? Or what would be the current bottleneck in Hadoop
> in this scenario (huge computational load, not so much data in/out)
> according to your opinion?