From: Todd Lipcon
Date: Mon, 28 Sep 2009 10:36:25 -0700
Subject: Re: Where are temp files stored?
To: common-user@hadoop.apache.org

On Sun, Sep 27, 2009 at 7:39 PM, Starry SHI wrote:
> Hi Dave.
>
> Thank you for your reply!
>
> I have checked ${dfs.data.dir}/tmp, and the tmp files are there while the job
> is running. However, it seems that the tmp files on each node are the same.
> That is to say, the whole HDFS is sharing the same tmp files. This looks
> strange, because each node should process its own part of the data. Do you
> have any ideas on this point?
>

The MapReduce intermediate data is stored in mapred.local.dir. The default
value for this is ${hadoop.tmp.dir}/mapred/local. Note that it is cleaned up
after jobs finish executing.

-Todd

> /* Tomorrow is another day. So is today. */
>
>
> On Sat, Sep 26, 2009 at 15:07, dave bayer wrote:
>
> >
> > On Sep 25, 2009, at 11:34 PM, Starry SHI wrote:
> >
> >> Hi.
> >>
> >> I am wondering where the temp files (intermediate files) are stored.
> >> They should be located in hadoop.tmp.dir by default, right? Why can't I
> >> find them in either the local file system or HDFS?
> >>
> >
> > You might look under ${dfs.data.dir}/tmp. Granted, I've not consulted the
> > code to verify that is how the path is built, but that is where I've seen
> > them on my cluster...
> >
> >> Another question is about the replication of the intermediate files. By
> >> default, will the intermediate (tmp) files be written to HDFS?
> >>
> >
> > No, they live on the node that processed the map task. You wouldn't
> > want to spend the cycles/time to replicate this data out to other
> > nodes (and then clean it up) when you can rerun the task if the node
> > holding the data happens to go down (unlikely).
> >
> > dave bayer
> >
>
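To make the path resolution in Todd's answer concrete, here is a minimal sketch of how a value like ${hadoop.tmp.dir}/mapred/local expands into a real local path. It assumes the Hadoop 0.20-era shipped defaults (hadoop.tmp.dir = /tmp/hadoop-${user.name}, mapred.local.dir = ${hadoop.tmp.dir}/mapred/local), and it loosely models Hadoop's Configuration variable substitution: system properties such as user.name are checked first, then other config keys, and an unbound reference is left as-is. The helper names expand and local_dirs are hypothetical and are not part of any Hadoop API. Because mapred.local.dir may be set to a comma-separated list of directories, the sketch also splits on commas.

```python
import re

# Assumed Hadoop 0.20-era defaults (core-default.xml / mapred-default.xml);
# a site file such as mapred-site.xml may override either key.
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "mapred.local.dir": "${hadoop.tmp.dir}/mapred/local",
}

# Matches a ${name} reference; the name may not contain '}', '$' or whitespace.
_VAR = re.compile(r"\$\{([^}$\s]+)\}")


def expand(key, conf, sys_props, max_depth=20):
    """Hypothetical helper: resolve ${...} references in conf[key].

    System properties (e.g. user.name) are consulted before other config
    keys. Expansion stops when no references remain, when a reference is
    unbound (it is left in place), or after max_depth substitutions.
    """
    value = conf[key]
    for _ in range(max_depth):
        match = _VAR.search(value)
        if match is None:
            return value
        name = match.group(1)
        if name in sys_props:
            replacement = sys_props[name]
        elif name in conf:
            replacement = conf[name]
        else:
            return value  # unbound reference: return the literal text
        value = value[:match.start()] + replacement + value[match.end():]
    return value


def local_dirs(conf, sys_props):
    """Hypothetical helper: the expanded mapred.local.dir, split on commas."""
    expanded = expand("mapred.local.dir", conf, sys_props)
    return [d.strip() for d in expanded.split(",") if d.strip()]


if __name__ == "__main__":
    # With the defaults and a user named "starry", intermediate map output
    # would land under /tmp/hadoop-starry/mapred/local on each TaskTracker.
    print(local_dirs(dict(DEFAULTS), {"user.name": "starry"}))
```

This also suggests why Starry didn't find the intermediate files in HDFS: the resolved directory is on each node's local disk, and per Todd it is cleaned up once the job finishes, so it is only worth inspecting while a job is still running.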