Date: Tue, 6 Jul 2010 14:42:33 +0530
Subject: Re: Distributed Cache
From: Hemanth Yamijala
To: general@hadoop.apache.org

Hi,

> From the book: "Hadoop The definitive guide" -- P242
>>>
> When you launch a job, Hadoop copies the files specified by the -files and
> -archives options to the jobtracker's filesystem (normally HDFS). Then,
> before a task is run, the tasktracker copies the files from the
> jobtracker's filesystem to a local disk -- the cache -- so the task can
> access the files.
>>>
>
> I wonder why hadoop wants to copy the files to jobtracker's filesystem.
> Since it is already in HDFS, it should be available to tasks.
> Any considerations?

Unlike the input data files for M/R tasks, -files and -archives are
options for copying additional files (configuration files, for example)
that all the M/R tasks might need when running. Such files typically
need to be transferred from the local machine where the job is launched
to the cluster nodes where the tasks run. Think of these options as
convenient shortcuts for distributing files to all the tasks.

Makes sense?

Thanks
Hemanth
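
The two copy steps discussed in this message can be illustrated with a toy simulation. This is only a sketch in plain Python, not Hadoop code: the directory layout, the file name `lookup.properties`, and the functions `submit_job`, `localize_for_task`, and `run_demo` are all hypothetical stand-ins. Temporary directories play the roles of the client machine, the jobtracker's filesystem, and each node's local cache.

```python
import shutil
import tempfile
from pathlib import Path


def submit_job(client_file, shared_fs):
    """Step 1 (job launch): copy a file from the client machine to the shared filesystem."""
    shared_fs.mkdir(parents=True, exist_ok=True)
    staged = shared_fs / client_file.name
    shutil.copy(client_file, staged)
    return staged


def localize_for_task(staged, task_cache):
    """Step 2 (before a task runs): copy the staged file to the node's local disk."""
    task_cache.mkdir(parents=True, exist_ok=True)
    local_copy = task_cache / staged.name
    shutil.copy(staged, local_copy)
    return local_copy


def run_demo():
    """Distribute one client-side file to two simulated nodes and read it on each."""
    with tempfile.TemporaryDirectory() as root_name:
        root = Path(root_name)

        # Hypothetical side file that exists only on the machine launching the job.
        client_dir = root / "client"
        client_dir.mkdir()
        config = client_dir / "lookup.properties"
        config.write_text("threshold=10\n")

        # Stand-in for the jobtracker's filesystem (normally HDFS).
        staged = submit_job(config, root / "shared_fs" / "job_0001")

        # Each simulated node gets its own local copy before its task reads it.
        results = []
        for node in ("node1", "node2"):
            local = localize_for_task(staged, root / node / "cache")
            results.append(local.read_text())
        return results


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is that the file starts outside the cluster, which is why the first copy exists at all: without it, the tasks would have nothing to localize.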