Subject: Re: reuse cached files
From: Hemanth Yamijala
To: common-user@hadoop.apache.org
Date: Tue, 3 Aug 2010 10:03:55 +0530

Hi,

> I am actually doing some tests to see the performance. I want to eliminate the
> interference of the distributed cache. I find there is a method in the API to purge
> the cache. That might be what I want.

So, you want to run multiple versions of a job (possibly with different job
parameters) and measure them relative to one another. Is that correct?

I can think of some options:

- Is it possible not to use the distributed cache at all? You could bundle
the files along with the job jar instead.
- You could run the job on fresh cluster instances (a costlier option,
nevertheless).
- You could change the timestamps of the distributed cache files on DFS
before each invocation of the job. This makes Hadoop believe the files have
changed, which causes the distributed cache to fetch them again.

The purgeCache API you are seeing is specific to the MapReduce framework's
internals. It is *not* meant to be used by client code and is not guaranteed
to work.
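To sketch the timestamp option above: one way to refresh a file's
modification time on DFS without re-uploading it is the FileSystem.setTimes
API. This is only a sketch, and the cache path below is a hypothetical
example; adjust it to wherever your job registers its cache files.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RefreshCacheTimestamp {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical path to a file registered with the distributed cache.
    Path cacheFile = new Path("/user/me/cache/lookup.dat");

    // Bump the modification time to "now"; -1 for the access time
    // leaves it unchanged. On the next job submission, the framework
    // sees a newer timestamp and re-fetches the file to the task nodes.
    fs.setTimes(cacheFile, System.currentTimeMillis(), -1);
  }
}
```

Run this between job invocations; it defeats the cache reuse without the
cost of copying the file back into DFS.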
In later versions of Hadoop (0.21 and trunk), the purgeCache methods have
been deprecated in the public API and will be removed altogether.

Thanks
Hemanth

>
> Thanks,
> -Gang
>
>
> ----- Original Message ----
> From: Hemanth Yamijala
> To: common-user@hadoop.apache.org
> Sent: 2010/8/2 (Mon) 12:56:25 AM
> Subject: Re: reuse cached files
>
> Hi,
>
>> Thanks Hemanth. Is there any way to invalidate the reuse and ask Hadoop to
>> resend exactly the same files to the cache for every job?
>
> I may be able to answer this better if I understand the use case. If
> you need the same files for every job, why would you need to send them
> afresh each time? If something is cached, it can be reused, no? I am
> sure I must be missing something in your requirement ...
>
> Thanks
> Hemanth