From: Gang Luo
To: common-user@hadoop.apache.org
Subject: Re: when to sent distributed cache file
Date: Thu, 18 Mar 2010 04:58:02 +0800 (CST)
Message-ID: <757890.14505.qm@web15903.mail.cnb.yahoo.com>

Thanks Ravi.

Here are some observations. I run job1 to generate some data used by the following job2 without replication. The total size of the job1 output is 25 MB, in 50 files. I use the distributed cache to send all the files to the nodes running job2 tasks. When job2 starts, it stays at "map 0% reduce 0%" for 10 minutes. When the job1 output is in 10 files (using 10 reducers in job1), the time spent here is 2 minutes.

So I think the time to distribute cache files is actually counted as part of the total time of the MR job. And in order to send a cache file from HDFS to local disk, it sends at least one block (64 MB by default) even if that file is only 1 MB. Is that right? If so, how much space does that cache file take on the local disk, 64 MB or 1 MB?

-Gang



----- Original Message ----
From: Ravi Phulari
To: "common-user@hadoop.apache.org"; Gang Luo
Sent: 2010/3/17 (Wed) 3:52:24 PM
Subject: Re: when to sent distributed cache file

Hello Gang,
The framework will copy the necessary files to the slave node before any tasks for the job are executed on that node.
I am not sure whether the time required to distribute the cache is counted in the MapReduce job time, but it is included in the job submission process in JobClient.
--
Ravi

On 3/17/10 11:32 AM, "Gang Luo" wrote:

Hi all,
When does Hadoop distribute the cache files? At the moment we call DistributedCache.addCacheFile()? Will the time to distribute the caches be counted as part of the MapReduce job time?

Thanks,
-Gang
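
As a rough check on the timings reported at the top of this thread, the sketch below divides each startup delay by the number of cache files. The class and method names are hypothetical, and it assumes the job1 output was still 25 MB in total in the 10-file run, which the message does not state explicitly.

```java
// Back-of-the-envelope arithmetic on the two runs described in the message.
// Nothing here touches Hadoop; it only restates the reported numbers.
class CacheStartupEstimate {

    /** Average startup delay attributable to one cache file, in seconds. */
    static double secondsPerFile(double delayMinutes, int fileCount) {
        return delayMinutes * 60.0 / fileCount;
    }

    /** Average size of one cache file, in megabytes. */
    static double megabytesPerFile(double totalMegabytes, int fileCount) {
        return totalMegabytes / fileCount;
    }

    public static void main(String[] args) {
        double totalMegabytes = 25.0;        // assumed identical for both runs
        int[] fileCounts = {50, 10};         // cache files in each run
        double[] delayMinutes = {10.0, 2.0}; // time spent at "map 0% reduce 0%"

        for (int i = 0; i < fileCounts.length; i++) {
            System.out.printf("%d files: %.1f s per file, %.1f MB per file%n",
                    fileCounts[i],
                    secondsPerFile(delayMinutes[i], fileCounts[i]),
                    megabytesPerFile(totalMegabytes, fileCounts[i]));
        }
    }
}
```

Under that assumption, both runs come out to the same delay per file even though the files in the 10-file run are five times larger. With only two data points this is merely suggestive, but it is consistent with the delay depending on the number of cache files rather than on the bytes transferred.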