Message-ID: <4C5FCC4E.8050908@apache.org>
Date: Mon, 09 Aug 2010 10:37:18 +0100
From: Steve Loughran
To: common-user@hadoop.apache.org
Subject: Re: hdfs space problem.

On 05/08/10 19:28, Raj V wrote:
> Thank you. I realized that I was running the datanode on the namenode and
> stopped it, but did not know that the first copy went to the local node.
>
> Raj

Writing the first copy to the local node is a placement decision that makes sense for code running as MR jobs: it ensures that the output of work goes to the local machine and not somewhere random. On big imports like yours, though, you get penalised, because the first copy of every block lands on the one machine doing the import.

Some datacentres have one or two IO nodes in the cluster that aren't running HDFS datanodes or task trackers, but let you get at the data at full datacentre rates, just to help with this kind of problem.

Otherwise, if you can implement your import as a MapReduce job, Hadoop can do the work for you.

-steve
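
To see why a single importing machine fills up, here is a toy simulation of the "first copy goes to the local node" rule. It is only a sketch under stated assumptions, not Hadoop's placement code: it ignores racks, free space and node load, and the names place_replicas, simulate, dn0..dn9 and io-node are all made up for illustration.

```python
import random
from collections import Counter


def place_replicas(writer, datanodes, replication, rng):
    """Choose nodes for one block (simplified; real HDFS is also rack-aware)."""
    # Assumption: if the writer is itself a datanode, the first copy stays local;
    # otherwise the first copy goes to a randomly chosen datanode.
    first = writer if writer in datanodes else rng.choice(datanodes)
    others = [n for n in datanodes if n != first]
    # Remaining copies go to distinct other nodes, chosen at random.
    return [first] + rng.sample(others, replication - 1)


def simulate(writer, datanodes, blocks=1000, replication=3, seed=0):
    """Count how many block copies each datanode ends up holding."""
    rng = random.Random(seed)
    usage = Counter({n: 0 for n in datanodes})
    for _ in range(blocks):
        for node in place_replicas(writer, datanodes, replication, rng):
            usage[node] += 1
    return usage


datanodes = [f"dn{i}" for i in range(10)]

# Import run from a machine that is also a datanode: it keeps a copy of every block.
local = simulate("dn0", datanodes)

# Import run from a hypothetical IO node that is not a datanode: copies spread out.
remote = simulate("io-node", datanodes)

if __name__ == "__main__":
    print("import from dn0:    ", dict(local))
    print("import from io-node:", dict(remote))
```

In the first run dn0 holds one copy of every block written, while in the second no node is singled out. That is the effect an IO node outside HDFS, or an import spread across the cluster as a MapReduce job, is meant to achieve.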