From: Harsh J
Date: Thu, 17 Mar 2011 09:58:34 +0530
Subject: Re: How does sqoop distribute it's data evenly across HDFS?
To: common-user@hadoop.apache.org

There's a balancer available to re-balance data across the DataNodes (DNs) of an HDFS cluster in general. It is available in the $HADOOP_HOME/bin/ directory as start-balancer.sh.

But what I think Sqoop implies is that your data ends up balanced because of the map tasks it runs for imports (using a provided split factor between maps), which should make it write chunks of data out to different DataNodes.

I guess you could get more information on the Sqoop mailing list, sqoop-user@cloudera.org:
https://groups.google.com/a/cloudera.org/group/sqoop-user/topics

On Thu, Mar 17, 2011 at 5:04 AM, BeThere wrote:
> The sqoop documentation seems to imply that it uses the key information provided to it on the command line to ensure that the SQL data is distributed evenly across the DFS. However I cannot see any mechanism for achieving this explicitly other than relying on the implicit distribution provided by default by HDFS. Is this correct or are there methods on some API that allow me to manage the distribution to ensure that it is balanced across all nodes in my cluster?
>
> Thanks,
>
>         Andy D
>
>

-- 
Harsh J
http://harshj.com
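
To make the "split factor between maps" idea above concrete, here is a minimal sketch of dividing a numeric key range into one contiguous sub-range per map task. It is an illustration only, not Sqoop's actual code: the function name split_ranges and the assumption of an integer key with a known minimum and maximum are hypothetical.

```python
def split_ranges(lo, hi, num_mappers):
    """Divide the inclusive integer key range [lo, hi] into at most
    num_mappers contiguous sub-ranges of near-equal size.

    Hypothetical helper for illustration; not part of Sqoop.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        return []  # empty key range, nothing to import

    total = hi - lo + 1
    # Never create more ranges than there are distinct key values.
    n = min(num_mappers, total)
    base, extra = divmod(total, n)

    ranges = []
    start = lo
    for i in range(n):
        # The first `extra` ranges take one additional key each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    # Each tuple would correspond to one map task's slice of the table.
    for first, last in split_ranges(1, 100, 4):
        print(f"map task reads keys {first}..{last}")
```

Each map task would then read only its own slice and write its own output file, so the files tend to end up on different DataNodes. Note that equal key ranges give equal amounts of data only when the key values are spread roughly uniformly.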