hadoop-common-user mailing list archives

From Harsh J <qwertyman...@gmail.com>
Subject Re: How does sqoop distribute its data evenly across HDFS?
Date Thu, 17 Mar 2011 04:28:34 GMT
There's a balancer available to re-balance blocks across the DataNodes
of an HDFS cluster in general. It is available in the
$HADOOP_HOME/bin/ directory as start-balancer.sh.

But what I think Sqoop implies is that your data ends up balanced
because of the map tasks it runs for an import: the rows are split
between the maps (using the split key you provide), and each map
writes its own chunk of data, which should land on different
DataNodes.
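
As a rough illustration of that splitting idea (a sketch only, not
Sqoop's actual code), assume an integer key column whose minimum and
maximum values are known. The function name split_ranges and the
column name id are made up for this example:

```python
def split_ranges(lo, hi, num_maps):
    """Divide the inclusive key range [lo, hi] into up to num_maps
    contiguous, near-equal ranges, one per map task."""
    total = hi - lo + 1
    base, extra = divmod(total, num_maps)
    ranges = []
    start = lo
    for i in range(num_maps):
        # The first `extra` ranges take one additional key each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more map tasks than keys: skip empty ranges
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Hypothetical import of keys 1..1000 using 4 map tasks.
for start, end in split_ranges(1, 1000, 4):
    print(f"map task reads rows WHERE id BETWEEN {start} AND {end}")
```

Each map task then writes its own output file, so the balance you get
depends on how evenly the key values are spread over the range and on
where HDFS places the resulting blocks.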

You could probably get more information on the Sqoop mailing list,
sqoop-user@cloudera.org:
https://groups.google.com/a/cloudera.org/group/sqoop-user/topics

On Thu, Mar 17, 2011 at 5:04 AM, BeThere <andy@doddington.net> wrote:
> The sqoop documentation seems to imply that it uses the key information
> provided to it on the command line to ensure that the SQL data is
> distributed evenly across the DFS. However I cannot see any mechanism for
> achieving this explicitly other than relying on the implicit distribution
> provided by default by HDFS. Is this correct or are there methods on some
> API that allow me to manage the distribution to ensure that it is balanced
> across all nodes in my cluster?
>
> Thanks,
>
>         Andy D
>
>



-- 
Harsh J
http://harshj.com
