hadoop-common-user mailing list archives

From Alexander Pivovarov <apivova...@gmail.com>
Subject Re: Copying many files to HDFS
Date Tue, 17 Feb 2015 05:32:14 GMT
Hi Kevin,

What is the network throughput between
1. the NFS server and the client machine?
2. the client machine and the datanodes?
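A quick client-side check of the first number is to time a large sequential read with dd. A minimal sketch (the NFS path is hypothetical; a local temp file stands in here so the commands run anywhere, which measures local disk rather than NFS):

```shell
# Rough sequential-read throughput check with dd. On the real client, point
# "f" at a large file on the NFS mount (e.g. /mnt/nfs/some-large-file --
# hypothetical path) instead of the stand-in temp file used here.
f=$(mktemp)
dd if=/dev/zero of="$f" bs=1M count=64 2>/dev/null     # create a 64 MB stand-in
out=$(dd if="$f" of=/dev/null bs=1M 2>&1 | tail -n 1)  # dd's summary line reports the rate
echo "$out"
rm -f "$f"
```

For a pure network measurement (excluding the NFS read path), a tool like iperf run between the two machines is the usual choice.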


On Feb 13, 2015 5:29 AM, "Kevin" <kevin.macksamie@gmail.com> wrote:

> Hi,
> I am setting up a Hadoop cluster (CDH5.1.3) and I need to copy a thousand
> or so files into HDFS, which totals roughly 1 TB. The cluster will be
> isolated on its own private LAN with a single client machine that is
> connected to the Hadoop cluster as well as the public network. The data
> that needs to be copied into HDFS is mounted as an NFS on the client
> machine.
> I can run `hadoop fs -put` concurrently on the client machine to try to
> increase the throughput.
> If these files could be accessed by each node in the Hadoop cluster, then
> I could write a MapReduce job to copy a number of files from the network
> into HDFS. I could not find anything in the documentation saying that
> `distcp` works with locally hosted files (its code in the tools package
> doesn't show any sign of it either) - but I wouldn't expect it to.
> In general, are there any other ways of copying a very large number of
> client-local files to HDFS? I searched the mail archives for a similar
> question and didn't come across one. I'm sorry if this is a duplicate
> question.
> Thanks for your time,
> Kevin
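On running the puts concurrently: one way to drive them from the client is `xargs -P`. A minimal sketch (the NFS source and HDFS target paths are hypothetical, and the hadoop commands are printed via `echo` rather than executed, so it runs as a dry run anywhere):

```shell
# Dry run of parallel client-side puts. A few sample files stand in for the
# NFS mount; on the real client, run find against the mount point instead.
src=$(mktemp -d)
touch "$src"/a.dat "$src"/b.dat "$src"/c.dat
# -P 4 keeps 4 puts in flight at once; drop the "echo" to copy for real.
cmds=$(find "$src" -type f -print0 \
  | xargs -0 -P 4 -I{} echo hadoop fs -put {} /user/kevin/incoming)
echo "$cmds"
rm -r "$src"
```

If the NFS export could instead be mounted at the same path on every worker node, `distcp` with `file://` source URIs becomes an option, since each map task would read the path locally.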
