hadoop-common-user mailing list archives

From Ognen Duzlevski <og...@nengoiksvelzud.com>
Subject Re: HDFS question
Date Tue, 28 Jan 2014 17:04:18 GMT
There is a lesson in this, by the way: I just realized I pasted the access
key and secret key for the bucket into a public email. DOH, changed ;)

Ognen
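
A minimal sketch of one way to avoid this kind of leak: strip the
"ACCESS:SECRET@" part from s3:// and s3n:// URIs before pasting a command
anywhere public. The function name redact_s3_uri is hypothetical, and the
pattern assumes the secret contains no "/" or whitespace. The s3n connector
can also read the keys from the fs.s3n.awsAccessKeyId and
fs.s3n.awsSecretAccessKey configuration properties, in which case they never
need to appear in the URI at all.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): reads text on stdin and removes
# embedded credentials from s3:// and s3n:// URIs.
# Assumption: the secret key contains no "/" and no whitespace.
redact_s3_uri() {
    sed -E 's#(s3n?://)[^@/[:space:]]+@#\1#g'
}

# Example with placeholder credentials (not real keys).
echo "hadoop distcp s3n://ACCESS_KEY:SECRET_KEY@data-pipeline/large_data/2013-12-02.json hdfs://10.10.0.198:54310/test/2013-12-03.json" \
    | redact_s3_uri
```

URIs that carry no credentials pass through unchanged, because the pattern
only matches when an "@" appears before the first "/" of the path.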


On Tue, Jan 28, 2014 at 10:55 AM, Ognen Duzlevski
<ognen@nengoiksvelzud.com> wrote:

> Ahh. No, I do not have a job tracker. OK - I guess I need to set one up :)
>
> Thanks!
> Ognen
>
>
> On Tue, Jan 28, 2014 at 10:51 AM, Bryan Beaudreault <
> bbeaudreault@hubspot.com> wrote:
>
>> Do you have a jobtracker? Without a jobtracker and tasktrackers, distcp
>> runs in LocalJobRunner mode, i.e. as a single-threaded process on the
>> local machine. By default, the DFSClient writes the first replica of each
>> block to the local data node, then places the second replica on a
>> different rack and the third on that same remote rack.
>>
>> This would explain why everything seems to be going to the local node. It
>> is also probably much slower than it could be.
>>
>>
>> On Tue, Jan 28, 2014 at 11:42 AM, Ognen Duzlevski <
>> ognen@nengoiksvelzud.com> wrote:
>>
>>> Hello,
>>>
>>> I am new to Hadoop and HDFS so maybe I am not understanding things
>>> properly but I have the following issue:
>>>
>>> I have set up a name node and a bunch of data nodes for HDFS. Each node
>>> contributes 1.6TB of space so the total space shown on the hdfs web front
>>> end is about 25TB. I have set the replication to be 3.
>>>
>>> I am downloading large files on a single data node from Amazon's S3
>>> using the distcp command, like this:
>>>
>>>  hadoop --config /etc/hadoop distcp
>>> s3n://AKIAIUHOFVALO67O6FJQ:DV86+JnmNiMGZH9VpdtaZZ8ZJQKyDxy6yKtDBLPp@data-pipeline/large_data/2013-12-02.json
>>> hdfs://10.10.0.198:54310/test/2013-12-03.json
>>>
>>> Where 10.10.0.198 is the Hadoop Name node.
>>>
>>> All I am getting is that the machine I am running these commands on (one
>>> of the data nodes) is getting all the files - they do not seem to be
>>> "spreading" around the HDFS cluster.
>>>
>>> Is this expected? Did I completely misunderstand the point of a parallel
>>> DISTRIBUTED file system? :)
>>>
>>> Thanks!
>>> Ognen
>>>
>>
>>
>
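
To check whether the blocks really stayed on one machine, the fsck report
can be summarized. With replication set to 3 and the copy running on a data
node, that node is expected to hold one replica of every block, so it will
look as if it is "getting all the files" even when the other replicas are
elsewhere. The sketch below is an illustration only: count_datanodes is a
hypothetical helper, and it assumes fsck prints each block on a line
containing "blk_" followed by the replica locations as IP:port pairs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): reads fsck output on stdin and
# prints how many distinct data node addresses hold at least one replica.
# Assumption: block lines contain "blk_" and list replicas as IP:port.
count_datanodes() {
    grep 'blk_' \
        | grep -oE '[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+:[0-9]+' \
        | sort -u \
        | wc -l \
        | tr -d ' '
}

# Intended use on the cluster (commented out so the sketch runs anywhere):
# hadoop fsck /test/2013-12-03.json -files -blocks -locations | count_datanodes
```

A result of 1 would mean every replica sits on a single node. Anything
larger means the replicas are spread across that many data nodes.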
