hadoop-common-user mailing list archives

From "James Seigel" <ja...@tynt.com>
Subject Re: copying file into hdfs
Date Sat, 10 Apr 2010 22:39:41 GMT
Maybe paste your HDFS config here so we can see why it took up 16 gigs
of space.
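
A quick sanity check on that figure, as a minimal sketch: dividing the reported usage by the file size gives the number of full copies HDFS appears to be storing. The 2 GB and 16 GB values come from the original question; the function name `implied_copies` is made up for illustration, and the calculation assumes HDFS held nothing else when the report was taken.

```python
def implied_copies(reported_usage_gb: float, file_size_gb: float) -> float:
    """Return how many full copies of the file the reported usage corresponds to."""
    if file_size_gb <= 0:
        raise ValueError("file_size_gb must be positive")
    return reported_usage_gb / file_size_gb


# Figures from the original question: a 2 GB file, 16 GB shown by dfsadmin -report.
copies = implied_copies(16, 2)
print(f"Reported usage corresponds to about {copies:.0f} full copies of the file")
```

Eight copies on an eight-machine cluster would be consistent with a replication factor equal to the node count, but that is only a guess until the actual config is posted.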

Cheers

Sent from my mobile. Please excuse the typos.

On 2010-04-10, at 3:22 PM, "Michael Segel" <michael_segel@hotmail.com>  
wrote:

>
>
> Mike,
>
> First, you need to see what you set your block size to in Hadoop. By
> default it's 64 MB. With large files, you may want to bump that up to
> 128 MB per block.
> A 2 GB file will give you roughly 32 map tasks at 64 MB per block, or
> 16 at 128 MB.
>
> I'd use hadoop fs -copyFromLocal <local file name> <hdfs file name>.
>
> (Ok, I'm going from memory on the hadoop command, but you can always
> run hadoop fs -help to see the available commands.)
>
> Also you need to see what you set for your replication factor.
> Usually it's 3.
>
> Then your 2 GB file will take up roughly 6 GB of raw HDFS space and
> should be balanced across all of the nodes, at roughly 12 block
> replicas per machine with 64 MB blocks.
>
> HTH
>
> -Mike
>
>> Date: Sat, 10 Apr 2010 14:03:02 -0400
>> Subject: copying file into hdfs
>> From: nullecksor@gmail.com
>> To: common-user@hadoop.apache.org
>>
>> Hi,
>>
>> I'm Mike,
>> I am a new user of Hadoop. Currently, I have a cluster of 8
>> machines and a file of size 2 gigs.
>> When I load it into HDFS using the command
>> hadoop dfs -put /a.dat /data
>> it actually loads it on all data nodes. dfsadmin -report shows HDFS
>> usage of 16 gigs. And it is taking 2 hours to load that data file.
>>
>> With 1 node, my mapreduce operation on this data took 150 seconds.
>>
>> But when I ran the same operation on this cluster, it took 220
>> seconds for the same file.
>>
>> Can someone please tell me how to distribute this file over 8
>> nodes, so that each of them will have a file chunk of roughly
>> 300 MB and the mapreduce operation that I have written works in
>> parallel? Isn't a Hadoop cluster supposed to work in parallel?
>>
>> best.
>
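
The block-size and replication arithmetic from Mike Segel's reply can be sketched as follows. This is a rough illustration, assuming 1 GB = 1024 MB, one map task per block (the usual behaviour for a splittable file with the default input format), and perfectly even replica placement, which a real cluster will only approximate. The function name `hdfs_layout` and its parameters are hypothetical and not part of any Hadoop API.

```python
import math


def hdfs_layout(file_mb: int, block_mb: int, replication: int, nodes: int) -> dict:
    """Estimate block count, raw storage, and replicas per node for one file."""
    blocks = math.ceil(file_mb / block_mb)        # the last block may be only partly filled
    replicas = blocks * replication               # total block replicas stored in the cluster
    return {
        "blocks": blocks,                         # also the assumed number of map tasks
        "raw_storage_mb": file_mb * replication,  # a partial block only uses its actual size
        "replicas_per_node": replicas / nodes,    # assumes perfectly even placement
    }


# A 2 GB file on 8 nodes with replication factor 3, at both block sizes
# mentioned in the reply.
for block_mb in (64, 128):
    print(block_mb, hdfs_layout(2048, block_mb, 3, 8))
```

Under these assumptions the file occupies about 6 GB of raw storage at either block size; only the block count changes (32 blocks at 64 MB, 16 at 128 MB).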
