hadoop-common-user mailing list archives

From Raghu Angadi <rang...@yahoo-inc.com>
Subject Re: HDFS data transfer!
Date Thu, 11 Jun 2009 18:29:52 GMT

Thanks Brian for the good advice.

Slightly off topic from original post: there will be occasions where it 
is necessary or better to copy different portions of a file in parallel 
(distcp can benefit a lot). There is a proposal to let HDFS 'stitch' 
multiple files into one: something like

NameNode.stitchFiles(Path to, Path[] files)

This way a very large file can be copied more efficiently (with a
map/reduce job, for example). Another use case is high-latency,
high-bandwidth connections (like coast-to-coast). High latency can be
somewhat worked around by using large buffers for TCP connections, but
users usually don't have that control. It is simpler to just use
multiple connections.
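
To put rough numbers on the buffer point, here is a small sketch of the
bandwidth-delay arithmetic. The link speed, round-trip time and window
size are illustrative assumptions, not measurements from any cluster,
and the class and method names are made up for this example.

```java
public class BandwidthDelaySketch {

    // Bytes that must be in flight to keep the link full:
    // bandwidth (bits/s) * round-trip time (s) / 8.
    static long bdpBytes(double bitsPerSecond, double rttSeconds) {
        return Math.round(bitsPerSecond * rttSeconds / 8.0);
    }

    // Upper bound on a single connection's throughput (bytes/s) when the
    // TCP window, not the link, is the limiting factor.
    static double maxThroughputBytesPerSec(long windowBytes, double rttSeconds) {
        return windowBytes / rttSeconds;
    }

    // Number of window-limited connections needed to fill the link.
    static long connectionsNeeded(double bitsPerSecond, double rttSeconds, long windowBytes) {
        return (long) Math.ceil(bdpBytes(bitsPerSecond, rttSeconds) / (double) windowBytes);
    }

    public static void main(String[] args) {
        double link = 1e9;       // assumed: 1 Gbit/s path
        double rtt = 0.070;      // assumed: 70 ms coast-to-coast round trip
        long window = 65_536L;   // assumed: 64 KB window per connection

        System.out.println("bytes in flight needed: " + bdpBytes(link, rtt));
        System.out.println("one connection tops out at (bytes/s): "
                + maxThroughputBytesPerSec(window, rtt));
        System.out.println("connections needed: " + connectionsNeeded(link, rtt, window));
    }
}
```

Under these assumed figures the link needs about 8.75 MB in flight, a
single 64 KB window tops out a little under 1 MB/s, and it takes
roughly 134 such connections to fill the pipe. Either the buffers grow
or the number of connections does.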

This will obviously be an HDFS-only interface (i.e. not a FileSystem
method), at least initially.
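
A minimal sketch of the copy-in-pieces-then-stitch idea, using only
local files and java.nio so that it runs anywhere. Nothing here is an
HDFS API: stitchFiles below is a local stand-in that simply appends the
parts in order, and the class name, method names and temp-file naming
are all hypothetical. A real NameNode-side stitch would presumably
avoid the second copy that this stand-in makes.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelCopySketch {

    // Copy bytes [offset, offset + length) of src into its own part file.
    static void copyRange(Path src, Path part, long offset, long length) throws IOException {
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(part, StandardOpenOption.WRITE,
                     StandardOpenOption.TRUNCATE_EXISTING)) {
            long done = 0;
            while (done < length) {
                long n = in.transferTo(offset + done, length - done, out);
                if (n <= 0) break; // source ended early; stop rather than spin
                done += n;
            }
        }
    }

    // Local stand-in for the proposed stitch step: append parts, in order, to 'to'.
    static void stitchFiles(Path to, List<Path> parts) throws IOException {
        try (FileChannel out = FileChannel.open(to, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (Path part : parts) {
                try (FileChannel in = FileChannel.open(part, StandardOpenOption.READ)) {
                    long size = in.size();
                    long done = 0;
                    while (done < size) {
                        long n = in.transferTo(done, size - done, out);
                        if (n <= 0) break;
                        done += n;
                    }
                }
            }
        }
    }

    // Split src into 'pieces' ranges, copy them concurrently, then stitch.
    static void parallelCopy(Path src, Path dst, int pieces) throws Exception {
        long size = Files.size(src);
        long chunk = (size + pieces - 1) / pieces; // ceiling division
        ExecutorService pool = Executors.newFixedThreadPool(pieces);
        List<Path> parts = new ArrayList<>();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int i = 0; i < pieces; i++) {
                long offset = i * chunk;
                long length = Math.max(0, Math.min(chunk, size - offset));
                Path part = Files.createTempFile("part-" + i + "-", ".tmp");
                parts.add(part);
                pending.add(pool.submit(() -> {
                    copyRange(src, part, offset, length);
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // rethrows any failure from a piece
            }
            stitchFiles(dst, parts);
        } finally {
            pool.shutdown();
            for (Path p : parts) {
                Files.deleteIfExists(p);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path src = Files.createTempFile("src-", ".bin");
        Path dst = Files.createTempFile("dst-", ".bin");
        byte[] data = new byte[1_000_003]; // deliberately not a multiple of the piece count
        new Random(42).nextBytes(data);
        Files.write(src, data);

        parallelCopy(src, dst, 4);
        System.out.println("copies match: " + Arrays.equals(data, Files.readAllBytes(dst)));
    }
}
```

The point of the sketch is only the shape: each piece is an independent
transfer on its own connection, and the final step is cheap if the
filesystem can join the pieces by metadata alone.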


Brian Bockelman wrote:
> Hey Sugandha,
> Transfer rates depend on the quality/quantity of your hardware and the 
> quality of your client disk that is generating the data.  I usually say 
> that you should expect near-hardware-bottleneck speeds for an otherwise 
> idle cluster.
> There should be no "make it fast" required (though you should review
> the logs for errors if it's going slow).  I would expect a 5GB file to 
> take around 3-5 minutes to write on our cluster, but it's a well-tuned 
> and operational cluster.
> As Todd (I think) mentioned before, we can't help any when you say "I 
> want to make it faster".  You need to provide diagnostic information - 
> logs, Ganglia plots, stack traces, something - that folks can look at.
> Brian
> On Jun 10, 2009, at 2:25 AM, Sugandha Naolekar wrote:
>> But if I want to make it fast, then??? I want to place the data in HDFS
>> and replicate it in a fraction of a second. Is that possible, and how?
>> On Wed, Jun 10, 2009 at 2:47 PM, kartik saxena <kartik.sxn@gmail.com> wrote:
>>> I would suppose about 2-3 hours. It took me some 2 days to load a 160 GB
>>> file.
>>> Secura
>>> On Wed, Jun 10, 2009 at 11:56 AM, Sugandha Naolekar
>>> <sugandha.n87@gmail.com> wrote:
>>>> Hello!
>>>> If I try to transfer a 5GB VDI file from a remote host (not a part of
>>>> hadoop cluster) into HDFS, and get it back, how much time is it
>>>> supposed to take? No map-reduce involved. Simply writing files in and
>>>> out from HDFS through a simple code of java (usage of API's).
>>>> -- 
>>>> Regards!
>>>> Sugandha
>> -- 
>> Regards!
>> Sugandha
