hadoop-user mailing list archives

From Nitin Pawar <nitinpawar...@gmail.com>
Subject Re: Hadoop noob question
Date Sat, 11 May 2013 11:24:02 GMT
Is it safe? There is no direct yes or no answer.

When you say you have files worth 10 TB and you want to upload them to
HDFS, several factors come into the picture:

1) Is the machine in the same network as your Hadoop cluster?
2) Is there a guarantee that the network will not go down?

And most importantly, I assume that you have a capable Hadoop cluster. By
that I mean you have a capable namenode.

I would definitely not write files sequentially to HDFS. I would prefer to
write files in parallel to HDFS to utilize the DFS write features and speed
up the process.
You can run the hdfs put command in a parallel manner, and in my experience
it has not failed even when we write a lot of data.
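
To give a rough example of what I mean by parallel puts (the local and HDFS
paths here are just placeholders):

  # create the target dir once, then upload each top-level source
  # directory with its own hadoop fs -put, at most 8 at a time
  hadoop fs -mkdir /user/thoihen/staging
  ls /data/incoming | xargs -P 8 -I {} \
      hadoop fs -put /data/incoming/{} /user/thoihen/staging/{}

Each put opens its own DFS write pipeline, so the cluster ingests several
files at once instead of one after another.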

On Sat, May 11, 2013 at 4:38 PM, maisnam ns <maisnam.ns@gmail.com> wrote:

> @Nitin Pawar, thanks for clearing my doubts.
> But I have one more question. Say I have 10 TB of data in the pipeline.
> Is it perfectly OK to use the hadoop fs put command to upload these files of
> size 10 TB, and is there any limit to the file size using the hadoop command
> line? Can the hadoop put command line work with huge data?
> Thanks in advance
> On Sat, May 11, 2013 at 4:24 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:
>> First of all, most companies do not get 100 PB of data in one go. It is an
>> accumulating process, and most companies have a data pipeline in place
>> where data is written to HDFS on a regular frequency, retained on HDFS for
>> as long as needed, and from there sent to archival storage or deleted.
>> For data management products, you can look at Falcon, which is open
>> sourced by InMobi along with Hortonworks.
>> In any case, if you want to write files to HDFS, there are a few options
>> available to you (rough sketches of 3 and 4 below):
>> 1) Write your own DFS client which writes to DFS
>> 2) Use the HDFS proxy
>> 3) There is WebHDFS
>> 4) Command line hdfs
>> 5) Data collection tools that come with support for writing to HDFS, like
>> Flume etc.
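>> To give a rough idea of options 3) and 4) (the hostname, port, and paths
>> below are just placeholders):
>>
>>   # 3) webhdfs: the namenode answers the first PUT with a 307 redirect
>>   #    to a datanode; send the file to that Location in a second PUT
>>   curl -i -X PUT "http://namenode:50070/webhdfs/v1/user/thoihen/part-0001?op=CREATE"
>>   curl -i -X PUT -T part-0001 "<Location header from the previous response>"
>>
>>   # 4) command line hdfs
>>   hadoop fs -put /data/part-0001 /user/thoihen/part-0001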
>> On Sat, May 11, 2013 at 4:19 PM, Thoihen Maibam <thoihen123@gmail.com> wrote:
>>> Hi All,
>>> Can anyone help me understand how companies like Facebook, Yahoo etc.
>>> upload bulk files, say to the tune of 100 petabytes, to a Hadoop HDFS
>>> cluster for processing,
>>> and after processing how they download those files from HDFS to the local
>>> file system.
>>> I don't think they would be using the command line hadoop fs put to
>>> upload files, as it would take too long. Or do they divide it into, say,
>>> 10 parts of 10 petabytes each, compress them, and use the command line
>>> hadoop fs put?
>>> Or do they use some tool to upload huge files?
>>> Please help me.
>>> Thanks
>>> thoihen
>> --
>> Nitin Pawar

Nitin Pawar
