hadoop-mapreduce-user mailing list archives

From Nitin Pawar <nitinpawar...@gmail.com>
Subject Re: Hadoop noob question
Date Sat, 11 May 2013 10:54:32 GMT
First of all, most companies do not get 100 PB of data in one go.
It is an accumulating process, and most companies have a data
pipeline in place where data is written to HDFS at regular intervals,
retained on HDFS for as long as it is needed, and then
sent to archivers or deleted.

For data management products, you can look at Falcon, which was open sourced
by InMobi along with Hortonworks.

In any case, if you want to write files to HDFS, there are a few options
available to you:
1) Write your own DFS client that writes to HDFS
2) Use HDFS proxy
3) Use WebHDFS
4) Use the HDFS command line
5) Use a data collection tool that comes with support for writing to HDFS, such as Flume
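
To make the WebHDFS option a bit more concrete, here is a rough Python sketch of the two-step CREATE call from the WebHDFS REST API: the first PUT goes to the namenode without any data and should come back as a 307 redirect, and the second PUT sends the bytes to the datanode URL given in the Location header. The function names, host, port, and user below are my own placeholders, not anything from a real cluster, and the sketch assumes an unsecured cluster (no Kerberos) and a file small enough to hold in memory.

```python
import http.client
from urllib.parse import quote, urlencode, urlsplit


def webhdfs_create_path(hdfs_path, user, overwrite=False):
    """Build the request path for a WebHDFS CREATE call (hypothetical helper)."""
    query = urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": "true" if overwrite else "false",
    })
    # quote() leaves "/" alone, so an absolute HDFS path keeps its shape.
    return "/webhdfs/v1" + quote(hdfs_path) + "?" + query


def webhdfs_put(namenode_host, namenode_port, hdfs_path, data, user):
    """Upload bytes to HDFS with the two-step WebHDFS CREATE (hypothetical helper)."""
    # Step 1: ask the namenode where to write. No file data is sent here.
    nn = http.client.HTTPConnection(namenode_host, namenode_port)
    try:
        nn.request("PUT", webhdfs_create_path(hdfs_path, user))
        resp = nn.getresponse()
        resp.read()
        if resp.status != 307:
            raise RuntimeError("expected a 307 redirect, got %d" % resp.status)
        location = resp.getheader("Location")
    finally:
        nn.close()

    # Step 2: send the bytes to the datanode URL from the Location header.
    target = urlsplit(location)
    dn = http.client.HTTPConnection(target.hostname, target.port)
    try:
        dn.request("PUT", target.path + "?" + target.query, body=data)
        resp = dn.getresponse()
        resp.read()
        return resp.status  # 201 means the file was created
    finally:
        dn.close()
```

You would call it as something like webhdfs_put("namenode.example.com", 50070, "/data/logs/a.txt", b"hello", "etl"), where the hostname, port, and user are placeholders to replace with your own cluster's values. For a continuous pipeline you would more likely let a tool like Flume do the writing rather than script it by hand.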

On Sat, May 11, 2013 at 4:19 PM, Thoihen Maibam <thoihen123@gmail.com> wrote:

> Hi All,
> Can anyone help me understand how companies like Facebook, Yahoo, etc. upload
> bulk files, say to the tune of 100 petabytes, to a Hadoop HDFS cluster for
> processing,
> and how, after processing, they download those files from HDFS to the local
> file system?
> I don't think they would be using the command line hadoop fs put to upload
> files, as it would take too long. Or do they divide the data into, say, 10 parts
> of 10 petabytes each, compress them, and use the command line hadoop fs put?
> Or do they use some tool to upload huge files?
> Please help me.
> Thanks
> thoihen

Nitin Pawar
