hadoop-user mailing list archives

From Nitin Pawar <nitinpawar...@gmail.com>
Subject Re: Streaming data access in HDFS: Design Feature
Date Wed, 05 Mar 2014 08:47:24 GMT
Are you asking "why is data read from and written to HDFS blocks via the
MapReduce framework in a streaming manner?"
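To make "streaming manner" concrete, here is a minimal Python sketch of a streaming (sequential) access pattern. It is a local-file analogy, not the HDFS client API. `stream_chunks` and `count_lines_streaming` are hypothetical helpers, and the 64 KB chunk size is an arbitrary assumption. The reader starts at the beginning of the file and reads forward to the end without seeking, handling each chunk as it arrives. HDFS is designed around this kind of high-throughput sequential access rather than low-latency random access.

```python
import io
from typing import BinaryIO, Iterator


def stream_chunks(f: BinaryIO, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the contents of f in order, one fixed-size chunk at a time.

    There is no seeking: each read continues where the previous one stopped.
    """
    while True:
        chunk = f.read(chunk_size)
        if not chunk:  # empty bytes means end of file
            break
        yield chunk


def count_lines_streaming(f: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    # Process each chunk as it arrives instead of loading the whole file.
    return sum(chunk.count(b"\n") for chunk in stream_chunks(f, chunk_size))


if __name__ == "__main__":
    data = b"line one\nline two\nline three\n"
    print(count_lines_streaming(io.BytesIO(data), chunk_size=4))  # 3
```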


On Wed, Mar 5, 2014 at 2:05 PM, Radhe Radhe <radhe.krishna.radhe@live.com> wrote:

> Hi Shashwat,
>
> This is an excerpt from Hadoop: The Definitive Guide by Tom White:
> Hadoop Streaming
> Hadoop provides an API to MapReduce that allows you to write your map and
> reduce functions in languages *other than Java*. Hadoop Streaming uses Unix
> standard streams as the interface between Hadoop and your program, *so you
> can use any language that can read standard input and write to standard
> output to write your MapReduce program*.
>
> Streaming is naturally suited for text processing (although, as of version
> 0.21.0, it can handle binary streams, too), and when used in text mode, it
> has a line-oriented view of data. Map input data is passed over standard
> input to your map function, which processes it line by line and writes
> lines to standard output. A map output key-value pair is written as a
> single tab-delimited line. Input to the reduce function is in the same
> format (a tab-separated key-value pair) passed over standard input. The
> reduce function reads lines from standard input, which the framework
> guarantees are sorted by key, and writes its results to standard output.
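The map/reduce contract described in the excerpt can be sketched in Python as a word count. This is a hypothetical example, not code from the book. `mapper` emits one tab-delimited `word\t1` line per word. `reducer` relies on the framework's guarantee that its input is sorted by key, so `itertools.groupby` can total each word in a single pass. The `map`/`reduce` command-line switch and the file name `wordcount.py` are assumptions for illustration.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    # Emit one tab-delimited "word\t1" line for every word on every input line.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    # Input is sorted by key, so all lines for a given word are adjacent and
    # groupby can sum each word's counts in one pass. Blank lines are skipped.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: python wordcount.py map   (or: reduce)
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

Locally, `sort` can stand in for the framework's sort-by-key step: `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` (the file names are placeholders).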
>
> I don't think this is what I am asking about.
>
> Thanks.
> -RR
>
> ------------------------------
> From: dwivedishashwat@gmail.com
> Date: Wed, 5 Mar 2014 13:47:09 +0530
> Subject: Re: Streaming data access in HDFS: Design Feature
> To: user@hadoop.apache.org
> CC: radhe.krishna.radhe@live.com
>
>
> Streaming means processing data as it comes into HDFS. In Hadoop, for
> example, Hadoop Streaming enables Hadoop to process data using executables
> of different types.
>
> I hope you have already read this:
> http://hadoop.apache.org/docs/r0.18.1/streaming.html#Hadoop+Streaming
>
>
> Warm Regards,
> Shashwat Shriparv
>
>
>
> On Wed, Mar 5, 2014 at 1:38 PM, Radhe Radhe <radhe.krishna.radhe@live.com> wrote:
>
> Hello All,
>
> Can anyone please explain what we mean by *Streaming data access in HDFS*.
>
> Data is usually copied to HDFS, and in HDFS the data is split across
> DataNodes in blocks.
> Say, for example, I have an input file of 10240 MB (10 GB) and a block
> size of 64 MB. Then there will be 160 blocks.
> These blocks will be distributed across the DataNodes.
> Now the Mappers will read data from these DataNodes with the *data
> locality feature* in mind (i.e., blocks local to a DataNode will be read by
> the map tasks running on that DataNode).
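The block arithmetic and the locality idea in the example above can be sketched together in Python. `num_blocks` is plain ceiling division. `schedule` is a deliberately simplified, hypothetical greedy scheduler (not Hadoop's actual scheduling algorithm): it runs each block's map task on the least-loaded DataNode that already holds a replica of that block, and it assumes every replica holder appears in the host list. The DataNode names and replica placement in the usage lines are made up.

```python
from typing import Dict, List

MB = 1024 * 1024


def num_blocks(file_size: int, block_size: int) -> int:
    # Ceiling division: a final, partially filled block still counts as a block.
    return (file_size + block_size - 1) // block_size


def schedule(block_replicas: Dict[int, List[str]], hosts: List[str]) -> Dict[int, str]:
    """Hypothetical greedy scheduler: run each block's map task on the
    least-loaded DataNode that already stores a replica of that block,
    so the task reads its input from local disk."""
    load = {host: 0 for host in hosts}
    plan: Dict[int, str] = {}
    for block_id, replicas in sorted(block_replicas.items()):
        # min() returns the first replica holder with the lowest current load.
        host = min(replicas, key=lambda h: load[h])
        load[host] += 1
        plan[block_id] = host
    return plan


if __name__ == "__main__":
    print(num_blocks(10240 * MB, 64 * MB))  # 160
    replicas = {0: ["dn1", "dn2"], 1: ["dn1", "dn3"], 2: ["dn2", "dn3"]}
    print(schedule(replicas, ["dn1", "dn2", "dn3"]))  # {0: 'dn1', 1: 'dn3', 2: 'dn2'}
```

Real schedulers also handle busy nodes, rack-local fallback, and remote reads; this sketch only shows the "prefer a node that already has the block" idea.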
>
> Can you please point out where "Streaming data access in HDFS" comes into
> the picture here?
>
> Thanks,
> RR
>
>
>


-- 
Nitin Pawar
