From: swathi v
To: mapreduce-user@hadoop.apache.org, bejoy.hadoop@gmail.com
Date: Sat, 16 Jun 2012 16:07:13 +0530
Subject: Re: Streaming in mapreduce

Hi Pedro,

Adding to the response of Bejoy,
  • Hadoop streaming lets the user run arbitrary programs written in other languages, such as Ruby or Python, as a job's map and reduce methods.
  • Streaming provides the ability to use external programs as any of the job's mapper, combiner, or reducer.
  • The job is a traditional MapReduce job, with the framework handling input splitting, scheduling map tasks, scheduling input split pairs to run, shuffling and sorting map outputs, scheduling reduce tasks to run, and then writing the reduce output to the Hadoop Distributed File System (HDFS).
  • The framework handles a streaming job like any other MapReduce job.
  • The job can specify an executable to be used as the map processor and as the reduce processor.
  • Each task starts an instance of the applicable executable and writes a textual representation of the input key/value pairs to the executable's standard input.
  • The standard output of the executable is parsed as textual key/value pairs.
  • The executable being run for the reduce task will be given an input line for each value in the reduce value iterator, composed of the key and that value.
    This link explains the same: http://wiki.apache.org/hadoop/HadoopStreaming (same as the response given by Ruslan Al-Fakikh)
  • EXAMPLE: The Hadoop Core distribution provides a Jython example MapReduce application in src/examples/python/WordCount.py
  • FYI: There are libraries available for C++. The C++ interface lends itself to usage by Simplified Wrapper and Interface Generator (SWIG) to generate other language interfaces. The usage of Hadoop Pipes and its example goes here: http://wiki.apache.org/hadoop/C++WordCount
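
To make the points above concrete, here is a minimal Python sketch of the contract a streaming mapper and reducer follow: lines in, tab-separated key/value lines out, and one input line per value on the reduce side. The function names and the run_locally helper are my own for illustration and are not part of Hadoop; run_locally only imitates the framework's sort step in a single process, it does not distribute anything.

```python
import io
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line per word, as a streaming mapper would print to stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Sum the counts per key. Input has one 'key<TAB>value' line per value, sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(value) for _, value in group)}"


def run_locally(text):
    """Hypothetical stand-in for the framework: map, sort by key, then reduce."""
    map_output = list(mapper(io.StringIO(text)))
    map_output.sort(key=lambda line: line.split("\t", 1)[0])
    return list(reducer(map_output))


if __name__ == "__main__":
    for out_line in run_locally("the cat\nthe hat\n"):
        print(out_line)
```

In a real streaming job the mapper and the reducer would normally be two separate scripts that read sys.stdin and print to standard output, passed to the streaming jar with its -mapper and -reducer options. The framework then does the splitting, sorting, and scheduling across the cluster, which is the part a plain shell pipeline on one machine does not give you.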
Hope you find this useful. :)
Thank You.

On Sat, Jun 16, 2012 at 3:00 PM, Bejoy KS <bejoy.hadoop@gmail.com> wrote:
Hi Pedro

In simple terms, the Streaming API is used in Hadoop if your mapper or reducer is written in any language other than Java, say Ruby or Python.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

From: Pedro Costa <psdc1978@gmail.com>
Date: Sat, 16 Jun 2012 10:23:20 +0100
To: mapreduce-user@hadoop.apache.org <mapreduce-user@hadoop.apache.org>
Subject: Re: Streaming in mapreduce

I still don't get why hadoop streaming is useful. If I have map and reduce functions defined in a shell script, like the one below, why should I use Hadoop?
cat someInputFile | shellMapper.sh | shellReducer.sh > someOutputFile


On 16/06/2012, at 01:21, Ruslan Al-Fakikh <metaruslan@gmail.com> wrote:

Hi Pedro,
You can find it here
http://wiki.apache.org/hadoop/HadoopStreaming

Thanks

On Sat, Jun 16, 2012 at 2:46 AM, Pedro Costa <psdc1978@gmail.com> wrote:
Hi,
Hadoop mapreduce can be used for streaming. But what is streaming from the point of view of mapreduce? For me, streaming means video and audio data.

Why does mapreduce support streaming?

Can anyone give me an example on why to use streaming in mapreduce?

Thanks,
Pedro



--
- Regards,
Swathi.V.,
Software Developer
Blog URL: http://femgeekz.blogspot.in
