Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of balijamahesh.mca@gmail.com
 designates 209.85.216.48 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CACb0Fn7QyajsWfst62aj-tRiDxhUTBntPP9e+rFfcnJPh5y5YA@mail.gmail.com>
References: 
 <CACb0Fn7QyajsWfst62aj-tRiDxhUTBntPP9e+rFfcnJPh5y5YA@mail.gmail.com>
Date: Wed, 28 Nov 2012 12:25:55 +0530
Message-ID: 
 <CANiuQZf4caK75YpLNn+q6OEVOi-MvWrpswuFwXv=Td41AsPk3g@mail.gmail.com>
Subject: Re: advice
From: Mahesh Balija <balijamahesh.mca@gmail.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=047d7bdc8d26749d9204cf88a826

--047d7bdc8d26749d9204cf88a826
Content-Type: text/plain; charset=ISO-8859-1

Hi Jamal,

Please see my answers inline.

On Wed, Nov 28, 2012 at 10:47 AM, jamal sasha <jamalshasha@gmail.com> wrote:

> Hi,
>   Lately, I have been writing a lot of algorithms in the MapReduce
> abstraction in Python (Hadoop streaming).
> I have got the hang of it (I think)...
> I have a couple of questions:
> 1) By not using the Java libraries, what power of Hadoop am I missing?
>
I am not entirely sure, but:
-> I believe the streaming API gives you less control over the job than the
Java API does.
-> In Java, the framework groups the values for a given key and hands them
to the reducer as an Iterator. In a streaming job, the reducer reads sorted
key/value lines from stdin, so you have to detect key changes and aggregate
the values yourself.
-> In the Java API the framework calls the map function once per record (one
line, with the default text input), whereas a streaming mapper reads its
whole input from stdin, which gives you more control over processing
multiple lines together.
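
To make the last two points concrete, here is a minimal word-count sketch
for streaming. It is only an illustration: the function names and the
"map"/"reduce" command-line argument are made up for this example, and it
assumes tab-separated key/value lines with no empty keys.

```python
#!/usr/bin/env python
"""Word-count sketch for Hadoop Streaming (all names are illustrative)."""
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: receives every input line, not one call per record."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_lines(lines):
    """Reducer: input is sorted by key, one 'key<TAB>value' per line.

    Nothing hands us an iterator of values per key here, so we group
    consecutive lines that share a key ourselves.
    """
    pairs = (l.rstrip("\n").split("\t", 1) for l in lines if l.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (key, sum(int(value) for _, value in group))


# Assumed convention for this sketch: pass "map" or "reduce" as argv[1].
if __name__ == "__main__" and len(sys.argv) > 1:
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

You can try it locally by piping a text file through the map step, then
`sort`, then the reduce step; on a cluster the same two commands would be
given to streaming as the -mapper and -reducer.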

> 2) I know that this is just the tip of the iceberg. Can someone point out,
> from practical usage, which concepts I should focus on next (like maybe
> practising combiners or HDFS?) to improve my current practical knowledge,
> and then of course the not-so-practical part as well?
> Sorry for being so vague.
>
-> It is better to start by learning the basics of HDFS and the MapReduce
architecture, and then move on to concepts like combiners, partitioners,
RecordReaders, InputFormats, OutputFormats, etc.
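
As a rough idea of what two of those concepts do, here is a small
pure-Python sketch. It does not use any Hadoop API: `combine` and
`partition` are hypothetical helper names, and the CRC32 hash is just a
stand-in for whatever hash the real partitioner applies to the key.

```python
import zlib
from collections import defaultdict


def combine(pairs):
    """Combiner idea: pre-aggregate one mapper's output before the shuffle,
    so fewer key/value pairs have to travel over the network."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def partition(key, num_reducers):
    """Partitioner idea: choose a reducer for a key. The same key always
    maps to the same reducer, so all of its values meet in one place."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers
```

For example, combine([("a", 1), ("b", 1), ("a", 1)]) collapses the two "a"
pairs into one pair with the value 2 before anything is sent to a reducer.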

Best,
Mahesh Balija,
Calsoft Labs.

--047d7bdc8d26749d9204cf88a826--