hadoop-mapreduce-user mailing list archives

From Mahesh Balija <balijamahesh....@gmail.com>
Subject Re: advice
Date Wed, 28 Nov 2012 06:55:55 GMT
Hi Jamal,

Please see my answers inline below.

On Wed, Nov 28, 2012 at 10:47 AM, jamal sasha <jamalshasha@gmail.com> wrote:

> Hi,
>   Lately, I have been writing a lot of algorithms in the MapReduce abstraction
> in Python (Hadoop Streaming).
> I have got the hang of it (I think)...
> I have a couple of questions:
> 1) By not using the Java libraries, what Hadoop capabilities am I missing?
>
Though I am not entirely sure:
-> I believe you get less control over the job with the Streaming API than
with the Java API.
-> With Java, in the reduce phase the framework groups the values for a given
key and passes them to reduce() as an Iterator. In Streaming jobs, the reducer
reads sorted key/value lines from stdin, one line per value, so the user has
to detect key boundaries and aggregate/process the values for each key.
-> With the Java API the framework calls the map function once per input
record (e.g. once per line of text input), but in Streaming the mapper reads
stdin itself, so you have more control over processing multiple lines together.
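
Both Streaming points can be illustrated with a small word-count sketch. It is an illustration under assumptions, not code from this thread: the file name wc.py and the "map"/"reduce" command-line argument are hypothetical, and records are assumed to use Streaming's default tab separator between key and value. The mapper aggregates across many lines before emitting anything, and the reducer has to group adjacent lines by key itself, which the Java API would have done for you.

```python
#!/usr/bin/env python3
"""Word-count sketch for Hadoop Streaming: mapper and reducer in one file.

Assumptions (not from the original thread): the script is named wc.py, it is
run with the argument "map" or "reduce", and records use Streaming's default
tab separator between key and value.
"""
import sys
from collections import Counter
from itertools import groupby


def mapper(lines):
    # The mapper receives a stream of lines rather than one call per record,
    # so it can aggregate across many lines before emitting ("in-mapper
    # combining"). Counts are held in memory for the whole input split.
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    for word, n in sorted(counts.items()):
        yield f"{word}\t{n}"


def parse(line):
    # Split "key<TAB>value" on the first tab; the value is expected to be an int.
    key, _, value = line.rstrip("\n").partition("\t")
    return key, int(value)


def reducer(lines):
    # Input arrives sorted by key, one line per value. Unlike Java's
    # reduce(key, values), nothing is grouped for us: groupby merges each run
    # of adjacent lines that share a key.
    pairs = (parse(line) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(n for _, n in group)}"


if __name__ == "__main__":
    stage = {"map": mapper, "reduce": reducer}[sys.argv[1]]
    for record in stage(sys.stdin):
        print(record)
```

It can be tried locally without a cluster, with sort standing in for the shuffle: `cat input.txt | python3 wc.py map | LC_ALL=C sort | python3 wc.py reduce`. Setting LC_ALL=C makes sort compare raw bytes, so lines with equal keys stay adjacent. Keeping counts in the mapper's memory trades RAM for less shuffled data; for very large splits a combiner is the safer way to get a similar effect.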

> 2) I know that this is just the tip of the iceberg. Can someone point out,
> from practical usage, which concepts I should focus on next
> (like maybe practising combiners or HDFS?) to improve my
> current practical knowledge, and then of course the not-so-practical part as
> well?
> Sorry for being so vague.
>
-> It's better to start by learning the basics of the HDFS and MapReduce
architectures, and then move on to concepts like combiners, partitioners,
RecordReaders, InputFormats, OutputFormats, etc.
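
To make combiners and partitioners concrete, here is a purely local simulation of the word-count data flow (map, combine, partition, sort, reduce). It is not Hadoop code and not how the framework is implemented; the function names are hypothetical, and zlib.crc32 stands in for Hadoop's default HashPartitioner, which hashes the key with Java's hashCode() instead.

```python
"""Local, in-memory sketch of the MapReduce data flow for word count.

Not Hadoop code: it only mimics the map -> combine -> partition -> sort ->
reduce stages. zlib.crc32 is a stand-in for Hadoop's HashPartitioner.
"""
import zlib
from collections import Counter, defaultdict


def map_split(lines):
    # Mapper: emit (word, 1) for every word in one input split.
    for line in lines:
        for word in line.split():
            yield word, 1


def combine(pairs):
    # Combiner: pre-aggregate one split's output locally so fewer records are
    # shuffled. Valid here because addition is associative and commutative.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())


def partition(key, num_reducers):
    # Partitioner: choose the reducer for a key. Every occurrence of the same
    # key must go to the same reducer.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_job(splits, num_reducers=2):
    buckets = defaultdict(list)  # reducer index -> list of (key, value)
    for split in splits:
        for key, value in combine(map_split(split)):
            buckets[partition(key, num_reducers)].append((key, value))
    result = {}
    for r in range(num_reducers):
        totals = defaultdict(int)
        # Sort stands in for the shuffle/sort phase; then the reducer sums
        # the partial counts produced by each split's combiner.
        for key, value in sorted(buckets[r]):
            totals[key] += value
        result.update(totals)
    return result


if __name__ == "__main__":
    print(run_job([["the cat", "the hat"], ["the end"]]))
```

In this example the combiner shrinks the first split's four (word, 1) records to three, since "the" is pre-summed; on real data with many repeated keys the saving in shuffle traffic is much larger.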

Best,
Mahesh Balija,
Calsoft Labs.
