hadoop-hdfs-user mailing list archives

From Zoltán Tóth-Czifra <zoltan.tothczi...@softonic.com>
Subject Complex MapReduce applications with the streaming API
Date Tue, 27 Nov 2012 12:03:41 GMT
Hi everyone,

Thanks in advance for the support. My problem is the following:

I'm trying to develop a fairly complex MapReduce application using the streaming API (for
demonstration purposes, so unfortunately the "use Java" answer doesn't work :-( ). I can get
a single MapReduce phase running from the command line with no problem. The problem comes when
I want to add more MapReduce phases that use each other's output, and I may even want to
do a recursion (feed a phase's output to the same phase again), conditioned by a counter.

The solution in Java MapReduce is trivial (i.e. creating multiple Job instances and monitoring
counters), but with the streaming API it is not so obvious. What is the correct way to manage my
application from its native language (Python, PHP, Perl...)? Calling shell commands from a
"controller" script? How should I obtain counters?
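
To make the "controller script" idea concrete, here is a minimal sketch in Python, not a
recommended or official approach. It assumes that the hadoop client prints the job's counters
to stderr as "name=value" lines when the job finishes (whether it does depends on the Hadoop
version), and that the streaming scripts bump a counter by writing
reporter:counter:<group>,<counter>,<amount> to their stderr. The jar path, the script names,
the work-directory layout and the function names are all hypothetical.

```python
import re
import subprocess

# Hypothetical location of the streaming jar; adjust for the installation.
STREAMING_JAR = "/path/to/hadoop-streaming.jar"

# Assumed log format: a counter shows up as "<name>=<integer>" at the end of a line.
COUNTER_LINE = re.compile(r"([A-Za-z][\w .-]*?)=(\d+)\s*$")


def parse_counters(log_text):
    """Collect name -> value pairs from counter-like lines in the client log."""
    counters = {}
    for line in log_text.splitlines():
        # Drop a log4j-style prefix such as "... INFO mapred.JobClient:" if present.
        tail = line.rsplit(": ", 1)[-1].strip()
        match = COUNTER_LINE.match(tail)
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return counters


def run_phase(input_path, output_path, mapper, reducer):
    """Run one streaming job through the hadoop CLI and return its parsed counters."""
    cmd = [
        "hadoop", "jar", STREAMING_JAR,
        "-input", input_path,
        "-output", output_path,
        "-mapper", mapper,
        "-reducer", reducer,
    ]
    result = subprocess.run(cmd, stderr=subprocess.PIPE, universal_newlines=True)
    if result.returncode != 0:
        raise RuntimeError("streaming job failed:\n" + result.stderr)
    return parse_counters(result.stderr)


def iterate(first_input, work_dir, counter_name, run=run_phase, max_rounds=10,
            mapper="mapper.py", reducer="reducer.py"):
    """Feed each round's output into the next until the counter reaches zero."""
    current, rounds = first_input, 0
    while rounds < max_rounds:
        rounds += 1
        output = "{}/round-{}".format(work_dir, rounds)
        counters = run(current, output, mapper, reducer)
        current = output
        # A missing counter is treated as zero, i.e. nothing left to process.
        if counters.get(counter_name, 0) == 0:
            break
    return current, rounds
```

With this, something like iterate("input", "work", "unresolved") would rerun the same phase
on its own output until the (hypothetical) "unresolved" counter drops to zero or max_rounds is
reached. The run parameter is there so that the loop can be tried out without a cluster.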

Using Oozie seems like overkill for this application; besides, it doesn't support "loops",
so the recursion can't really be implemented.

Thanks a lot!
