From: Alejandro Abdelnur
Date: Tue, 27 Nov 2012 09:10:48 -0800
Subject: Re: Complex MapReduce applications with the streaming API
To: "common-user@hadoop.apache.org"
Reply-To: user@hadoop.apache.org
In-Reply-To: <0005DA68C31EA7428873518FF86A94368E7E85@MAILSVR01.domino.softonic.com>
References: <0005DA68C31EA7428873518FF86A94368E7E85@MAILSVR01.domino.softonic.com>

> Using Oozie seems to be overkill for this application; besides, it
> doesn't support "loops", so the recursion can't really be implemented.

Correct, Oozie does not support loops; this is a restriction by design
(early prototypes supported loops). The idea was that you don't want
never-ending workflows. Coordinator jobs cover the recurring-run case
instead: they run workflow jobs repeatedly.

Still, if you want to do recursion in Oozie, you certainly can: have a
workflow invoke itself as a sub-workflow. Just make sure you define your
exit condition properly.

If you have additional questions, please move this thread to the
user@oozie.apache.org alias.

Thx

On Tue, Nov 27, 2012 at 4:03 AM, Zoltán Tóth-Czifra
<zoltan.tothczifra@softonic.com> wrote:

> Hi everyone,
>
> Thanks in advance for the support. My problem is the following:
>
> I'm trying to develop a fairly complex MapReduce application using the
> streaming API (for demonstration purposes, so unfortunately the "use
> Java" answer doesn't work :-( ). I can get a single MapReduce phase
> running from the command line with no problem. The problem comes when I
> want to add more MapReduce phases that use each other's output, and
> maybe even do a recursion (feed a phase's output back into the same
> phase) conditioned by a counter.
>
> The solution in Java MapReduce is trivial (i.e. creating multiple Job
> instances and monitoring counters), but with the streaming API it is
> not. What is the correct way to manage my application in its native
> language (Python, PHP, Perl...)? Calling shell commands from a
> "controller" script? How should I obtain counters?
>
> Using Oozie seems to be overkill for this application; besides, it
> doesn't support "loops", so the recursion can't really be implemented.
>
> Thanks a lot!
> Zoltan

-- 
Alejandro
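
A minimal sketch of the "controller script" idea raised in the question, with the explicit exit condition the reply asks for. It is not taken from the thread. The names `run_until_done`, `run_phase`, and `work_dir` are hypothetical, and the sketch assumes the caller supplies a `run_phase(input_path, output_path)` function that launches one streaming job and returns the value of the counter that drives the loop. How the job is launched and how the counter is read are deliberately left out.

```python
from typing import Callable


def run_until_done(
    run_phase: Callable[[str, str], int],
    first_input: str,
    work_dir: str = "work",
    max_iterations: int = 20,
) -> str:
    """Re-run one phase on its own output until its counter reaches zero.

    run_phase is a hypothetical caller-supplied function: it runs a single
    MapReduce phase from input_path to output_path and returns the counter
    value that says how much work is left.
    Returns the path of the final output.
    """
    current_input = first_input
    for iteration in range(max_iterations):
        # Each pass writes to a fresh directory so earlier output is kept.
        output_path = f"{work_dir}/iter-{iteration}"
        remaining = run_phase(current_input, output_path)
        # The output of this pass becomes the input of the next one.
        current_input = output_path
        if remaining == 0:
            return current_input
    # Hard upper bound: the guard against a never-ending loop.
    raise RuntimeError(f"counter still non-zero after {max_iterations} passes")


if __name__ == "__main__":
    # Stand-in for a real job: pretends the counter goes 3, 1, 0.
    fake_counters = [3, 1, 0]

    def fake_phase(input_path: str, output_path: str) -> int:
        return fake_counters.pop(0)

    print(run_until_done(fake_phase, "input"))  # prints "work/iter-2"
```

The iteration cap plays the same role as the exit condition in a self-invoking workflow: even if the counter never reaches zero, the loop terminates.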