crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-272) Unable to correlate crunch jobs within Oozie
Date Tue, 27 May 2014 19:41:01 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14010181#comment-14010181 ]

Micah Whitacre commented on CRUNCH-272:
---------------------------------------

So unfortunately my approach is not a complete solution. Specifically, I missed this line[1]
of code embedded inside the launcher action, which ties the properties back into the action
so that the values are subsequently stored in Oozie. This means that we will need a custom
Oozie launching action/code. That isn't horrible, but I'm not sure we have a set structure
that would let us create a schema for launching Crunch pipelines.

[1] - https://github.com/cloudera/oozie/blob/a659fd0f2e56850a35e38a6174667b0c07a75b57/core/src/main/java/org/apache/oozie/action/hadoop/HiveActionExecutor.java#L123
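The missing piece described above is the launcher-side step: after the Java action finishes, something has to copy information about the jobs the pipeline submitted back onto the action so that Oozie stores it. The plain-Java sketch below models only that data flow, assuming the pipeline can report the IDs of the jobs it submitted. It is not Oozie's or Crunch's code; `ActionRecord`, `LauncherSketch.runAndRecord`, and the `crunch.child.job.ids` property are hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-in for the record Oozie keeps for one workflow action.
class ActionRecord {
    final String actionId;
    final Map<String, String> properties = new LinkedHashMap<>();

    ActionRecord(String actionId) {
        this.actionId = actionId;
    }
}

class LauncherSketch {
    // Hypothetical property name; not a real Oozie or Crunch key.
    static final String CHILD_JOB_IDS = "crunch.child.job.ids";

    // Runs the pipeline (modeled as a supplier returning the IDs of the
    // map/reduce jobs it submitted) and ties those IDs back into the action.
    static void runAndRecord(Supplier<List<String>> pipeline, ActionRecord action) {
        List<String> childJobIds = pipeline.get();
        action.properties.put(CHILD_JOB_IDS, String.join(",", childJobIds));
    }

    public static void main(String[] args) {
        ActionRecord action = new ActionRecord("example-action");
        runAndRecord(() -> Arrays.asList("job_1", "job_2"), action);
        System.out.println(action.properties.get(CHILD_JOB_IDS));
    }
}
```

A real implementation would need a custom Oozie action executor to read such a property and persist it, which is the gap this comment points out.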

> Unable to correlate crunch jobs within Oozie
> --------------------------------------------
>
>                 Key: CRUNCH-272
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-272
>             Project: Crunch
>          Issue Type: Improvement
>            Reporter: Mike Zimmerman
>            Assignee: Micah Whitacre
>         Attachments: CRUNCH-272_prototype.patch
>
>
> I'm not really sure whether this should be logged against Oozie or Crunch, so please feel free
> to move it as needed.
> I would like to request a way to decorate map/reduce jobs that are spawned by a Crunch
> pipeline so that I can programmatically determine their origin. The primary use case for
> this is integration with Oozie. Oozie launches a single map job to run a java action (in
> our case, this java action runs a Crunch job). Tracing from this original "launcher"
> job to the jobs created by the Crunch job is impossible without trawling through logs. This
> leaves a big black hole for a system operator trying to assess the performance/impact of
> these jobs. My initial thought was to provide a simple way to set a correlationId or similar
> on a map/reduce job and then make it queryable within Oozie. Obviously, that request
> would have to come after the correlation feature was available within map/reduce.
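The correlation idea in the request can be illustrated in plain Java: every job spawned by the pipeline carries the launcher's ID in its configuration, and an operator query filters jobs by that ID. This is a minimal sketch under stated assumptions; `SubmittedJob`, `CorrelationSketch`, and the `crunch.correlation.id` key are hypothetical and do not correspond to an existing Hadoop, Crunch, or Oozie API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical model of a submitted map/reduce job and its configuration.
class SubmittedJob {
    final String jobId;
    final Map<String, String> conf = new HashMap<>();

    SubmittedJob(String jobId) {
        this.jobId = jobId;
    }
}

class CorrelationSketch {
    // Hypothetical configuration key, not an existing property.
    static final String CORRELATION_ID = "crunch.correlation.id";

    // Decorates a job spawned by the pipeline with the launcher's correlation ID.
    static void tag(SubmittedJob job, String correlationId) {
        job.conf.put(CORRELATION_ID, correlationId);
    }

    // The operator-facing query: return the IDs of every job carrying the given tag.
    static List<String> findByCorrelationId(List<SubmittedJob> jobs, String correlationId) {
        return jobs.stream()
                .filter(j -> correlationId.equals(j.conf.get(CORRELATION_ID)))
                .map(j -> j.jobId)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<SubmittedJob> jobs = new ArrayList<>();
        for (String id : new String[] {"job_1", "job_2", "job_3"}) {
            jobs.add(new SubmittedJob(id));
        }
        // Two jobs created by the Crunch pipeline under launcher A...
        tag(jobs.get(0), "launcher-A");
        tag(jobs.get(1), "launcher-A");
        // ...and one unrelated job from a different launcher.
        tag(jobs.get(2), "launcher-B");
        System.out.println(findByCorrelationId(jobs, "launcher-A")); // [job_1, job_2]
    }
}
```

In practice the tag would presumably be set once on the configuration handed to the pipeline so that every job it spawns inherits it; that propagation is an assumption here and is not shown.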



--
This message was sent by Atlassian JIRA
(v6.2#6252)
