hadoop-pig-dev mailing list archives

From "Ashutosh Chauhan (JIRA)" <j...@apache.org>
Subject [jira] Commented: (PIG-948) [Usability] Relating pig script with MR jobs
Date Sun, 27 Sep 2009 22:40:16 GMT

    [ https://issues.apache.org/jira/browse/PIG-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760121#action_12760121
] 

Ashutosh Chauhan commented on PIG-948:
--------------------------------------

@Daniel

bq. Also I notice in many cases we cannot get first job id correctly (job id is null in this
case). If I change sleepTime (MapReduceLauncher.java:100) from 500 to 1000 (ms), things look
fine. Does anyone else also see that? 

The reason is that JobControlCompiler compiles a set of inter-dependent MR jobs and generates
a job-control object, which is then submitted asynchronously to Hadoop for execution. Since
we don't block on that thread, it's possible that job ids are not yet assigned when we ask
for them. Raising the sleep time to a higher value such as 1000 ms should be sufficient in
most cases. Note that increasing this sleep time doesn't affect execution in any way, since
we are sleeping in a thread which only does reporting. Another, more robust though more
complicated, approach is to sleep for a shorter duration, check whether the ids are assigned
and, if not, sleep again, looping until the ids are assigned.
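
The sleep-and-recheck approach could look roughly like the sketch below. It is not taken from
the patch: JobIdPoller, awaitJobId, and the Supplier standing in for "ask the job for its id"
are hypothetical names used for illustration, and the timeout is an added assumption so the
reporting thread cannot loop forever if an id is never assigned.

```java
import java.util.function.Supplier;

// Hypothetical helper, not part of Pig or Hadoop.
public class JobIdPoller {

    /**
     * Polls idSource until it returns a non-null id or timeoutMillis elapses.
     * Returns the id, or null if none was assigned in time.
     * idSource is an assumed stand-in for whatever call reports the assigned job id.
     */
    public static String awaitJobId(Supplier<String> idSource,
                                    long pollMillis,
                                    long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        String id = idSource.get();
        // Sleep briefly and re-check, instead of one long fixed sleep.
        while (id == null && System.currentTimeMillis() < deadline) {
            Thread.sleep(pollMillis);
            id = idSource.get();
        }
        return id;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated job: the id only becomes visible on the third query.
        final int[] calls = {0};
        Supplier<String> fakeJob = () -> ++calls[0] < 3 ? null : "job_0001";
        System.out.println(awaitJobId(fakeJob, 100, 5000));
    }
}
```

Since this would run in the reporting thread only, a short poll interval adds no delay to job
execution itself; it just lets the id be reported as soon as it is available.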

> [Usability] Relating pig script with MR jobs
> --------------------------------------------
>
>                 Key: PIG-948
>                 URL: https://issues.apache.org/jira/browse/PIG-948
>             Project: Pig
>          Issue Type: Improvement
>          Components: impl
>    Affects Versions: 0.4.0
>            Reporter: Ashutosh Chauhan
>            Assignee: Ashutosh Chauhan
>            Priority: Minor
>             Fix For: 0.6.0
>
>         Attachments: pig-948-2.patch, pig-948.patch
>
>
> Currently it's hard to relate a pig script to specific MR jobs. In a loaded
> cluster with multiple simultaneous job submissions, it's not easy to figure out which specific
> MR jobs were launched for a given pig script. If Pig can provide this info, it will be useful
> for debugging and monitoring the jobs resulting from a pig script.
> At the very least, Pig should be able to provide the user with the following information:
> 1) Job id of the launched job.
> 2) Complete web url of jobtracker running this job. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.