pig-dev mailing list archives

From "Shravan Matthur Narayanamurthy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (PIG-457) pig produces errors after a job is said to be 100% done
Date Fri, 26 Sep 2008 10:31:44 GMT

     [ https://issues.apache.org/jira/browse/PIG-457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Shravan Matthur Narayanamurthy updated PIG-457:

    Status: Patch Available  (was: Open)

There are two issues that this patch tries to address:
1) Exceptions and traces even after a successful completion:
Currently, the success case and the failure case share the same code path for fetching
and printing error messages. This fix splits that path: task failures in a run that
eventually succeeds (because retries resolved them) are logged at debug level, while
failures in an unsuccessful run are logged at error level.
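A minimal sketch of the split described in 1), written in Python for brevity (Pig itself is Java). It assumes the final job outcome is already known when the collected task error messages are reported; the function name `report_task_errors` and the logger name are hypothetical, not Pig's actual API.

```python
import logging
from typing import Sequence

# Hypothetical logger name used only for this sketch.
log = logging.getLogger("pig.launcher.sketch")


def report_task_errors(job_succeeded: bool, task_errors: Sequence[str]) -> int:
    """Log task failure messages at a level chosen by the job's final outcome.

    Failures in a job that eventually succeeded were recovered by retries,
    so they go to DEBUG; failures in a job that failed go to ERROR.
    Returns the level that was used.
    """
    level = logging.DEBUG if job_succeeded else logging.ERROR
    for message in task_errors:
        log.log(level, "Task failure: %s", message)
    return level
```

With Python's default logging configuration the DEBUG messages stay hidden, so a successful run prints no traces, while a failed run still surfaces every task error.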

2) Shows 100% even if there are failures:
This is a direct result of Hadoop's behavior: it marks map and reduce tasks as 100% complete
irrespective of whether they succeeded or failed. In some sense, progress and success are
unrelated dimensions. Since it is better to relate the two, we need to make sure that we don't
report 100% complete for a failed execution. As a hack, I check whether the progress has reached
100% and postpone displaying it until I am sure that the job has completed successfully.
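The postponement in 2) can be sketched as a small filter, again in Python and under the assumption that each poll yields a raw percentage plus "done" and "succeeded" flags for the job. The name `progress_to_display` is hypothetical; returning `None` stands for "don't print anything this poll".

```python
from typing import Optional


def progress_to_display(progress: float, job_done: bool,
                        job_succeeded: bool) -> Optional[float]:
    """Return the percentage to show, or None to postpone the update.

    Hadoop reports 100% for tasks that finished whether they succeeded or
    failed, so 100% alone does not mean success. Hold it back until the
    job is known to have completed successfully.
    """
    if progress >= 100.0:
        if job_done and job_succeeded:
            return 100.0
        return None  # outcome not known yet, or the job failed
    return progress
```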

There are some other fixes to the logic that displays the completion percentage. In this code
we are chasing a moving target: when we assume that a job is in a particular state and do some
processing based on that assumption, we might get spurious results. For example, we get the
list of running jobs and then fetch the progress of each one. Meanwhile, a job's state might
change from running to something else, and it is not easy to encode every possible scenario
in the code. Thus, when we fetch the progress of a previously running job that has since
changed state, we may get spurious results. To mitigate this, we make a simple assumption:
a job's progress can't regress, so if we see a regression, we ignore it because we know it
is temporary.
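The "progress can't regress" assumption amounts to keeping, per job, the highest value seen so far. A minimal Python sketch, assuming jobs are identified by a string ID; the class name `MonotonicProgress` is hypothetical.

```python
from typing import Dict


class MonotonicProgress:
    """Track per-job progress, ignoring apparent regressions."""

    def __init__(self) -> None:
        self._last: Dict[str, float] = {}

    def update(self, job_id: str, progress: float) -> float:
        """Record a polled value and return the value that should be used."""
        previous = self._last.get(job_id, 0.0)
        if progress < previous:
            # Treat a drop as a transient artifact of the job changing state
            # between listing and querying; keep the previous value.
            return previous
        self._last[job_id] = progress
        return progress
```

For example, polling job_1 at 30%, then a spurious 10%, then 55% yields 30, 30, 55.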

Another addition to the logic is an exponential delay scheme, which is useful when a job is not
progressing, perhaps because of bag spilling or a long-running UDF. In that case, the reported
progress stays the same for some time. During this time, we can either 1) impose a hard rule
that if we see no progress, we don't report it, or 2) just keep reporting the same progress.
Both approaches have drawbacks: with 1), if nothing is displayed, it might seem that the job
is stuck even though processing is happening; with 2), the screen will surely fill up with
output that adds no new information. So we introduce delays between batches of identical
progress displays, and these delays increase exponentially as each batch completes. Currently
the batch size is half the number of retries, i.e. 6; since the sleep time is now 5 sec, this
amounts to reporting progress every 30 sec, while delaying further displays of the same
progress using the exponential delay scheme.
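One possible reading of the delay scheme, sketched in Python: a changed value is shown immediately; an unchanged value is shown again after `batch_size` polls, then after twice that, four times that, and so on. The doubling factor, the reset on change, and the class name `RepeatedProgressThrottle` are assumptions for illustration, not taken from the patch.

```python
from typing import Optional


class RepeatedProgressThrottle:
    """Decide, on each poll, whether to print the current progress."""

    def __init__(self, batch_size: int = 6) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.batch_size = batch_size
        self._last_value: Optional[float] = None
        self._polls_since_display = 0
        self._repeats = 0  # times the unchanged value has been re-shown

    def should_display(self, progress: float) -> bool:
        if progress != self._last_value:
            # New value: show it at once and restart the backoff.
            self._last_value = progress
            self._polls_since_display = 0
            self._repeats = 0
            return True
        self._polls_since_display += 1
        # Unchanged value: wait batch_size * 2**repeats polls before re-showing.
        if self._polls_since_display >= self.batch_size * (2 ** self._repeats):
            self._polls_since_display = 0
            self._repeats += 1
            return True
        return False
```

With the default batch size of 6 and a 5 sec sleep between polls, a stuck value is re-shown roughly 30 sec, 90 sec and 210 sec after it first appeared, instead of on every poll.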

> pig produces errors after a job is said to be 100% done
> -------------------------------------------------------
>                 Key: PIG-457
>                 URL: https://issues.apache.org/jira/browse/PIG-457
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: types_branch
>            Reporter: Olga Natkovich
>            Assignee: Shravan Matthur Narayanamurthy
>             Fix For: types_branch
>         Attachments: 457-2.patch
> It is possible that we get errors for all tasks, even the ones we retried. Need to look
> at the code that handles detecting end of processing and producing errors.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
