hadoop-common-dev mailing list archives

From "Vinod K V (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4296) Spasm of JobClient failures on successful jobs every once in a while
Date Thu, 16 Oct 2008 11:14:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640135#action_12640135 ]

Vinod K V commented on HADOOP-4296:
-----------------------------------

-1. I think this patch should also make JobClient break out of its polling loop as soon as
it detects that the job is complete, because that is the correct fix irrespective of what
JobClient's polling interval is. Note that the client's polling interval is currently not
tied to the MIN_TIME_BEFORE_RETIRE that you added, so if the polling interval later becomes
configurable or grows beyond MIN_TIME_BEFORE_RETIRE, this problem will surface again.
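
To illustrate the suggestion above, here is a minimal sketch of a polling loop that returns
as soon as a status reports completion, so it never queries the tracker again after that point.
It is not the actual JobClient code: JobStatusSource, JobSnapshot, PollingSketch and Outcome
are hypothetical stand-ins, and treating a null status as "job already retired" is an
assumption added here for defensiveness, not something the patch is known to do.

```java
// Hypothetical stand-in for whatever the client uses to ask the tracker for status.
// This is NOT the real Hadoop JobClient/RunningJob API.
interface JobStatusSource {
    // Returns the latest status, or null if the tracker no longer knows the job
    // (for example because the job has been retired). The null case is an assumption.
    JobSnapshot poll();
}

// Hypothetical immutable status record.
final class JobSnapshot {
    final boolean complete;
    final boolean successful;

    JobSnapshot(boolean complete, boolean successful) {
        this.complete = complete;
        this.successful = successful;
    }
}

final class PollingSketch {
    enum Outcome { SUCCEEDED, FAILED, UNKNOWN_RETIRED, INTERRUPTED }

    // Polls until the job is seen as complete, then returns immediately.
    // Correctness does not depend on how pollIntervalMillis compares to any
    // server-side retire delay, because no poll is issued after completion is seen.
    static Outcome waitForCompletion(JobStatusSource source, long pollIntervalMillis) {
        while (true) {
            JobSnapshot status = source.poll();
            if (status == null) {
                // Defensive: report "unknown" instead of dereferencing a null status.
                return Outcome.UNKNOWN_RETIRED;
            }
            if (status.complete) {
                // Leave the loop right here; do not sleep and poll again.
                return status.successful ? Outcome.SUCCEEDED : Outcome.FAILED;
            }
            try {
                Thread.sleep(pollIntervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return Outcome.INTERRUPTED;
            }
        }
    }
}
```

With a loop shaped like this, a longer or configurable client polling interval only delays
when completion is noticed; it does not reintroduce a poll against a job that the client has
already seen finish.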

> Spasm of JobClient failures on successful jobs every once in a while
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4296
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4296
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.1
>            Reporter: Joydeep Sen Sarma
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: 4296_jt_delayretire.patch, 4296_jt_delayretire2.patch
>
>
> At very busy times we get a wave of job client failures all at the same time. The failures
> come when the job is about to complete. When we look at the job history files, the jobs are
> actually complete. Here's the stack:
> 08/09/27 02:18:00 INFO mapred.JobClient:  map 100% reduce 98%
> 08/09/27 02:18:41 INFO mapred.JobClient:  map 100% reduce 99% 
> java.lang.NullPointerException
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:993)
> 	at com.facebook.hive.common.columnSetLoader.main(columnSetLoader.java:535)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

