hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4296) Spasm of JobClient failures on successful jobs every once in a while
Date Fri, 17 Oct 2008 19:15:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12640632#action_12640632 ]

dhruba borthakur commented on HADOOP-4296:
------------------------------------------

Hi Vinod, thanks for the review.

> You should check for the condition before the job is even removed from JT.
I do not understand this. Can you please explain?

> Further, are you sure we want to throw an exception in the job-client?
No exceptions are being thrown in the JobClient. The pre-existing code catches those exceptions
and retries (I have not changed any of this).
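
As a rough illustration only (this is not the actual JobClient code; the class,
interface, and method names below are hypothetical), a catch-and-retry loop of the
kind described above could look something like this:

```java
import java.io.IOException;

public class RetrySketch {
    /** Hypothetical stand-in for a remote call that asks the tracker for a job's state. */
    interface StatusCall {
        String fetch() throws IOException;
    }

    /**
     * Calls fetch() until it returns a value, retrying whenever it throws IOException.
     * After more than maxRetries failures, the last exception is rethrown to the caller.
     */
    static String fetchWithRetry(StatusCall call, int maxRetries) throws IOException {
        int failures = 0;
        while (true) {
            try {
                return call.fetch();
            } catch (IOException e) {
                failures++;
                if (failures > maxRetries) {
                    throw e; // retries exhausted; surface the failure
                }
                // otherwise fall through and try again
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] attempts = {0};
        // Simulated call: fails on the first two attempts, then succeeds.
        String state = fetchWithRetry(() -> {
            if (attempts[0]++ < 2) {
                throw new IOException("job not found yet");
            }
            return "SUCCEEDED";
        }, 3);
        System.out.println(state + " after " + attempts[0] + " attempts");
    }
}
```

The point of the sketch is that the exception is absorbed inside the loop, so a
transient failure on the server side does not reach the caller unless the retries
run out.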

I am pretty sure that this is not a "feature". It fixes a bug.




> Spasm of JobClient failures on successful jobs every once in a while
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4296
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4296
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.1
>            Reporter: Joydeep Sen Sarma
>            Assignee: dhruba borthakur
>            Priority: Blocker
>             Fix For: 0.19.0
>
>         Attachments: 4296_jt_delayretire.patch, 4296_jt_delayretire2.patch, 4296_jt_delayretire3.patch
>
>
> At very busy times we get a wave of job client failures, all at the same time. The failures
> come when the job is about to complete. When we look at the job history files, the jobs are
> actually complete. Here's the stack:
> 08/09/27 02:18:00 INFO mapred.JobClient:  map 100% reduce 98%
> 08/09/27 02:18:41 INFO mapred.JobClient:  map 100% reduce 99% 
> java.lang.NullPointerException
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:993)
> 	at com.facebook.hive.common.columnSetLoader.main(columnSetLoader.java:535)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

