hadoop-common-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4296) Spasm of JobClient failures on successful jobs every once in a while
Date Mon, 06 Oct 2008 06:47:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12637015#action_12637015 ]

Joydeep Sen Sarma commented on HADOOP-4296:
-------------------------------------------

We definitely care about the status of completed jobs (and I think most installations would,
given that at least some of the uses are always programmatic invocations that check the return
status).

Does the job status store need to scan DFS even when the job status is available in memory?
(Falling back to the persistent store only when the data is missing from memory would seem like
a good strategy.) Another question is whether job counters are available from the persisted
job status.
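The fallback strategy suggested above (answer from memory, read DFS only on a miss) might look roughly like the following Java sketch. All names here (`JobStatusLookup`, `PersistentStatusStore`, `putInMemory`, `retire`) are hypothetical and are not Hadoop's actual JobTracker or job status store API; the DFS-backed store is stubbed with an in-memory map so the example runs on its own.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical sketch (not a Hadoop API): look up a job's status in memory
 * first, and read the persisted copy only when the job is no longer in memory.
 */
public class JobStatusLookup {

    /** Hypothetical stand-in for a DFS-backed store of completed job statuses. */
    interface PersistentStatusStore {
        Optional<String> read(String jobId);
    }

    private final Map<String, String> inMemory = new HashMap<>();
    private final PersistentStatusStore persisted;
    private int persistentReads = 0; // counts fallbacks to the persistent store

    JobStatusLookup(PersistentStatusStore persisted) {
        this.persisted = persisted;
    }

    /** Record (or update) the status of a job that is still held in memory. */
    void putInMemory(String jobId, String status) {
        inMemory.put(jobId, status);
    }

    /** Drop a job from memory, e.g. when it is retired. */
    void retire(String jobId) {
        inMemory.remove(jobId);
    }

    /** Memory first; touch the persistent store only on a miss. */
    Optional<String> getStatus(String jobId) {
        String cached = inMemory.get(jobId);
        if (cached != null) {
            return Optional.of(cached); // no persistent-store read needed
        }
        persistentReads++;
        return persisted.read(jobId); // empty if the job is unknown everywhere
    }

    int persistentReads() {
        return persistentReads;
    }

    public static void main(String[] args) {
        // Stub "DFS": the completed job's status was persisted on completion.
        Map<String, String> dfs = new HashMap<>();
        dfs.put("job_0001", "SUCCEEDED");
        JobStatusLookup lookup =
                new JobStatusLookup(id -> Optional.ofNullable(dfs.get(id)));

        lookup.putInMemory("job_0001", "SUCCEEDED");
        System.out.println(lookup.getStatus("job_0001").orElse("UNKNOWN")); // SUCCEEDED
        System.out.println(lookup.persistentReads());                       // 0

        lookup.retire("job_0001");
        System.out.println(lookup.getStatus("job_0001").orElse("UNKNOWN")); // SUCCEEDED
        System.out.println(lookup.persistentReads());                       // 1

        System.out.println(lookup.getStatus("job_0002").orElse("UNKNOWN")); // UNKNOWN
    }
}
```

Returning an `Optional` instead of `null` for a job that is unknown everywhere is a choice made for this sketch: it forces the caller to handle the missing-status case explicitly rather than dereferencing a null.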

> Spasm of JobClient failures on successful jobs every once in a while
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4296
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4296
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.1
>            Reporter: Joydeep Sen Sarma
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: 4296_jt_delayretire.patch
>
>
> At very busy times, we get a wave of JobClient failures all at the same time. The failures
> come when the job is about to complete. When we look at the job history files, the jobs are
> actually complete. Here's the stack:
> 08/09/27 02:18:00 INFO mapred.JobClient:  map 100% reduce 98%
> 08/09/27 02:18:41 INFO mapred.JobClient:  map 100% reduce 99% 
> java.lang.NullPointerException
> 	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:993)
> 	at com.facebook.hive.common.columnSetLoader.main(columnSetLoader.java:535)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.util.RunJar.main(RunJar.java:155)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

