hadoop-mapreduce-issues mailing list archives

From "Sharad Agarwal (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-864) Enhance JobClient API implementations to look at history files to get information about jobs that are not in memory
Date Fri, 04 Sep 2009 05:32:57 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12751311#action_12751311 ]

Sharad Agarwal commented on MAPREDUCE-864:
------------------------------------------

bq. Job client will read the history file from HDFS location and construct the job information
using JobHistory parser (provided as part of MAPREDUCE-157). While calling api's like Job#getCounters,
Job clients would be transparent to the fact that information is being served from Jobtracker
or from parsed history data.
I realized that this may have issues, since some calls (such as getTaskCompletionEvents and
getTaskReports) need task-level information as well. One option is to load
all history data in Job, but that means loading huge data structures into memory, and many
clients may not be interested in drilling down into task-level details at all. Also, reading
from history transparently may confuse clients, as the cost and performance of the same
call will change drastically depending on the source of the information.
I am inclined NOT to have org.apache.hadoop.mapreduce.Job serve data from history. Let clients
use the JobHistory parser API directly to construct the info they need once a job has been retired
from the JobTracker's memory. Thoughts?
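
To make the proposal concrete, here is a minimal, self-contained sketch of the split I have in mind. It does not use the real Hadoop classes: InMemoryTracker, HistoryParser, and the "COUNTER name value" / "TASK id status" line format are all hypothetical stand-ins invented for illustration, not the actual JobTracker or the MAPREDUCE-157 parser API. The point is only that the client sees when a job is no longer in memory and chooses how much of the history to parse.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class HistoryFallbackSketch {

    /** Hypothetical stand-in for the JobTracker's in-memory view of jobs. */
    static final class InMemoryTracker {
        private final Map<String, Map<String, Long>> countersByJob = new HashMap<>();

        void put(String jobId, Map<String, Long> counters) {
            countersByJob.put(jobId, counters);
        }

        /** Simulates a job being dropped from memory. */
        void retire(String jobId) {
            countersByJob.remove(jobId);
        }

        /** Empty result means "not in memory"; no silent fallback to history. */
        Optional<Map<String, Long>> getCounters(String jobId) {
            return Optional.ofNullable(countersByJob.get(jobId));
        }
    }

    /** Hypothetical parser over a made-up history format (not the real one). */
    static final class HistoryParser {

        /** Cheap path: keeps only job-level counters, skips task records. */
        static Map<String, Long> parseCounters(List<String> lines) {
            Map<String, Long> out = new HashMap<>();
            for (String line : lines) {
                String[] f = line.trim().split("\\s+");
                if (f.length == 3 && f[0].equals("COUNTER")) {
                    out.put(f[1], Long.parseLong(f[2]));
                }
            }
            return out;
        }

        /** Costlier path: collects task ids, only for clients that ask for them. */
        static List<String> parseTaskIds(List<String> lines) {
            List<String> out = new ArrayList<>();
            for (String line : lines) {
                String[] f = line.trim().split("\\s+");
                if (f.length == 3 && f[0].equals("TASK")) {
                    out.add(f[1]);
                }
            }
            return out;
        }
    }

    public static void main(String[] args) {
        // Made-up history content standing in for a file fetched from HDFS.
        List<String> history = Arrays.asList(
                "COUNTER MAP_INPUT_RECORDS 1200",
                "TASK task_0001_m_000000 SUCCEEDED",
                "COUNTER REDUCE_OUTPUT_RECORDS 300");

        InMemoryTracker tracker = new InMemoryTracker();
        tracker.put("job_0001", HistoryParser.parseCounters(history));
        tracker.retire("job_0001");

        // The client makes the fallback decision explicitly.
        Map<String, Long> counters = tracker.getCounters("job_0001")
                .orElseGet(() -> HistoryParser.parseCounters(history));
        System.out.println(counters);
    }
}
```

With this shape, a client that only wants counters never pays for task-level parsing, and the switch from the tracker to history is visible in the client code rather than hidden behind Job.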



> Enhance JobClient API implementations to look at history files to get information about
> jobs that are not in memory
> -------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-864
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-864
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: jobtracker
>            Reporter: Devaraj Das
>            Assignee: Sharad Agarwal
>             Fix For: 0.21.0
>
>
> MAPREDUCE-817 added an API to get the JobHistory URL from the JobTracker. This is useful
> in two ways:
> 1) Users can use this API to get the URL, copy the history files to their local disk,
> and, do processing on them
> 2) APIs like JobSubmissionProtocol.getJobCounters, can read a part of the history file,
> and then return the information to the caller (if the job is not there in JT memory). This
> would mimic most of the CompletedJobsStatusStore functionality.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

