hadoop-mapreduce-issues mailing list archives

From "Alejandro Abdelnur (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4346) Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient
Date Wed, 27 Jun 2012 20:12:44 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13402522#comment-13402522 ]

Alejandro Abdelnur commented on MAPREDUCE-4346:
-----------------------------------------------

@Arun, 

I'm working with Ahmed on this one. 

The use case we have is large clusters running 1000+ concurrent jobs, where monitoring agents
query the cluster for jobs in different statuses; most of the time these agents focus on
running or just-finished jobs. With the current API we are forced to query ALL jobs, including
retired jobs (which significantly increases the number of jobs returned), and do the
filtering on the client side. This creates unnecessary load on the JT (serializing all jobs)
and on the client (deserializing all jobs). Adding this new API, which does not break
backwards compatibility, will therefore help reduce this load.
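
To make the difference concrete, here is a minimal, self-contained Java sketch of the kind of filtering being discussed. It is only an illustration: `JobInfo`, `State`, `getJobsByStatus` and `sampleJobs` are hypothetical names invented for this example, not the JobTracker/JobClient API and not the signatures used in the attached patches.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

class JobFilterSketch {
    // Hypothetical job states, for illustration only.
    enum State { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    // Hypothetical, minimal job record.
    static final class JobInfo {
        final String id;
        final State state;
        final boolean retired;

        JobInfo(String id, State state, boolean retired) {
            this.id = id;
            this.state = state;
            this.retired = retired;
        }
    }

    // Models a refined query: only jobs in the wanted states are returned,
    // and retired jobs are included only when explicitly requested.
    static List<JobInfo> getJobsByStatus(List<JobInfo> all, Set<State> wanted,
                                         boolean includeRetired) {
        List<JobInfo> out = new ArrayList<>();
        for (JobInfo job : all) {
            if (job.retired && !includeRetired) {
                continue; // skip retired jobs unless the caller asked for them
            }
            if (wanted.contains(job.state)) {
                out.add(job);
            }
        }
        return out;
    }

    // Small made-up data set standing in for what the JT would hold.
    static List<JobInfo> sampleJobs() {
        List<JobInfo> all = new ArrayList<>();
        all.add(new JobInfo("job_1", State.RUNNING, false));
        all.add(new JobInfo("job_2", State.SUCCEEDED, false));
        all.add(new JobInfo("job_3", State.SUCCEEDED, true));
        all.add(new JobInfo("job_4", State.FAILED, true));
        return all;
    }

    public static void main(String[] args) {
        List<JobInfo> all = sampleJobs();
        Set<State> wanted = EnumSet.of(State.RUNNING, State.SUCCEEDED);
        System.out.println("all jobs: " + all.size());
        System.out.println("filtered, no retired: "
                + getJobsByStatus(all, wanted, false).size());
    }
}
```

If the same predicate runs on the JT side, only the matching jobs are serialized and sent; running it on the client, as we do today, means every job crosses the wire first.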

Regarding support in MRv2, we currently have the getAllJobs() method there as well, so
we can certainly address it on the client side (the fallback implementation Ahmed did in the
client for MRv1). We could also add a PB call to support the filtering on the RM side. While
looking at the MRv2 code I noticed that we only query the RM, which means that completed jobs
will never be returned by this call. If I'm correct here, a solution would be for the client
to call the HS to ask for jobs younger than X; this would be the equivalent of 'retired' jobs,
and the filtering would be useful there as well, for the same reasons explained above.
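
A rough sketch of that client-side idea, under stated assumptions: the two input lists stand in for whatever the RM and the HS would return, and `JobInfo`, `recentJobs`, `finishTimeMs` and `maxAgeMs` are hypothetical names for illustration, not existing MRv2 client calls.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RmHsMergeSketch {
    // Hypothetical, minimal job record; finishTimeMs is 0 while the job runs.
    static final class JobInfo {
        final String id;
        final long finishTimeMs;

        JobInfo(String id, long finishTimeMs) {
            this.id = id;
            this.finishTimeMs = finishTimeMs;
        }
    }

    // Combines jobs known to the RM with jobs from the HS that finished
    // no more than maxAgeMs ago ("younger than X"). A job reported by both
    // sources is kept once, using the RM entry.
    static List<JobInfo> recentJobs(List<JobInfo> fromRm, List<JobInfo> fromHs,
                                    long nowMs, long maxAgeMs) {
        List<JobInfo> out = new ArrayList<>();
        Set<String> seen = new HashSet<>();
        for (JobInfo job : fromRm) {
            if (seen.add(job.id)) {
                out.add(job);
            }
        }
        for (JobInfo job : fromHs) {
            boolean youngEnough = nowMs - job.finishTimeMs <= maxAgeMs;
            if (youngEnough && seen.add(job.id)) {
                out.add(job);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<JobInfo> fromRm = new ArrayList<>();
        fromRm.add(new JobInfo("job_10", 0L));

        List<JobInfo> fromHs = new ArrayList<>();
        fromHs.add(new JobInfo("job_8", 1000L));
        fromHs.add(new JobInfo("job_9", 9000L));

        for (JobInfo job : recentJobs(fromRm, fromHs, 10000L, 5000L)) {
            System.out.println(job.id);
        }
    }
}
```

The age cut-off plays the role that the 'retired' flag plays in MRv1; a status filter like the one proposed for the JT could be applied on top of it.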
 
                
> Adding a refined version of JobTracker.getAllJobs() and exposing through the JobClient
> --------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4346
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4346
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: mrv1
>            Reporter: Ahmed Radwan
>            Assignee: Ahmed Radwan
>         Attachments: MAPREDUCE-4346.patch, MAPREDUCE-4346_rev2.patch, MAPREDUCE-4346_rev3.patch, MAPREDUCE-4346_rev4.patch
>
>
> The current implementation of JobTracker.getAllJobs() returns all submitted jobs in
> any state, in addition to retired jobs. This list can be long and represents unneeded overhead,
> especially for clients interested only in jobs in specific state(s).
> It would be beneficial to include a refined version where only jobs having specific statuses
> are returned and including retired jobs is optional.
> I'll be uploading an initial patch momentarily.


        
