hadoop-mapreduce-issues mailing list archives

From "Sandy Ryza (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5577) Allow querying the JobHistoryServer by job arrival time
Date Wed, 16 Oct 2013 00:09:43 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13796228#comment-13796228 ]

Sandy Ryza commented on MAPREDUCE-5577:

The goal is to make things easier for clients that are trying to track all jobs that go through
the JHS.  Without this, every query must reach back by the longest delay that could conceivably
separate a job's finish time from its arrival at the JHS (which could be minutes with things
like GC pauses).  This means a lot of redundant job data transferred and more work for the
client, as it must keep track of all the jobs it has received in that time interval to filter
out what's new.
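
To make the trade-off concrete, here is a minimal in-memory sketch (not the JHS code or its REST API). The `Job` record, the function names, and the sample timestamps are all hypothetical; the "server" is just a list filtered by what would be visible at each poll time. It compares polling by finish time with a look-back window against polling by arrival time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    job_id: str
    finish_time: int   # when the job finished (arbitrary time units)
    arrival_time: int  # when its history landed at the JHS (hypothetical field)


# Hypothetical sample data: job_2 finishes early but arrives late (e.g. a long GC pause).
JOBS = [
    Job("job_1", finish_time=100, arrival_time=110),
    Job("job_2", finish_time=150, arrival_time=400),
    Job("job_3", finish_time=300, arrival_time=310),
    Job("job_4", finish_time=450, arrival_time=460),
]
POLLS = [200, 500, 800]


def poll_by_finish_time(jobs, poll_times, lookback):
    """Poll on finish time; each query reaches back `lookback` before the last poll.

    Returns (sorted job ids the client learned about, total records transferred).
    """
    seen, transferred, last = set(), 0, 0
    for now in poll_times:
        begin = max(0, last - lookback)
        # Only jobs that have already arrived are visible to the query.
        batch = [j for j in jobs
                 if j.arrival_time <= now and begin <= j.finish_time <= now]
        transferred += len(batch)
        seen.update(j.job_id for j in batch)  # client-side de-duplication
        last = now
    return sorted(seen), transferred


def poll_by_arrival_time(jobs, poll_times):
    """Poll on arrival time; each query starts where the previous one ended."""
    seen, transferred, last = [], 0, 0
    for now in poll_times:
        batch = [j for j in jobs if last < j.arrival_time <= now]
        transferred += len(batch)
        seen.extend(j.job_id for j in batch)  # nothing to filter out
        last = now
    return sorted(seen), transferred
```

With this sample data, `poll_by_finish_time(JOBS, POLLS, lookback=0)` never sees `job_2`, `lookback=300` finds all four jobs but transfers seven records, and `poll_by_arrival_time(JOBS, POLLS)` finds all four while transferring each exactly once.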

> Allow querying the JobHistoryServer by job arrival time
> -------------------------------------------------------
>                 Key: MAPREDUCE-5577
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5577
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>            Reporter: Sandy Ryza
>            Assignee: Sandy Ryza
>         Attachments: MAPREDUCE-5577.patch
> The JobHistoryServer REST APIs currently allow querying by job submit time and finish
> time.  However, jobs don't necessarily arrive in order of their finish time, meaning that
> a client who wants to stay on top of all completed jobs needs to query large time intervals
> to make sure they're not missing anything.  Exposing functionality to allow querying by the
> time a job lands at the JobHistoryServer would allow clients to set the start of their query
> interval to the time of their last query.
> The arrival time of a job would be defined as the time that it lands in the done directory
> and can be picked up using the last modified date on history files.
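
The idea of deriving arrival time from a file's last modified date can be sketched as follows. This is only an illustration, not the attached patch: it assumes a local directory standing in for the done directory (the real one would normally live on HDFS), and the function names and the `.jhist` suffix filter are assumptions made for the example.

```python
import os


def arrival_time_ms(path):
    """Treat the file's last-modified time as the job's arrival time, in ms."""
    return os.stat(path).st_mtime_ns // 1_000_000


def history_files_arrived_after(done_dir, since_ms, suffix=".jhist"):
    """Return (arrival_ms, file_name) pairs for history files newer than since_ms.

    `done_dir` is a local stand-in for the done directory (assumption), and
    `suffix` selects which files count as history files (assumption).
    """
    arrived = []
    for root, _dirs, files in os.walk(done_dir):
        for name in files:
            if not name.endswith(suffix):
                continue
            t = arrival_time_ms(os.path.join(root, name))
            if t > since_ms:
                arrived.append((t, name))
    return sorted(arrived)
```

A client that remembers the time of its last query could pass it as `since_ms` and would receive only the files that landed afterwards, regardless of when the corresponding jobs finished.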
