hadoop-common-dev mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5083) Optionally a separate daemon should serve JobHistory
Date Wed, 21 Jan 2009 05:25:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665706#action_12665706 ]

Amar Kamat commented on HADOOP-5083:
------------------------------------

Here is one proposal:
- Run the job-history-server as a separate mapred daemon, similar to the namenode and jobtracker. Start it after the namenode and before the jobtracker.
- The jobtracker should be passed the server info via the conf, similar to how the namenode info is passed to it, say _mapred.job.history.server.address_, which would be _hostname:port_ (see the sketch after this list).
- The jobtracker passes this info to its web server, and the history link on the jobtracker web-ui points to this server.
- All the jsp code to do with the job history (analysis, loading etc.) gets moved to a separate webapp folder, say _history_. Make sure that the history can no longer be accessed via the jobtracker web-ui.
- Retire jobs as soon as they finish
- Make the job-history-server a standalone entity which can be used without the jobtracker, something like _http://hostname:50040_, redirecting to say {{jobhistory.jsp}}. Note that this will help in offline browsing of history even when the jobtracker is down or under maintenance. Also, in case of some server issue, this server can be restarted like any other daemon.
- Keep the heap of this server small compared to the other daemons (configurable).
- By default, run this server on the jobtracker machine (i.e. via hod etc.). With a smaller heap and a separate jvm, the jobtracker (process + host) will be safe from memory issues.
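
To make the conf plumbing concrete, here is a minimal sketch of the daemon's startup, assuming it reuses Hadoop's embedded {{HttpServer}}. Only _mapred.job.history.server.address_ and port 50040 come from the proposal above; the class name, the default bind address and the exact constructor shape are illustrative assumptions.

{code:java}
import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.http.HttpServer;
import org.apache.hadoop.net.NetUtils;

// Hypothetical standalone job-history daemon; only the conf key and
// the default port 50040 come from the proposal.
public class JobHistoryServer {

  static final String HISTORY_SERVER_ADDRESS =
      "mapred.job.history.server.address";

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // hostname:port; default to port 50040 on all interfaces
    String target = conf.get(HISTORY_SERVER_ADDRESS, "0.0.0.0:50040");
    InetSocketAddress addr = NetUtils.createSocketAddr(target);

    // Serve the 'history' webapp (the folder the jsp code moves into).
    // HttpServer is Hadoop's embedded Jetty wrapper; the constructor
    // may differ between versions.
    HttpServer server = new HttpServer("history", addr.getHostName(),
        addr.getPort(), false /* don't hunt for a free port */);
    server.start();
    server.join();  // block until the daemon is stopped
  }
}
{code}

The jobtracker side would then read the same key and point the history link on its web-ui at {{http://}} + that address.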

I have purposefully skipped minute details as they can be worked out later once we agree on
the direction.

Things to ponder:
- Running the job-history-server on a separate machine would work if the job-history is on hdfs; what if the job-history is on the local-fs? As of now we can leave it to the cluster admin to make sure that the job-history-server runs with the *right* parameters, i.e. on the right machine and with the same parameters that are passed to the jobtracker (see the sketch after this list).
- What if the job-history-server goes down after the jobtracker goes down? Either bring the server up on the same address (host:port), or start it someplace else and restart the jobtracker with the new address. Is there some way to reload the jobtracker with new parameter values without restarting?
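
On the local-fs point, a cheap startup check could at least fail loudly when the history location is not on a shared filesystem. A rough sketch, assuming the existing _hadoop.job.history.location_ key (the default path below is just an example):

{code:java}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocalFileSystem;
import org.apache.hadoop.fs.Path;

// Illustrative check: history on the local fs is only visible on the
// machine that wrote it, i.e. the jobtracker host.
public class HistoryLocationCheck {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    Path history = new Path(conf.get("hadoop.job.history.location",
        "file:///tmp/hadoop/history"));
    FileSystem fs = history.getFileSystem(conf);
    if (fs instanceof LocalFileSystem) {
      System.err.println("Job history is on the local fs (" + history
          + "); run the job-history-server on the jobtracker machine"
          + " with the same conf as the jobtracker.");
    }
  }
}
{code}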

Thoughts?


> Optionally a separate daemon should serve JobHistory
> ----------------------------------------------------
>
>                 Key: HADOOP-5083
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5083
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>
> Currently the JobTracker serves the JobHistory to end-users off files on local-disk/hdfs.
> Running very large clusters with a large user-base might result in lots of traffic for
> job-history, which needlessly taxes the JobTracker. The proposal is to have an optional daemon
> which handles serving of job-history requests.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

