ambari-dev mailing list archives

From "Jeff Sposetti (JIRA)" <j...@apache.org>
Subject [jira] [Created] (AMBARI-2617) History server should be managed as separate component
Date Wed, 10 Jul 2013 17:11:48 GMT
Jeff Sposetti created AMBARI-2617:
-------------------------------------

             Summary: History server should be managed as separate component
                 Key: AMBARI-2617
                 URL: https://issues.apache.org/jira/browse/AMBARI-2617
             Project: Ambari
          Issue Type: Improvement
    Affects Versions: 1.2.4
            Reporter: Jeff Sposetti


Ambari does not currently track the history server as a separate master component of the
MapReduce service. This makes a failed MapReduce start hard to diagnose unless you know to
log on to the host and check the history server logs.

The history server should be a separate component, similar to the JobTracker. I think it is
acceptable to always place the history server on the same host as the JobTracker, but it needs
to be handled just like the JobTracker: distinct, clear start/stop operation results and host
component start/stop controls.
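A minimal Python sketch of the behavior requested above, assuming each master component reports its own start result. The names `State`, `ComponentResult`, `service_state` and `failed_components` are hypothetical illustrations, not Ambari's actual API or data model. The service only counts as started if every tracked master component started, and the failed components can be listed individually.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    # Hypothetical per-component outcome of a start operation.
    STARTED = "STARTED"
    START_FAILED = "START_FAILED"


@dataclass
class ComponentResult:
    # Hypothetical record of one host component's start result.
    name: str
    host: str
    state: State


def service_state(results: list[ComponentResult]) -> State:
    """Report the service as started only if every tracked component started."""
    if all(r.state is State.STARTED for r in results):
        return State.STARTED
    return State.START_FAILED


def failed_components(results: list[ComponentResult]) -> list[str]:
    """List failed components so each one can be shown distinctly."""
    return [f"{r.name}@{r.host}" for r in results if r.state is not State.STARTED]
```

With HISTORYSERVER tracked next to JOBTRACKER (host name `jt-host` is a placeholder), a JobTracker that starts while the history server fails makes `service_state` return `START_FAILED`, and `failed_components` returns `["HISTORYSERVER@jt-host"]` instead of the service showing a green dot.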

The problem is easy to reproduce while the history server is not a separate component:

1) Stop HDFS and MapReduce.
2) Start only MapReduce.
3) The MapReduce start operation fails because the MapReduce service check fails.
4) Nothing indicates that any component failed to start (the JobTracker shows as started, which
is true).
5) MapReduce shows a green dot, as if it started successfully.
6) On the Hosts > Host page, the JobTracker is running.
7) So you assume everything started and begin suspecting the MapReduce configuration or
something similar...

Problem: the Hosts > Host page does not list the history server, so you cannot tell that it
failed to start. The operations list did not show a distinct history server start failure
either, so the user was not aware of the failure.

Once you figure out that the history server did not start, you log on to the machine and see
that the history server process is not running. Then you work out how to check its logs and find
that it failed to start entirely (because the NameNode is not up).

Note: we do have a Nagios alert that watches the history server web UI, so the failure does
raise an alert. But that alert alone is not enough to help users troubleshoot history server
problems in their cluster.

2013-06-06 07:43:38,930 FATAL org.apache.hadoop.mapred.JobHistoryServer: java.net.ConnectException:
Call to xx-xx-xx-xx/xx-xx-xx-xx:8020 failed on connection exception: java.net.ConnectException:
Connection refused
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1147)
at org.apache.hadoop.ipc.Client.call(Client.java:1123)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
at $Proxy5.getProtocolVersion(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
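The trace shows the history server exiting because its RPC call to the NameNode on port 8020 was refused. A minimal Python sketch of a manual pre-start check, assuming you only want to know whether something is listening on the NameNode RPC port. `namenode_reachable` is a hypothetical helper, not part of Ambari or Hadoop; the host is a parameter because it is masked in the log above.

```python
import socket


def namenode_reachable(host: str, port: int = 8020, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the NameNode RPC port succeeds."""
    try:
        # Open and immediately close a plain TCP connection; no Hadoop RPC is spoken.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeouts and DNS failures are all OSError subclasses.
        return False
```

A `False` result corresponds to the "Connection refused" in the trace, meaning the history server would fail the same way if started now. This only checks that the port accepts connections, not that the NameNode is healthy.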


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
