incubator-ambari-user mailing list archives

From Vivek Padmanabhan <vpadmanab...@aryaka.com>
Subject Re: Problems with installation and deploying using Ambari
Date Thu, 04 Jul 2013 14:09:54 GMT
Hi Srimanth,

  Thanks for the response; my replies are below. I am using HDP-1.3.0.0.

a) It was an HTTPS call made from the agent to the Ambari server:

INFO 2013-07-04 09:38:00,378 security.py:49 - SSL Connect being called.. connecting to the server
INFO 2013-07-04 09:38:00,563 Controller.py:99 - Unable to connect to: https://xxxxx:8441/agent/v1/register/xxxxxx

If I change the JDK to 1.6, it starts working.
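
To narrow down whether the failure is at the TCP level or in the TLS handshake, a small probe run from the agent host might help. This is only a sketch, not part of Ambari: the function name `probe_tls` is made up here, certificate verification is deliberately switched off, and it does not reproduce any client-certificate step the agent itself may perform.

```python
import socket
import ssl


def probe_tls(host, port, timeout=5.0):
    """Try a TCP connect followed by a TLS handshake; return (ok, detail)."""
    # Verification is disabled on purpose: the goal is only to see whether
    # a handshake can complete, not whether the peer is trusted.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host) as tls:
                detail = "handshake ok: %s, cipher %s" % (tls.version(), tls.cipher()[0])
                return True, detail
    except ssl.SSLError as exc:
        # ssl.SSLError is a subclass of OSError, so it must be caught first.
        return False, "TLS handshake failed: %s" % exc
    except OSError as exc:
        return False, "TCP connection failed: %s" % exc
```

For example, `probe_tls("ambari-server.example.com", 8441)` (hostname hypothetical, port taken from the log above) could be run once with the server on JDK 1.6 and once on JDK 1.7 to compare the two results.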

b) When I manually access the URL, I can properly see the status and 
Ganglia, do start/stop, make config changes, etc.
It doesn't jump back to the installer.

c)
Was the ambari-server started on localhost initially perhaps?
     This could be. But after we corrected the other machines, we ran 
ambari-server reset.
     On the next attempt it failed again, still reporting localhost, even 
though the conf was correct.
     Hence we removed the rpm, but that still did not help. Finally we 
deleted /etc/ambari-agent and /usr/lib/ambari*, which helped.

(Each time, we retried the installation for that machine alone.)
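
A quick way to confirm what server name an agent config actually holds, before retrying, might look like the sketch below. The section name `server` and key `hostname` are assumptions about the agent's ini layout, the function names are made up for illustration, and the ini text is passed in by the caller rather than read from a fixed path.

```python
import configparser

# Names that would make an agent talk to itself instead of the real server.
LOOPBACK_NAMES = {"localhost", "localhost.localdomain", "127.0.0.1", "::1"}


def server_hostname(ini_text, section="server", key="hostname"):
    """Return the configured server hostname, or None if it is absent.

    The section and key names are assumptions, not confirmed in this thread.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get(section, key, fallback=None)


def points_at_loopback(ini_text):
    """True if the configured server hostname is a loopback name."""
    name = server_hostname(ini_text)
    return name is not None and name.strip().lower() in LOOPBACK_NAMES
```

Reading the agent's ini file into a string and calling `points_at_loopback(text)` on each failed host would show whether a stale localhost entry survived the reinstall.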

d) Sure, I will have a look at the agent logs next time.
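
For that log check, a small filter along these lines could pull out the suspicious entries. It is a rough sketch: the line pattern is modelled on the two agent log lines quoted under (a), while the level names and keywords it looks for are assumptions, and `find_problems` is a made-up name.

```python
import re

# Modelled on lines such as:
#   INFO 2013-07-04 09:38:00,563 Controller.py:99 - Unable to connect to: ...
LOG_LINE = re.compile(
    r"^(?P<level>[A-Z]+) "
    r"(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<source>\S+) - (?P<message>.*)$"
)


def find_problems(lines, levels=("ERROR", "WARNING"),
                  keywords=("Unable to", "Traceback")):
    """Return the lines whose level or text suggests a failure."""
    hits = []
    for line in lines:
        text = line.rstrip("\n")
        match = LOG_LINE.match(text)
        if match and match.group("level") in levels:
            # Structured line logged at a level we care about.
            hits.append(text)
        elif any(word in text for word in keywords):
            # Any line (structured or not) mentioning a failure keyword.
            hits.append(text)
    return hits
```

It accepts any iterable of lines, so an open log file can be passed in directly.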


Thanks
Vivek


On Thursday 04 July 2013 07:10 PM, Srimanth Gunturi wrote:
> Hi Vivek,
> Wanted to find out the version of Ambari you are using.
>
> a) What sort of communication failures were you seeing? Is there 
> anything specific in the logs that you can share?
>
> b) UI jumping to installer after login means that the server says 
> installation is not complete. Did you notice any errors during 
> install? Also when it does go back to installer, which page of 
> installer does it end on, and are any previous values populated?
>
> When you do manually go to http://xxx:5858/#/main/dashboard - does it 
> stay there, or jump back to installer after a few clicks?
>
> c) Ambari server should be set up on a hostname (hostname -f) that 
> the agent nodes can talk back to.
> Was the ambari-server started on localhost initially perhaps?
> When some agent hosts had server as localhost - did you install agent 
> manually?
>
> d) Ganglia server component failed to install for some reason. The 
> agent logs on that node should contain exceptions of why it failed. 
> Fixing that issue should help.
>
> Regards,
> Srimanth
>
>
>
>
> On Wed, Jul 3, 2013 at 10:10 PM, Vivek Padmanabhan 
> <vpadmanabhan@aryaka.com> wrote:
>
>     Hi,
>     I was trying out Ambari to set up a cluster and we faced some of
>     the issues below. It would be great if someone could shed some
>     light on these:
>
>
>
>     a) Is it possible to run Ambari with JDK 1.7? We are seeing some
>     communication failures while using 1.7 for Ambari.
>     But prior to ambari we have tested our hadoop programs with 1.7
>     and everything went well. And all of
>     our code base is in 1.7. (we have no native apps)
>
>
>
>
>
>     b) After a cluster setup finished successfully, we are able to see
>     the dashboard etc. But after a few clicks, or if I am accessing
>     it from a different machine, it again redirects me to the
>     installation page.
>
>     I figured out that only manually entering the URL below helps
>     us. (Our port is 5858, and the browser cache is cleared.)
>     http://xxx:5858/#/main/dashboard
>
>
>
>
>     c) During our Hadoop deployment and installation, some
>     servers failed (ssh access) and some passed.
>     So we had to reset and start from the beginning. But this time
>     the ones that passed earlier are failing,
>     since they think that the Ambari server is 'localhost'.
>
>     The server IP property in the /etc/...ini file was correct. So
>     we tried the following on the failed machines:
>
>     * Remove the rpm, reset ambari, retry: this did not work
>     * Remove the rpm, delete /etc/ambari-agent, delete /usr/lib/ambari*,
>       retry: this worked
>
>     Does this mean that rpm -e did not remove all the files? Is
>     there anything extra we need to take care of in such scenarios?
>
>
>
>
>
>     d) Hadoop installation and deployment succeed only on random
>     retries. When it fails, the only message we saw was:
>     ERROR ServiceComponentHostImpl:721 - Can't handle
>     ServiceComponentHostEvent event at current state,
>     serviceComponentName=GANGLIA_SERVER, hostName=server233.xxxxxx,
>     currentState=INSTALL_FAILED, eventType=HOST_SVCCOMP_OP_SUCCEEDED,
>     event=EventType: HOST_SVCCOMP_OP_SUCCEEDED
>     15:17:12,934 WARN HeartBeatHandler:233 - State machine exception
>     org.apache.ambari.server.state.fsm.InvalidStateTransitionException: Invalid
>     event: HOST_SVCCOMP_OP_SUCCEEDED at INSTALL_FAILED
>
>
>
>
>
>     Thanks
>     Vivek
>
>

