incubator-ambari-user mailing list archives

From Sumit Mohanty <smoha...@hortonworks.com>
Subject Re: Problems with installation and deploying using Ambari
Date Sun, 07 Jul 2013 14:57:25 GMT
Regarding (c), removing postgres does not remove the database files. You
could remove the files manually or call ambari-server reset to re-create the
database. 
ambari-server must not be running when calling reset.
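
The order described above (stop the server, reset, then start again) can be sketched as a small Python helper. This is only an illustration: the names reset_sequence and run_reset and the dry-run default are assumptions, restarting after the reset is assumed rather than stated above, and ambari-server reset may prompt for confirmation because it discards the existing cluster data.

```python
import subprocess


def reset_sequence():
    """Return the commands in the order described above (hypothetical helper)."""
    return [
        ["ambari-server", "stop"],   # the server must not be running during reset
        ["ambari-server", "reset"],  # re-creates the Ambari database
        ["ambari-server", "start"],  # assumption: bring the server back up afterwards
    ]


def run_reset(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in reset_sequence():
        print(" ".join(cmd))
        if not dry_run:
            # A non-zero exit status raises CalledProcessError and aborts the sequence.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_reset(dry_run=True)
```

With dry_run left at its default, the helper only prints the three commands, so the sequence can be reviewed before anything is run on the server host.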

Thanks
Sumit

From:  Vivek Padmanabhan <vpadmanabhan@aryaka.com>
Reply-To:  <ambari-user@incubator.apache.org>
Date:  Sunday, July 7, 2013 6:18 AM
To:  <ambari-user@incubator.apache.org>
Subject:  Re: Problems with installation and deploying using Ambari

    
 
Thanks Srimanth,
 
 a) Yes, got this point. We followed these steps and the agents registered
without problems, i.e., the Ambari Server still uses jdk1.6 but all the
services will be invoked with the custom JDK.
 
 
 b) No, this has happened only once for us.
 
 
 c) Another issue we found is that removing Ambari (yum remove ambari\*) does
not remove postgres.
 We were starting the Ambari and cluster setup afresh, but some of the
processes were not starting. Surprisingly, the logs still showed the old
cluster info:
    
 
 INFO 2013-07-07 18:27:29,142 ActionQueue.py:82 - Adding STATUS_COMMAND for
service GANGLIA of cluster NewDevCluster to the queue.
 INFO 2013-07-07 18:27:29,150 ActionQueue.py:82 - Adding STATUS_COMMAND for
service HBASE of cluster NewDevCluster to the queue.
 INFO 2013-07-07 18:27:29,159 ActionQueue.py:82 - Adding STATUS_COMMAND for
service HBASE of cluster NewDevCluster to the queue.
 INFO 2013-07-07 18:27:29,167 ActionQueue.py:82 - Adding STATUS_COMMAND for
service HDFS of cluster NewDevCluster to the queue.
 INFO 2013-07-07 18:27:29,176 ActionQueue.py:82 - Adding STATUS_COMMAND for
service HDFS of cluster NewDevCluster to the queue.
 INFO 2013-07-07 18:27:29,184 ActionQueue.py:82 - Adding STATUS_COMMAND for
service HDFS of cluster NewDevCluster to the queue.
 INFO 2013-07-07 18:27:29,193 ActionQueue.py:82 - Adding STATUS_COMMAND for
service MAPREDUCE of cluster NewDevCluster to the queue.
 INFO 2013-07-07 18:27:29,201 ActionQueue.py:82 - Adding STATUS_COMMAND for
service MAPREDUCE of cluster NewDevCluster to the queue.
 INFO 2013-07-07 18:27:29,210 ActionQueue.py:82 - Adding STATUS_COMMAND for
service MAPREDUCE of cluster NewDevCluster to the queue.
 INFO 2013-07-07 18:27:29,218 ActionQueue.py:82 - Adding STATUS_COMMAND for
service PIG of cluster NewDevCluster to the queue.
 INFO 2013-07-07 18:27:29,227 ActionQueue.py:82 - Adding STATUS_COMMAND for
service ZOOKEEPER of cluster NewDevCluster to the queue.
 INFO 2013-07-07 18:27:29,235 ActionQueue.py:82 - Adding STATUS_COMMAND for
service ZOOKEEPER of cluster NewDevCluster to the queue.
 INFO 2013-07-07 18:27:29,244 ActionQueue.py:82 - Adding STATUS_COMMAND for
service HDFS of cluster DevCluster1 to the queue.
 INFO 2013-07-07 18:27:29,253 ActionQueue.py:82 - Adding STATUS_COMMAND for
service MAPREDUCE of cluster DevCluster1 to the queue.
 INFO 2013-07-07 18:27:29,261 ActionQueue.py:82 - Adding STATUS_COMMAND for
service ZOOKEEPER of cluster DevCluster1 to the queue.
 INFO 2013-07-07 18:27:29,270 ActionQueue.py:82 - Adding STATUS_COMMAND for
service HDFS of cluster DevCluster1 to the queue.
 INFO 2013-07-07 18:27:29,278 ActionQueue.py:82 - Adding STATUS_COMMAND for
service MAPREDUCE of cluster DevCluster1 to the queue.
 INFO 2013-07-07 18:27:29,287 ActionQueue.py:82 - Adding STATUS_COMMAND for
service HBASE of cluster DevCluster1 to the queue.
 INFO 2013-07-07 18:27:29,295 ActionQueue.py:82 - Adding STATUS_COMMAND for
service HDFS of cluster DevCluster1 to the queue.
 INFO 2013-07-07 18:27:29,304 ActionQueue.py:82 - Adding STATUS_COMMAND for
service MAPREDUCE of cluster DevCluster1 to the queue.
 INFO 2013-07-07 18:27:29,312 ActionQueue.py:82 - Adding STATUS_COMMAND for
service HBASE of cluster DevCluster1 to the queue.
 INFO 2013-07-07 18:27:29,321 ActionQueue.py:82 - Adding STATUS_COMMAND for
service PIG of cluster DevCluster1 to the queue.
 INFO 2013-07-07 18:27:29,329 ActionQueue.py:82 - Adding STATUS_COMMAND for
service ZOOKEEPER of cluster DevCluster1 to the queue.
 INFO 2013-07-07 18:27:29,337 ActionQueue.py:82 - Adding STATUS_COMMAND for
service GANGLIA of cluster DevCluster1 to the queue.
 
 (DevCluster1 is our new cluster and NewDevCluster is old)
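
 A quick way to spot stale cluster entries like the ones in the log above is to group the STATUS_COMMAND lines by cluster name. Below is a minimal Python sketch, assuming the wrapped log entries have been joined back to one entry per line; the function name services_by_cluster and the two-line sample are illustrative and not part of Ambari.

```python
import re
from collections import defaultdict

# Matches the agent log entries quoted above (one entry per line).
PATTERN = re.compile(
    r"Adding STATUS_COMMAND for service (\S+) of cluster (\S+) to the queue"
)


def services_by_cluster(lines):
    """Map each cluster name to the set of services it is polled for."""
    clusters = defaultdict(set)
    for line in lines:
        match = PATTERN.search(line)
        if match:
            service, cluster = match.groups()
            clusters[cluster].add(service)
    return dict(clusters)


sample = [
    "INFO 2013-07-07 18:27:29,142 ActionQueue.py:82 - Adding STATUS_COMMAND "
    "for service GANGLIA of cluster NewDevCluster to the queue.",
    "INFO 2013-07-07 18:27:29,244 ActionQueue.py:82 - Adding STATUS_COMMAND "
    "for service HDFS of cluster DevCluster1 to the queue.",
]

for cluster, services in sorted(services_by_cluster(sample).items()):
    print(cluster, sorted(services))
```

 If more than one cluster name shows up, the database most likely still holds an earlier cluster definition.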
 
 This means that, even after removing Ambari, there are still other processes
left behind, which makes it harder to fully reset a cluster setup.
 Maybe we should have run ambari-server reset, but it would be nice if Ambari
could take care of the database that it installs by itself.
 Or maybe we can add this point to the documentation somewhere?
 
 
 Thanks
 Vivek  
 
 
 
 On Friday 05 July 2013 09:48 PM, Srimanth Gunturi wrote:
 
 
>   
> Hi Vivek,
>  a) When using custom JDKs, the process is to make the JDK path (say
> /usr/lib/java-1.7.0) available on all the hosts where Ambari will run (server
> and agents). Then run ambari-server setup selecting the custom JDK path - this
> will update the ambari.properties with the right path. Start ambari-server,
> and on the host registration page, provide the same path again in 'Path to
> 64-bit JDK JAVA_HOME'. Finish setup.
>  Please try these steps out. If jdk paths exist on all hosts, various services
> do start with that VM, and you still see the certificate issue, please open
> JIRAs regarding these.
>  
> 
>  
>  
> b) Does the installer redirect happen frequently for you after installs?
>  
> 
>  
>  
> Stopping ambari-server does not stop the agents.
>  
> Regards,
>  
> Srimanth
>  
> 
>  
>  
> 
>  
>  
>  
> 
>  
>  
>  On Thu, Jul 4, 2013 at 10:44 PM, Vivek Padmanabhan <vpadmanabhan@aryaka.com>
> wrote:
>  
>>  
>>  
>> Hi Srimanth,
>>    Thanks again for your valuable inputs.
>>  
>>  a) We changed the JRE before ambari-server setup.
>>  
>>      Below are the steps we followed:
>>          1. Installed ambari
>>          2. In ambari.properties, changed the port and the jdk to 1.7
>>          3. Ran ambari setup (did not install jdk1.6)
>>          4. Started ambari and chose Auto Install of agents, which failed
>> saying:
>> 
>>          INFO 2013-07-04 09:38:00,378 security.py:49 - SSL Connect being
>> called.. connecting to the server
>>          INFO 2013-07-04 09:38:00,563 Controller.py:99 - Unable to connect
>> to: https://xxxxx:8441/agent/v1/register/xxxxxx
>>  
>>  
>>          5. Stopped the ambari-server, changed jdk, did setup again.
>>          6. This time it installed jdk 1.6, and agents were successfully
>> registered.
>>  
>>  b) Yes, with this url, everything was functioning normally. I was able to
>> see all statuses, perform config changes, start/stop, etc. I guess it is a
>> problem with the url redirect.
>>  Extremely sorry about this, but we have removed our old setup and are
>> starting fresh. We need to go to production soon, so we are experimenting
>> rigorously.
>>  
>>  
>>  c) As I mentioned in my first mail, the .ini was correct after the initial
>> trial. We removed the rpm and retried, but still no luck. It seemed like
>> the first value was cached somewhere.
>>       * We added some logs in main.py, expecting to see them in the agent
>> logs.
>>  
>>  
>>  Another point that we have noticed is that stopping ambari-server doesn't
>> stop the agents. Not sure whether this is the usual behavior or some problem
>> with our setup.
>>  The jdk issue is our major concern currently, as we have worked around the
>> others. I will look more into the keystore point.
>>  
>>  
>>  Thanks
>>  Vivek 
>>  
>> 
>>  
>>  
>>  On Thursday 04 July 2013 09:26 PM, Srimanth Gunturi wrote:
>>  
>>  
>>  
>>  
>>  
>>  
>>>  
>>> Hi Vivek, 
>>> 
>>>  
>>>  
>>> a) Did you change the JRE before or after setup+install? If you changed it
>>> after install, there might be ssh keys not in the keystore of the new VM.
>>>  
>>> 
>>>  
>>>  
>>> b)  When it does go back to installer, which page does it go to, and does it
>>> have values pre-populated?
>>>  
>>> Also, can you please go to
>>> http://ambari:port/api/v1/persist/CLUSTER_CURRENT_STATUS and provide the
>>> value for "clusterState" key.
>>>  
>>> 
>>>  
>>>  
>>> c) Ambari agents have /etc/ambari-agent/conf/ambari-agent.ini  which points
>>> to server hostname/port. This might have been initialized with 'localhost'
>>> resulting in failure, till it was fixed by removal.
>>>  
>>> 
>>>  
>>>  
>>> Hope that helped.
>>>  
>>> Regards,
>>>  
>>> Srimanth
>>>  
>>> 
>>>  
>>>  
>>> 
>>>  
>>>  
>>> 
>>>  
>>>  
>>> 
>>>  
>>>  
>>>  
>>> 
>>>  
>>>  
>>> On Thu, Jul 4, 2013 at 7:09 AM, Vivek Padmanabhan <vpadmanabhan@aryaka.com>
>>> wrote:
>>>  
>>>>  
>>>>  
>>>> Hi Srimanth,
>>>>  
>>>>   Thanks for the response; my replies are below. I am using HDP-1.3.0.0.
>>>>  
>>>>  a) It was an https call made to the ambari server from the agent:
>>>>  
>>>>  INFO 2013-07-04 09:38:00,378 security.py:49 - SSL Connect being called..
>>>> connecting to the server
>>>>  INFO 2013-07-04 09:38:00,563 Controller.py:99 - Unable to connect to:
>>>> https://xxxxx:8441/agent/v1/register/xxxxxx
>>>>  
>>>>  If I change the jdk to 1.6, it starts working.
>>>>  
>>>>  b) When I manually access the url, I can properly see the status, Ganglia,
>>>> do start/stop, config changes etc.
>>>>  It doesn't jump back to the installer.
>>>>  
>>>>  c) 
>>>>  
>>>>  
>>>> Was the ambari-server started on localhost initially perhaps?
>>>>  
>>>>      This could be. But after we corrected other machines, we did
>>>> ambari-server reset.
>>>>      Next time it failed, again reporting localhost, even though the conf
>>>> was correct.
>>>>      Hence we removed the rpm, which still did not help, and finally deleted
>>>> /etc/ambari-agent  and  /usr/lib/ambari*, which helped.
>>>>  
>>>>  
>>>>  (Every time, we retried the installation for that machine alone.)
>>>>  
>>>>  d) Sure will have a look at the agent logs next time.
>>>>  
>>>>  
>>>>  Thanks
>>>>  Vivek 
>>>>  
>>>> 
>>>>  
>>>>  
>>>>  On Thursday 04 July 2013 07:10 PM, Srimanth Gunturi wrote:
>>>>  
>>>>  
>>>>  
>>>>  
>>>>  
>>>>  
>>>>>  
>>>>> Hi Vivek, 
>>>>> Wanted to find out the version of Ambari you are using.
>>>>>  
>>>>> 
>>>>>  
>>>>>  
>>>>>  
>>>>> a) What sort of communication failures were you seeing? If there is
>>>>> anything specific in logs that you can share?
>>>>>  
>>>>> 
>>>>>  
>>>>>  
>>>>> b) UI jumping to installer after login means that the server says
>>>>> installation is not complete. Did you notice any errors during install?
>>>>> Also when it does go back to installer, which page of installer does it
>>>>> end on, and are any previous values populated?
>>>>>  
>>>>> 
>>>>>  
>>>>>  
>>>>> When you do manually go to http://xxx:5858/#/main/dashboard - does it stay
>>>>> there, or jump back to installer after a few clicks?
>>>>>  
>>>>>  
>>>>>  
>>>>>  
>>>>> c) Ambari server should be set up on a hostname (hostname -f) that the
>>>>> agent nodes can talk back to.
>>>>>  
>>>>> Was the ambari-server started on localhost initially perhaps?
>>>>>  
>>>>> When some agent hosts had server as localhost - did you install agent
>>>>> manually? 
>>>>>  
>>>>> 
>>>>>  
>>>>>  
>>>>> d) Ganglia server component failed to install for some reason. The agent
>>>>> logs on that node should contain exceptions of why it failed. Fixing that
>>>>> issue should help.
>>>>>  
>>>>> 
>>>>>  
>>>>>  
>>>>>  Regards,
>>>>>  
>>>>> Srimanth
>>>>>  
>>>>> 
>>>>>  
>>>>>  
>>>>> 
>>>>>  
>>>>>  
>>>>>  
>>>>>  
>>>>> 
>>>>>  
>>>>>  
>>>>> On Wed, Jul 3, 2013 at 10:10 PM, Vivek Padmanabhan
>>>>> <vpadmanabhan@aryaka.com> wrote:
>>>>>  
>>>>>> Hi,
>>>>>>  I was trying out ambari to set up a cluster and we faced some of the
>>>>>> issues below. It would be great if someone could throw some light on these:
>>>>>>  
>>>>>>  
>>>>>>  
>>>>>>  a) Is it possible to run ambari with jdk1.7? We are seeing some
>>>>>> communication failures while using 1.7 for ambari.
>>>>>>  But prior to ambari we have tested our hadoop programs with 1.7 and
>>>>>> everything went well. And all of
>>>>>>  our code base is in 1.7. (we have no native apps)
>>>>>>  
>>>>>>  
>>>>>>  
>>>>>>  
>>>>>>  
>>>>>>  b) After a cluster setup finished successfully, we are able to see the
>>>>>> dashboard etc. But after a few clicks, or if I am accessing
>>>>>>  it from a different machine, it again redirects me to the installation
>>>>>> page.
>>>>>>  
>>>>>>  I figured out that only manually entering the url below helps us.
>>>>>> (our port is 5858, and browser cache is cleared)
>>>>>>  http://xxx:5858/#/main/dashboard
>>>>>>  
>>>>>>  
>>>>>>  
>>>>>>  
>>>>>>  c) During our process of hadoop deployment and installation, some
>>>>>> servers failed (ssh access) and some passed.
>>>>>>  So we had to reset and start from the beginning. But this time those
>>>>>> which passed earlier are failing now,
>>>>>>  since they think that the ambari server is 'localhost'.
>>>>>>  
>>>>>>  In the /etc/...ini file, the server ip property was correct. So we
>>>>>> tried the following on those failed machines:
>>>>>>  
>>>>>>  * Remove rpm, reset ambari - This did not work on retry
>>>>>>  * Remove the rpm, delete /etc/ambari-agent, delete /usr/lib/ambari*,
>>>>>> retry - It worked
>>>>>>  
>>>>>>  Does this mean that the rpm -e did not remove all the files? Is there
>>>>>> anything extra we need to take care of in such scenarios?
>>>>>>  
>>>>>>  
>>>>>>  
>>>>>>  
>>>>>>  
>>>>>>  d) Hadoop installation and deployment succeeds only on random retries.
>>>>>> When it fails, the only message we saw was:
>>>>>>  ERROR ServiceComponentHostImpl:721 - Can't handle
>>>>>> ServiceComponentHostEvent event at current state,
>>>>>> serviceComponentName=GANGLIA_SERVER, hostName=server233.xxxxxx,
>>>>>> currentState=INSTALL_FAILED, eventType=HOST_SVCCOMP_OP_SUCCEEDED,
>>>>>> event=EventType: HOST_SVCCOMP_OP_SUCCEEDED
>>>>>>  15:17:12,934 WARN HeartBeatHandler:233 - State machine exception
>>>>>>  org.apache.ambari.server.state.fsm.InvalidStateTransitionException:
>>>>>> Invalid event: HOST_SVCCOMP_OP_SUCCEEDED at INSTALL_FAILED
>>>>>>  
>>>>>>  
>>>>>>  
>>>>>>  
>>>>>>  
>>>>>>  Thanks
>>>>>>  Vivek
>>>>>>  
 
 


