ambari-dev mailing list archives

From "Alejandro Fernandez (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AMBARI-6585) Deleted hosts come back to life after ambari-server restart
Date Mon, 28 Jul 2014 18:26:39 GMT

    [ https://issues.apache.org/jira/browse/AMBARI-6585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076532#comment-14076532 ]

Alejandro Fernandez commented on AMBARI-6585:
---------------------------------------------

* Problem: Can't re-add a deleted host to a cluster.

* Root-Cause: There is a race condition that prevents a host from being deleted while it is still
heartbeating. The delete request reports success, so the user believes the host was removed, but
the delete does not actually take effect. When the user later attempts to re-add the host to a
cluster, the operation fails.

* Workaround: To re-add a host to a cluster, first make sure that the host is not heartbeating.
If re-adding fails with an error indicating that the host already exists, stop the agent on the
host, restart the server, add the host to the cluster, restart the server again, and then start
the agent on the host.
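For anyone scripting the workaround, here is a minimal Python sketch that only builds and prints the commands in the required order; it does not run anything. The server/host names, the cluster name "dev" and the admin:admin credentials are the example values from the reproduction steps below, workaround_plan and print_plan are hypothetical helpers (not part of Ambari), and the add-host step assumes the same REST call used in step 6 of the description.

```python
import shlex

# Example values from the reproduction steps in this issue; substitute your own.
SERVER = "c6404.ambari.apache.org"
CLUSTER = "dev"
HOST = "c6407.ambari.apache.org"


def workaround_plan(server, cluster, host, user="admin", password="admin"):
    """Return the workaround as an ordered list of (machine, argv) steps.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    # Re-add the host with the same REST call used in the reproduction steps.
    add_host = [
        "curl", "--show-error", "-u", f"{user}:{password}",
        "-H", "X-Requested-By:1", "-i", "-X", "POST",
        f"http://{server}:8080/api/v1/clusters/{cluster}/hosts/{host}",
    ]
    return [
        ("agent", ["ambari-agent", "stop"]),       # silence heartbeats first
        ("server", ["ambari-server", "restart"]),
        ("server", add_host),                      # re-add while no heartbeats arrive
        ("server", ["ambari-server", "restart"]),
        ("agent", ["ambari-agent", "start"]),      # resume heartbeats last
    ]


def print_plan(plan):
    """Print the steps in order, labelled with the machine each one runs on."""
    for step, (machine, argv) in enumerate(plan, start=1):
        print(f"{step}. [{machine}] {shlex.join(argv)}")


if __name__ == "__main__":
    print_plan(workaround_plan(SERVER, CLUSTER, HOST))
```

Running it prints the five steps, starting with "1. [agent] ambari-agent stop". The point of the ordering is that the agent stays silent from before the first server restart until after the second one.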

> Deleted hosts come back to life after ambari-server restart
> -----------------------------------------------------------
>
>                 Key: AMBARI-6585
>                 URL: https://issues.apache.org/jira/browse/AMBARI-6585
>             Project: Ambari
>          Issue Type: Bug
>          Components: site
>    Affects Versions: 1.6.1
>            Reporter: Alejandro Fernandez
>            Assignee: Alejandro Fernandez
>             Fix For: 1.7.0
>
>
> When a host is deleted through the UI and then re-added, the re-add operation fails because
a record already exists in the clusterhostmapping table.
> This can be reproduced as follows (host names will differ, of course):
> 1. Create a cluster and add a host so that it is populated in the clusterhostmapping table.
> 2. Make sure the agent is running.
> 3. On the server, run ambari-server restart, and immediately run the following repeatedly in another terminal window before the restart finishes:
> {noformat}
> curl --write-out %{http_code} --show-error -u admin:admin -H 'X-Requested-By:1' -i -X DELETE http://c6404.ambari.apache.org:8080/api/v1/clusters/dev/hosts/c6407.ambari.apache.org
> HTTP/1.1 200 OK
> Set-Cookie: AMBARISESSIONID=z91px2l41uc6dwjv52zl2mcu;Path=/
> Expires: Thu, 01 Jan 1970 00:00:00 GMT
> Content-Type: text/plain
> Content-Length: 0
> Server: Jetty(7.6.7.v20120910)
> {noformat}
> 4. Quickly verify that the host name is removed from the clusterhostmapping table.
> 5. On the agent, run ambari-agent restart, and repeatedly re-query the clusterhostmapping table until the record is reinserted (it should reappear within 30 seconds).
> 6. Run the following curl command to re-add the host; it fails with this error:
> {noformat}
> curl --write-out %{http_code} --show-error -u admin:admin -H 'X-Requested-By:1' -i -X POST http://c6404.ambari.apache.org:8080/api/v1/clusters/dev/hosts/c6407.ambari.apache.org
> HTTP/1.1 500 Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.4.0.v20120608-r11652): org.eclipse.persistence.exceptions.DatabaseException Internal Exception: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO ClusterHostMapping (cluster_id, host_name) VALUES (2, 'c6407.ambari.apache.org') was aborted.  Call getNextException to see the cause. Error Code: 0 Call: INSERT INTO ClusterHostMapping (cluster_id, host_name) VALUES (?, ?) 	bind => [2 parameters bound]
> Set-Cookie: AMBARISESSIONID=1je1wahcml82f11gjrserxgdyl;Path=/
> Content-Type: text/plain;charset=ISO-8859-1
> Content-Length: 530
> Server: Jetty(7.6.7.v20120910)
> {
>   "status": 500,
>   "message": "Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.4.0.v20120608-r11652): org.eclipse.persistence.exceptions.DatabaseException\nInternal Exception: java.sql.BatchUpdateException: Batch entry 0 INSERT INTO ClusterHostMapping (cluster_id, host_name) VALUES (2, \u0027c6407.ambari.apache.org\u0027) was aborted.  Call getNextException to see the cause.\nError Code: 0\nCall: INSERT INTO ClusterHostMapping (cluster_id, host_name) VALUES (?, ?)\n\tbind \u003d\u003e [2 parameters bound]"
> }
> {noformat}
> At this point, the tables are in the following state:
> {noformat}
> select * from clusterhostmapping where host_name = 'c6407.ambari.apache.org';
>  cluster_id |        host_name
> ------------+-------------------------
>           2 | c6407.ambari.apache.org
> select * from hoststate where host_name = 'c6407.ambari.apache.org';
>     agent_version    | available_mem | current_state |                health_status                 |        host_name        | time_in_state | maintenance_state
> ---------------------+---------------+---------------+----------------------------------------------+-------------------------+---------------+-------------------
>  {"version":"1.6.0"} |        250232 | INIT          | {"healthStatus":"HEALTHY","healthReport":""} | c6407.ambari.apache.org | 1405718796141 | {"2":"ON"}
> {noformat}
> I then deleted both records, restarted the server, and was then able to add the host successfully.
> This is a bug in the persistence layer.
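The manual cleanup from the description ("deleted both records, restarted the server") can be sketched in Python as below. This is only a stand-in: it uses an in-memory sqlite3 database instead of the real Ambari database (the output above looks like psql), both tables are cut down to a few columns, and the composite primary key on clusterhostmapping is an assumption chosen so that a duplicate INSERT fails the way step 6 did. make_stand_in_db, delete_stale_host_rows and re_add_host are hypothetical helpers. Against a live server, keep the agent stopped (as in the workaround) and restart ambari-server after the deletes.

```python
import sqlite3

HOST = "c6407.ambari.apache.org"  # host from the issue; substitute your own


def make_stand_in_db():
    """In-memory stand-in for the two Ambari tables, reduced to a few columns."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE clusterhostmapping (
            cluster_id INTEGER NOT NULL,
            host_name  TEXT    NOT NULL,
            PRIMARY KEY (cluster_id, host_name)  -- assumed constraint
        );
        CREATE TABLE hoststate (
            host_name     TEXT PRIMARY KEY,
            current_state TEXT
        );
        """
    )
    # Recreate the leftover rows shown in the issue.
    conn.execute("INSERT INTO clusterhostmapping VALUES (2, ?)", (HOST,))
    conn.execute("INSERT INTO hoststate VALUES (?, 'INIT')", (HOST,))
    conn.commit()
    return conn


def delete_stale_host_rows(conn, host):
    """Delete the host's rows from both tables in one transaction."""
    with conn:  # commits on success, rolls back on error
        n_map = conn.execute(
            "DELETE FROM clusterhostmapping WHERE host_name = ?", (host,)
        ).rowcount
        n_state = conn.execute(
            "DELETE FROM hoststate WHERE host_name = ?", (host,)
        ).rowcount
    return n_map, n_state


def re_add_host(conn, cluster_id, host):
    """Mimic the INSERT from step 6; raises IntegrityError if the row still exists."""
    with conn:
        conn.execute(
            "INSERT INTO clusterhostmapping VALUES (?, ?)", (cluster_id, host)
        )
```

Calling re_add_host before the cleanup raises sqlite3.IntegrityError; after delete_stale_host_rows removes one row from each table, the same call succeeds.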



--
This message was sent by Atlassian JIRA
(v6.2#6252)
