ambari-dev mailing list archives

From "Dmitro Lisnichenko" <dlysniche...@hortonworks.com>
Subject Review Request 40826: Rebalance HDFS after enabling NN HA failed
Date Tue, 01 Dec 2015 17:01:36 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40826/
-----------------------------------------------------------

Review request for Ambari, Andrew Onischuk and Vitalyi Brodetskyi.


Bugs: AMBARI-14137
    https://issues.apache.org/jira/browse/AMBARI-14137


Repository: ambari


Description
-------

Steps to reproduce:
1) Install and deploy cluster
2) Enable NameNode HA
3) Enable security
4) Start rebalance HDFS

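Actual result: the balancer exits with code 252. Its output lists two NameNode URIs, hdfs://os-d7-mwznvu-ambari-hv-ser-ha-5-2.novalocal:8020 and hdfs://nameservice, and the run aborts with StandbyException ("Operation category READ is not supported in state standby").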

stderr:
{code}
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 425, in <module>
    NameNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 218, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py", line 363, in rebalancehdfs
    logoutput = False,
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 156, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 119, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh su cstm-hdfs -l -s /bin/bash -c 'export  PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin'"'"' KRB5CCNAME=/tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8 ; hdfs --config /usr/hdp/current/hadoop-client/conf balancer -threshold 10'' returned 252. ######## Hortonworks #############
This is MOTD message, added for testing in qe infra
15/11/17 07:29:08 INFO balancer.Balancer: Using a threshold of 10.0
15/11/17 07:29:08 INFO balancer.Balancer: namenodes  = [hdfs://os-d7-mwznvu-ambari-hv-ser-ha-5-2.novalocal:8020, hdfs://nameservice]
15/11/17 07:29:08 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
15/11/17 07:29:08 INFO balancer.Balancer: included nodes = []
15/11/17 07:29:08 INFO balancer.Balancer: excluded nodes = []
15/11/17 07:29:08 INFO balancer.Balancer: source nodes = []
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1927)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getServerDefaults(FSNamesystem.java:1625)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getServerDefaults(NameNodeRpcServer.java:659)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getServerDefaults(ClientNamenodeProtocolServerSideTranslatorPB.java:383)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
.  Exiting ...
Nov 17, 2015 7:29:09 AM  Balancing took 1.932 seconds
{code}
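
The stderr above suggests where the failure comes from: the balancer resolves two NameNode URIs, the host-specific hdfs://os-d7-mwznvu-ambari-hv-ser-ha-5-2.novalocal:8020 and the HA nameservice hdfs://nameservice, and one of its requests lands on a standby NameNode. The Python 3 sketch below is illustrative only and is not part of the patch under review; the helper names are hypothetical. It pulls the "namenodes = [...]" list out of balancer output and flags a mix of host:port and port-less (logical) URIs. It also converts the reported exit code 252 back to the signed value the JVM passed to exit(); reading -4 as the balancer's IO_EXCEPTION status is an assumption based on Hadoop's Balancer exit codes and should be checked against the Hadoop version in use.

```python
import re
from urllib.parse import urlparse

# Hypothetical helper: matches a Balancer log line such as
# "INFO balancer.Balancer: namenodes  = [hdfs://host:8020, hdfs://nameservice]".
NAMENODES_RE = re.compile(r"namenodes\s*=\s*\[([^\]]*)\]")


def parse_namenodes(log_text):
    """Return the NameNode URIs from the first 'namenodes = [...]' line, or []."""
    match = NAMENODES_RE.search(log_text)
    if match is None:
        return []
    return [uri.strip() for uri in match.group(1).split(",") if uri.strip()]


def mixes_direct_and_logical(uris):
    """True if the list holds both host:port URIs and port-less ones.

    Assumption: in an HA cluster a URI without a port is the logical
    nameservice, while a URI with an explicit port targets one NameNode.
    """
    with_port = [u for u in uris if urlparse(u).port is not None]
    without_port = [u for u in uris if urlparse(u).port is None]
    return bool(with_port) and bool(without_port)


def signed_exit_status(code):
    """Map an 8-bit shell exit code back to the signed value given to exit()."""
    return code - 256 if code > 127 else code


if __name__ == "__main__":
    line = ("15/11/17 07:29:08 INFO balancer.Balancer: namenodes  = "
            "[hdfs://os-d7-mwznvu-ambari-hv-ser-ha-5-2.novalocal:8020, hdfs://nameservice]")
    uris = parse_namenodes(line)
    print(uris)
    print(mixes_direct_and_logical(uris))
    print(signed_exit_status(252))
```

On the log line from this report, the check reports a mix of both URI kinds, which matches the symptom: the host-specific address is contacted directly instead of going through the failover proxy for the nameservice.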

stdout:
{code}
Starting balancer with threshold = 10
2015-11-17 07:29:06,492 - call['/usr/bin/klist -s /tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8'] {'user': 'cstm-hdfs'}
2015-11-17 07:29:06,514 - call returned (1, '######## Hortonworks #############\nThis is MOTD message, added for testing in qe infra')
2015-11-17 07:29:06,515 - Execute['/usr/bin/kinit -c /tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8 -kt /etc/security/keytabs/hdfs.headless.keytab cstm-hdfs@EXAMPLE.COM'] {'user': 'cstm-hdfs'}
Executing command ambari-sudo.sh su cstm-hdfs -l -s /bin/bash -c 'export  PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin'"'"' KRB5CCNAME=/tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8 ; hdfs --config /usr/hdp/current/hadoop-client/conf balancer -threshold 10'
2015-11-17 07:29:06,550 - Execute['ambari-sudo.sh su cstm-hdfs -l -s /bin/bash -c 'export  PATH='"'"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin'"'"' KRB5CCNAME=/tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8 ; hdfs --config /usr/hdp/current/hadoop-client/conf balancer -threshold 10''] {'logoutput': False, 'on_new_line': handle_new_line}
[balancer] ######## Hortonworks #############
This is MOTD message, added for testing in qe infra
[balancer] 15/11/17 07:29:08 INFO balancer.Balancer: Using a threshold of 10.0
[balancer] 15/11/17 07:29:08 INFO balancer.Balancer: namenodes  = [hdfs://os-d7-mwznvu-ambari-hv-ser-ha-5-2.novalocal:8020, hdfs://nameservice]
[balancer] 15/11/17 07:29:08 INFO balancer.Balancer: parameters = Balancer.BalancerParameters [BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]
[balancer] 15/11/17 07:29:08 INFO balancer.Balancer: included nodes = []
[balancer] 15/11/17 07:29:08 INFO balancer.Balancer: excluded nodes = []
[balancer] 15/11/17 07:29:08 INFO balancer.Balancer: source nodes = []
[balancer] Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved[balancer]
[balancer] org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1927)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getServerDefaults(FSNamesystem.java:1625)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getServerDefaults(NameNodeRpcServer.java:659)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getServerDefaults(ClientNamenodeProtocolServerSideTranslatorPB.java:383)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(Proto[balancer] bufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
.  Exiting ...[balancer]
[balancer] Nov 17, 2015 7:29:09 AM [balancer]  [balancer] Balancing took 1.932 seconds[balancer]
{code}
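
The stdout shows the order of operations before the balancer starts: klist -s checks a dedicated credential cache, kinit -c ... -kt ... obtains a ticket from the hdfs headless keytab into that cache when the check fails (here it returned 1), and hdfs balancer is then launched with KRB5CCNAME pointing at the same cache. The Python 3 sketch below rebuilds those command lines from the values in the log. It only constructs argument lists and does not run anything; build_rebalance_commands and its parameters are hypothetical and are not Ambari's resource_management API.

```python
# Values copied from the log above; adjust them for another cluster.
CCACHE = "/tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8"
KEYTAB = "/etc/security/keytabs/hdfs.headless.keytab"
PRINCIPAL = "cstm-hdfs@EXAMPLE.COM"
HADOOP_CONF = "/usr/hdp/current/hadoop-client/conf"


def build_rebalance_commands(ccache, keytab, principal, conf_dir, threshold=10):
    """Return (check, kinit, balancer_env, balancer) in the order seen in stdout.

    Hypothetical helper: it mirrors the logged commands, not Ambari's code.
    """
    # 'klist -s' prints nothing and exits non-zero if the cache has no valid ticket.
    check = ["/usr/bin/klist", "-s", ccache]
    # Obtain a ticket from the keytab and store it in the dedicated cache.
    kinit = ["/usr/bin/kinit", "-c", ccache, "-kt", keytab, principal]
    # The balancer finds that ticket through the KRB5CCNAME environment variable.
    balancer_env = {"KRB5CCNAME": ccache}
    balancer = ["hdfs", "--config", conf_dir, "balancer", "-threshold", str(threshold)]
    return check, kinit, balancer_env, balancer


if __name__ == "__main__":
    check, kinit, env, balancer = build_rebalance_commands(
        CCACHE, KEYTAB, PRINCIPAL, HADOOP_CONF)
    for cmd in (check, kinit, balancer):
        print(" ".join(cmd))
    print(env)
```

To actually execute the sequence, one would run check with subprocess.run, call kinit only when its returncode is non-zero, and start the balancer with env={**os.environ, **balancer_env}. The Kerberos steps succeed in this report; the failure happens afterwards, inside the balancer itself.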


Diffs
-----

  ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py 9800ff1

Diff: https://reviews.apache.org/r/40826/diff/


Testing
-------

mvn clean test


Thanks,

Dmitro Lisnichenko

