ambari-dev mailing list archives

From "Andrew Onischuk" <aonis...@hortonworks.com>
Subject Re: Review Request 40826: Rebalance HDFS after enabling NN HA failed
Date Tue, 01 Dec 2015 17:11:07 GMT


> On Dec. 1, 2015, 5:07 p.m., Andrew Onischuk wrote:
> > ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py, line 317
> > <https://reviews.apache.org/r/40826/diff/1/?file=1149706#file1149706line317>
> >
> >     This is very hacky. I think we should fix this in the first place, so this property is deleted when NN HA is enabled.

One reason it is hacky is that if XmlConfig gains a new parameter, or something changes in the way we generate hdfs-site, this will be a place we'll never remember to update.

Another reason is that our configs are immutable everywhere.
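
To make the second point concrete, an in-place workaround has to copy the rendered hdfs-site and drop the leftover non-HA address before handing it to the balancer, roughly like the sketch below (hypothetical only, not the diff under review; the dfs.namenode.rpc-address key and the helper name are just illustrative):

{code}
# Hypothetical sketch -- not the patch under review. It assumes the stale key
# is the non-HA dfs.namenode.rpc-address left behind after enabling NN HA.

def hdfs_site_for_balancer(hdfs_site):
    """Return a copy of hdfs-site suitable for the balancer under NN HA.

    The config mappings Ambari hands to scripts are treated as immutable,
    so the original is never touched: copy it, then drop the stale entry.
    """
    cleaned = dict(hdfs_site)                      # shallow copy, never mutate the source
    cleaned.pop('dfs.namenode.rpc-address', None)  # stale pre-HA address, if present
    return cleaned


if __name__ == '__main__':
    original = {
        'dfs.nameservices': 'nameservice',
        'dfs.ha.namenodes.nameservice': 'nn1,nn2',
        'dfs.namenode.rpc-address': 'old-nn-host:8020',  # leftover from before HA
    }
    print(hdfs_site_for_balancer(original))  # no direct rpc-address anymore
    print(original)                          # source mapping is left unchanged
{code}

And whatever renders hdfs-site everywhere else has to be mirrored here, which is exactly what makes it fragile.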


- Andrew


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/40826/#review108533
-----------------------------------------------------------


On Dec. 1, 2015, 5:01 p.m., Dmitro Lisnichenko wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/40826/
> -----------------------------------------------------------
> 
> (Updated Dec. 1, 2015, 5:01 p.m.)
> 
> 
> Review request for Ambari, Andrew Onischuk and Vitalyi Brodetskyi.
> 
> 
> Bugs: AMBARI-14137
>     https://issues.apache.org/jira/browse/AMBARI-14137
> 
> 
> Repository: ambari
> 
> 
> Description
> -------
> 
> STR:
> 1) Install and deploy cluster
> 2) Enable NameNode HA
> 3) Enable security
> 4) Start Rebalance HDFS
> 
> Actual result:
> 
> {code}
> "stderr" : "Traceback (most recent call last):\n  File \"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py\",
line 425, in <module>\n    NameNode().execute()\n  File \"/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py\",
line 218, in execute\n    method(env)\n  File \"/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py\",
line 363, in rebalancehdfs\n    logoutput = False,\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/base.py\",
line 154, in __init__\n    self.env.run()\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/environment.py\",
line 156, in run\n    self.run_action(resource, action)\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/environment.py\",
line 119, in run_action\n    provider_action()\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py\",
line 238, in action_run\n    tries=self.resource.tries, try_sleep=self.resource.try_sleep)\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/shell.py\",
line 70, in inner\n    result = function(command, **kwargs)\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/shell.py\",
line 92, in checked_call\n    tries=tries, try_sleep=try_sleep)\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/shell.py\",
line 140, in _call_wrapper\n    result = _call(command, **kwargs_copy)\n  File \"/usr/lib/python2.6/site-packages/resource_management/core/shell.py\",
line 291, in _call\n    raise Fail(err_msg)\nresource_management.core.exceptions.Fail: Execution
of 'ambari-sudo.sh su cstm-hdfs -l -s /bin/bash -c 'export  PATH='\"'\"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin'\"'\"'
KRB5CCNAME=/tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8 ; hdfs --config /usr/hdp/current/hadoop-client/conf
balancer -threshold 10'' returned 252. ######## Hortonworks #############\nThis is MOTD message,
added for testing in qe infra\n15/11/17 07:29:08 INFO balancer.Balancer: Using a threshold
of 10.0\n15/11/17 07:29:08 INFO balancer.Balancer: namenodes  = [hdfs://os-d7-mwznvu-ambari-hv-ser-ha-5-2.novalocal:8020,
hdfs://nameservice]\n15/11/17 07:29:08 INFO balancer.Balancer: parameters = Balancer.BalancerParameters
[BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included
nodes = 0, #source nodes = 0, #blockpools = 0, run during upgrade = false]\n15/11/17 07:29:08
INFO balancer.Balancer: included nodes = []\n15/11/17 07:29:08 INFO balancer.Balancer: excluded
nodes = []\n15/11/17 07:29:08 INFO balancer.Balancer: source nodes = []\nTime Stamp      
        Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved\norg.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in
state standby\n\tat org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)\n\tat
org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1927)\n\tat
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)\n\tat
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getServerDefaults(FSNamesystem.java:1625)\n\tat
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getServerDefaults(NameNodeRpcServer.java:659)\n\tat
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getServerDefaults(ClientNamenodeProtocolServerSideTranslatorPB.java:383)\n\tat
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)\n\tat
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)\n\tat
java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)\n\tat
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)\n.  Exiting ...\nNov 17, 2015 7:29:09
AM  Balancing took 1.932 seconds",
> {code}
> 
> {code}
> "stdout" : "Starting balancer with threshold = 10\n2015-11-17 07:29:06,492 - call['/usr/bin/klist
-s /tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8'] {'user': 'cstm-hdfs'}\n2015-11-17
07:29:06,514 - call returned (1, '######## Hortonworks #############\\nThis is MOTD message,
added for testing in qe infra')\n2015-11-17 07:29:06,515 - Execute['/usr/bin/kinit -c /tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8
-kt /etc/security/keytabs/hdfs.headless.keytab cstm-hdfs@EXAMPLE.COM'] {'user': 'cstm-hdfs'}\nExecuting
command ambari-sudo.sh su cstm-hdfs -l -s /bin/bash -c 'export  PATH='\"'\"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin'\"'\"'
KRB5CCNAME=/tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8 ; hdfs --config /usr/hdp/current/hadoop-client/conf
balancer -threshold 10'\n2015-11-17 07:29:06,550 - Execute['ambari-sudo.sh su cstm-hdfs -l -s /bin/bash -c 'export  PATH='\"'\"'/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/var/lib/ambari-agent:/var/lib/ambari-agent:/usr/hdp/current/hadoop-client/bin'\"'\"'
KRB5CCNAME=/tmp/hdfs_rebalance_cc_6ec913166750834c9d9302d65b9c6cb8 ; hdfs --config /usr/hdp/current/hadoop-client/conf
balancer -threshold 10''] {'logoutput': False, 'on_new_line': handle_new_line}\n[balancer]
######## Hortonworks #############\nThis is MOTD message, added for testing in qe infra\n[balancer]
15/11/17 07:29:08 INFO balancer.Balancer: Using a threshold of 10.0\n[balancer] 15/11/17 07:29:08
INFO balancer.Balancer: namenodes  = [hdfs://os-d7-mwznvu-ambari-hv-ser-ha-5-2.novalocal:8020,
hdfs://nameservice]\n[balancer] 15/11/17 07:29:08 INFO balancer.Balancer: parameters = Balancer.BalancerParameters
[BalancingPolicy.Node, threshold = 10.0, max idle iteration = 5, #excluded nodes = 0, #included nodes = 0, #source nodes = 0, #blockpools = 0, run during
upgrade = false]\n[balancer] 15/11/17 07:29:08 INFO balancer.Balancer: included nodes = []\n[balancer]
15/11/17 07:29:08 INFO balancer.Balancer: excluded nodes = []\n[balancer] 15/11/17 07:29:08
INFO balancer.Balancer: source nodes = []\n[balancer] Time Stamp               Iteration#
 Bytes Already Moved  Bytes Left To Move  Bytes Being Moved[balancer] \n[balancer] org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException):
Operation category READ is not supported in state standby\n\tat org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)\n\tat
org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1927)\n\tat
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1313)\n\tat
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getServerDefaults(FSNamesystem.java:1625)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getServerDefaults(NameNodeRpcServer.java:659)\n\tat
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getServerDefaults(ClientNamenodeProtocolServerSideTranslatorPB.java:383)\n\tat
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(Proto[balancer] bufRpcEngine.java:616)\n\tat
org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)\n\tat org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)\n\tat
org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)\n\tat java.security.AccessController.doPrivileged(Native
Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)\n\tat
org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)\n.  Exiting ...[balancer] \n[balancer] Nov 17, 2015 7:29:09 AM [balancer]  [balancer]
Balancing took 1.932 seconds[balancer]",
> {code}
> 
> 
> Diffs
> -----
> 
>   ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/scripts/namenode.py 9800ff1 
> 
> Diff: https://reviews.apache.org/r/40826/diff/
> 
> 
> Testing
> -------
> 
> mvn clean test
> 
> 
> Thanks,
> 
> Dmitro Lisnichenko
> 
>
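
The stderr in the quoted description is the tell: the balancer enumerates namenodes = [hdfs://os-d7-mwznvu-ambari-hv-ser-ha-5-2.novalocal:8020, hdfs://nameservice], contacts the standby directly, and dies on the StandbyException. A rendered hdfs-site that mixes a direct RPC address with an HA nameservice can be spotted with a check along these lines (illustrative sketch only; the keys are the standard HDFS ones, the helper itself is hypothetical):

{code}
# Hypothetical check, not part of the patch: an hdfs-site that still carries
# a direct non-HA RPC address next to an HA nameservice is what makes the
# balancer list two "namenodes" and then trip over the standby.

def mixes_ha_and_non_ha(hdfs_site):
    return bool(hdfs_site.get('dfs.nameservices')) and \
           bool(hdfs_site.get('dfs.namenode.rpc-address'))


if __name__ == '__main__':
    rendered = {
        'dfs.nameservices': 'nameservice',
        'dfs.namenode.rpc-address': 'os-d7-mwznvu-ambari-hv-ser-ha-5-2.novalocal:8020',
    }
    if mixes_ha_and_non_ha(rendered):
        print('hdfs-site still carries a non-HA rpc-address; the balancer will see two namenodes')
{code}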

