accumulo-notifications mailing list archives

From "Christopher Tubbs (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (ACCUMULO-2971) ChangeSecret tool should refuse to run if no write access to HDFS
Date Wed, 13 Jul 2016 22:29:20 GMT

     [ https://issues.apache.org/jira/browse/ACCUMULO-2971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Christopher Tubbs updated ACCUMULO-2971:
----------------------------------------
    Description: 
Currently, the ChangeSecret tool doesn't check that the user running it can write to
/accumulo/instance_id.

If an admin knows the instance secret but runs the command as a user who cannot write to
the instance_id, the result is an unhelpful error message and a disconnect between HDFS
and ZooKeeper.


Example for a cluster with an instance named "foobar":

{code}
[busbey@edge ~]$ hdfs dfs -ls /accumulo/instance_id
Found 1 items
-rw-r--r--   3 accumulo accumulo          0 2014-07-02 09:05 /accumulo/instance_id/cb977c77-3e13-4522-b718-2b487d722fd4
[busbey@edge ~]$ accumulo org.apache.accumulo.server.util.ChangeSecret
old zookeeper password: 
new zookeeper password: 
Thread "org.apache.accumulo.server.util.ChangeSecret" died Permission denied: user=busbey,
access=WRITE, inode="/accumulo":accumulo:accumulo:drwxr-x--x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4846)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2911)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2872)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2859)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:642)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44968)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)

org.apache.hadoop.security.AccessControlException: Permission denied: user=busbey, access=WRITE,
inode="/accumulo":accumulo:accumulo:drwxr-x--x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4846)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2911)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2872)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2859)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:642)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44968)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)

	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
	at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:90)
	at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
	at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1489)
	at org.apache.hadoop.hdfs.DistributedFileSystem.delete(DistributedFileSystem.java:355)
	at org.apache.accumulo.server.util.ChangeSecret.updateHdfs(ChangeSecret.java:150)
	at org.apache.accumulo.server.util.ChangeSecret.main(ChangeSecret.java:66)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.accumulo.start.Main$1.run(Main.java:141)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
Permission denied: user=busbey, access=WRITE, inode="/accumulo":accumulo:accumulo:drwxr-x--x
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:224)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:204)
	at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:4846)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInternal(FSNamesystem.java:2911)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:2872)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:2859)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:642)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:408)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44968)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1752)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1748)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:396)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1746)

	at org.apache.hadoop.ipc.Client.call(Client.java:1238)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
	at $Proxy16.delete(Unknown Source)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.delete(ClientNamenodeProtocolTranslatorPB.java:408)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
	at $Proxy17.delete(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.delete(DFSClient.java:1487)
	... 9 more
[busbey@edge ~]$ hdfs dfs -ls /accumulo/instance_id
Found 1 items
-rw-r--r--   3 accumulo accumulo          0 2014-07-02 09:05 /accumulo/instance_id/cb977c77-3e13-4522-b718-2b487d722fd4
[busbey@edge ~]$ zookeeper-client
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] get /accumulo/instances/foobar
1528cc95-2600-4649-a50e-1645404e9d6c
cZxid = 0xe00034f45
ctime = Wed Jul 02 09:27:58 PDT 2014
mZxid = 0xe00034f45
mtime = Wed Jul 02 09:27:58 PDT 2014
pZxid = 0xe00034f45
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 36
numChildren = 0
[zk: localhost:2181(CONNECTED) 1] ls /accumulo/1528cc95-2600-4649-a50e-1645404e9d6c
[users, monitor, problems, root_tablet, gc, hdfs_reservations, table_locks, namespaces, recovery,
fate, tservers, tables, next_file, tracers, config, dead, bulk_failed_copyq, masters]
[zk: localhost:2181(CONNECTED) 2] ls /accumulo/cb977c77-3e13-4522-b718-2b487d722fd4
[users, problems, monitor, root_tablet, hdfs_reservations, gc, table_locks, namespaces, recovery,
fate, tservers, tables, next_file, tracers, config, masters, bulk_failed_copyq, dead]

{code}

What's worse, in this condition the cluster will come up and report everything as fine
so long as the old instance secret is used.

However, clients and servers will end up looking at different ZooKeeper nodes depending on
whether they read the instance_id from HDFS or look it up by instance name in ZooKeeper,
provided each uses the corresponding instance secret.

Furthermore, if an admin runs the CleanZooKeeper utility after this failure, it will destroy
the ZooKeeper nodes that the server processes are still using.

The utility should sanity-check that /accumulo/instance_id is writable before changing
anything in ZooKeeper. It should also wait to update the instance-name-to-instance_id
pointer in ZooKeeper until after HDFS has been updated.
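
For illustration, the check could be as simple as creating and removing a throwaway file
under /accumulo/instance_id before touching ZooKeeper. A minimal sketch from the shell
(the probe filename is made up for illustration; an actual fix would do the equivalent
through the FileSystem API inside ChangeSecret):

{code}
# Hypothetical pre-flight probe: if the touchz fails with "Permission denied",
# ChangeSecret should refuse to run rather than proceed to update ZooKeeper.
hdfs dfs -touchz /accumulo/instance_id/.write-probe
hdfs dfs -rm -skipTrash /accumulo/instance_id/.write-probe
{code}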

Workaround: manually edit the HDFS instance_id to match the new instance id found in
ZooKeeper for the instance name, then proceed as though the secret change had succeeded.
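
Using the ids from the transcript above, that works out to something like the following,
run as a user with write access to /accumulo (e.g. the accumulo user). This is a sketch of
the workaround, not commands taken from the original report:

{code}
# Replace the old instance_id marker file with an empty file named after the
# new id that ZooKeeper reports for the instance name ("foobar" above).
hdfs dfs -rm -skipTrash /accumulo/instance_id/cb977c77-3e13-4522-b718-2b487d722fd4
hdfs dfs -touchz /accumulo/instance_id/1528cc95-2600-4649-a50e-1645404e9d6c
{code}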

  was:
Currently, the ChangePassword tool doesn't do any check to ensure the user running it has
the ability to write to /accumlo/instance_id.



> ChangeSecret tool should refuse to run if no write access to HDFS
> -----------------------------------------------------------------
>
>                 Key: ACCUMULO-2971
>                 URL: https://issues.apache.org/jira/browse/ACCUMULO-2971
>             Project: Accumulo
>          Issue Type: Bug
>    Affects Versions: 1.5.0, 1.5.1, 1.6.0
>            Reporter: Sean Busbey
>              Labels: newbie
>             Fix For: 1.8.1
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
