hbase-dev mailing list archives

From takeshi <takeshi.m...@gmail.com>
Subject Re: All region server died due to "Parent directory doesn't exist"
Date Wed, 19 Feb 2014 09:38:39 GMT
Hi All,

I need to say sorry about this question... I found that a shell-script snippet
in our operation tool deletes the /<hbase-root>/.logs folder, so this is JUST a
MISUNDERSTANDING on our side.
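
For anyone who hits the same symptom: the dangerous pattern to look for is an
unconditional cleanup of paths under the live HBase root directory. A rough
sketch of the kind of snippet that bit us (the path and variable are
illustrative, not our actual tool):

{code:title=illustrative cleanup snippet}
#!/bin/bash
# Hypothetical operation-tool cleanup step. Deleting anything under the live
# HBase root (hbase.rootdir, /user/SPN-hbase in our case) also removes the
# region servers' WAL directory ".logs", so every region server aborts on its
# next log roll with "Parent directory doesn't exist".
HBASE_ROOT="/user/SPN-hbase"
sudo -u hdfs hdfs dfs -rm -r -skipTrash "${HBASE_ROOT}/.logs"
{code}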

Sorry to have wasted your time on this... and thanks for the help from Ted &
Rabbit's Foot. Orz


Best regards

takeshi


2014-02-19 16:57 GMT+08:00 takeshi <takeshi.miao@gmail.com>:

> HI Rabbit's Foot,
>
> I think this is simply not a Kerberos setting issue, because every time
> before I run my reproduce tests I first run the ycsb tests with
> workload[a-f] to make sure the HBase cluster is healthy, and only then run
> the reproduce tests.
>
> Currently I see two types of error messages from the region servers during
> this reproduce test.
>
> One is...
> {code:title=regionserver running on aws-scottm-aws-1}
>
> 243 [2014-02-18 01:19:10,142][Thread-11][WARN
> ][org.apache.hadoop.hdfs.DFSClient]: DataStreamer Exception
> 244
> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> No lease on
> /user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776/aws-scottm-tmh6-1.ap-southeast-1.compute.internal%2C8120%2C1392703714776.1392703718212
> File does not exist. Holder DFSClient_NONMAPREDUCE_-2013292460_25 does not
> have any open files.
> {code}
>
> Another is
> {code:title=regionserver running on aws-scottm-aws-2}
> 491 [2014-02-18
> 02:09:44,813][regionserver8120.logRoller][FATAL][org.apache.hadoop.hbase.regionserver.HRegionServer]:
> ABORTING region server
> aws-scottm-tmh6-2.ap-southeast-1.compute.internal,8120,1392703717908: IOE in log roller
> 492 java.io.IOException: Exception in createWriter
> 493         at
> org.apache.hadoop.hbase.regionserver.wal.HLogFileSystem.createWriter(HLogFileSystem.java:66)
> 494         at
> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:694)
> 495         at
> org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:631)
> 496         at
> org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
> 497         at java.lang.Thread.run(Thread.java:662)
> 498 Caused by: java.io.IOException: cannot get log writer
> 499         at
> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:750)
> 500         at
> org.apache.hadoop.hbase.regionserver.wal.HLogFileSystem.createWriter(HLogFileSystem.java:60)
> 501         ... 4 more
> 502 Caused by: java.io.IOException: java.io.FileNotFoundException: Parent
> directory doesn't exist:
> /user/SPN-hbase/.logs/aws-scottm-tmh6-2.ap-southeast-1.compute.internal,8120,1392703717908
> {code}
>
> But I think the root cause of these two types of error messages could be the
> same: the "/<hbase-root>/.logs" directory was removed accidentally at some
> point in time...
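>
> To rule the Kerberos side in or out quickly, the first thing worth checking
> is simply whether the WAL directory still exists. A minimal check, assuming
> hbase.rootdir is /user/SPN-hbase as in the logs:
>
> {code}
> # run with the hbase principal's Kerberos ticket (e.g. via its keytab)
> hdfs dfs -ls /user/SPN-hbase          # ".logs" should be listed here
> hdfs dfs -ls /user/SPN-hbase/.logs    # one subdirectory per live region server
> {code}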
>
> Currently I think this issue is related to the one discussed in this earlier
> thread:
>
> http://mail-archives.apache.org/mod_mbox/hbase-dev/201305.mbox/%3C1368081563.24364.YahooMailNeo@web140605.mail.bf1.yahoo.com%3E
>
>
>
> Best regards
>
> takeshi
>
>
> 2014-02-19 14:31 GMT+08:00 Rabbit's Foot <rabbitsfoot@is-land.com.tw>:
>
>> Hi,
>>
>> According to the log message:
>>
>> 1343 2014-02-18 01:19:10,132 ERROR
>> org.apache.hadoop.security.UserGroupInformation:
>> PriviledgedActionException
>> as:hbase/aws-scottm-tmh6-1.ap-southeast-1.compute.internal@LAB
>> (auth:KERBEROS)
>> cause:org.apache
>> .hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
>>
>> /user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776/aws-scottm-tmh6-1.ap-southeast-1.compute.inter
>>
>>
>> it seems to be a problem with the Kerberos settings. Please check the
>> following settings:
>>
>> dfs.namenode.keytab.file
>> dfs.namenode.kerberos.principal
>>
>> and check a couple of things on the HBase side (see the sketch below):
>>
>> 1. Check that the directory set by the "hbase.rootdir" parameter in
>> hbase-site.xml actually exists on HDFS.
>> 2. Using the hbase shell, check that the "create table" command can be
>> executed correctly.
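>>
>> For example, a minimal pass over those checks could look like this (the
>> config paths are the usual CDH locations and the table name is just a
>> placeholder):
>>
>> {code}
>> # 1. confirm the Kerberos settings and the HBase root directory
>> grep -A1 'dfs.namenode.keytab.file\|dfs.namenode.kerberos.principal' /etc/hadoop/conf/hdfs-site.xml
>> grep -A1 'hbase.rootdir' /etc/hbase/conf/hbase-site.xml
>> hdfs dfs -ls /user/SPN-hbase      # the rootdir should exist and contain .logs
>>
>> # 2. smoke-test HBase from the shell
>> echo "create 'smoke_test', 'f'" | hbase shell
>> {code}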
>>
>> Best regards
>>
>> rabbit's foot
>>
>>
>> 2014-02-19 13:47 GMT+08:00 takeshi <takeshi.miao@gmail.com>:
>>
>> > Hi, all
>> >
>> > I think I should describe in more detail the test environment I used for
>> > reproducing this issue.
>> >
>> > | hostname | ip | roles |
>> > | aws-scottm-tmh6-1 | 10.147.1.168 | activeNN, ZKFC, JN, ZK, activeJT,
>> > MRZK, NN, TT, HM, RS|
>> > | aws-scottm-tmh6-2 | 10.138.150.16 | standbyNN, ZKFC, JN, standbyJT,
>> MRZK,
>> > NN, TT, HM, RS|
>> > | aws-scottm-tmh6-3 | 10.134.214.132 | JN, NN, TT, RS|
>> >
>> > P.S. I currently do not think this issue is related to the MR daemons.
>> >
>> > I will find some other place to put the full log files for people
>> > interested in this work.
>> >
>> >
>> >
>> > Best regards
>> >
>> > takeshi
>> >
>> >
>> > 2014-02-19 13:34 GMT+08:00 takeshi <takeshi.miao@gmail.com>:
>> >
>> > > Hi, Ted
>> > >
>> > > Yes, there is a region server running on 10.147.1.168; the following is
>> > > the log snippet for it. But note that this region server actually did
>> > > nothing around 2014-02-18 01:17:31... it only detected the error around
>> > > 01:18:53.
>> > > {code}
>> > > 218 [2014-02-18 01:08:40,123][REPL IPC Server handler 0 on 8120][INFO
>> > > ][org.apache.hadoop.ipc.HBaseServer]: REPL IPC Server handler 0 on
>> 8120:
>> > > starting
>> > > 219 [2014-02-18 01:08:40,124][REPL IPC Server handler 1 on 8120][INFO
>> > > ][org.apache.hadoop.ipc.HBaseServer]: REPL IPC Server handler 1 on
>> 8120:
>> > > starting
>> > > 220 [2014-02-18 01:08:40,126][REPL IPC Server handler 2 on 8120][INFO
>> > > ][org.apache.hadoop.ipc.HBaseServer]: REPL IPC Server handler 2 on
>> 8120:
>> > > starting
>> > > 221 [2014-02-18 01:08:40,128][regionserver8120][INFO
>> > > ][org.apache.hadoop.hbase.regionserver.HRegionServer]: Serving as
>> > > aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776,
>> RPC
>> > > listening on aws-scottm-tmh6-1.ap-southeast-1.compute.internal/
>> > > 10.147.1.168:8120, sessionid=0x14443974ce70009
>> > > 222 [2014-02-18 01:08:40,130][regionserver8120][INFO
>> > > ][org.apache.hadoop.hbase.regionserver.HRegionServer]: Registered
>> > > RegionServer MXBean
>> > > 223 [2014-02-18
>> > >
>> >
>> 01:08:40,142][SplitLogWorker-aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776][INFO
>> > > ][org.apache.hadoop.hbase.regionserver.SplitLogWorker]: SplitLogWorker
>> > > aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776
>> > > starting
>> > > 224 [2014-02-18 01:18:52,027][PRI IPC Server handler 0 on 8120][INFO
>> > > ][org.apache.hadoop.hbase.regionserver.HRegionServer]: Received
>> request
>> > to
>> > > open region: usertable,,1392704274325.e8be76322f79998d58750d
>> > > 87f268e0ae.
>> > > 225 [2014-02-18
>> > >
>> >
>> 01:18:52,551][RS_OPEN_REGION-aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776-0][INFO
>> > > ][org.apache.hadoop.hbase.regionserver.HRegion]: Setting up
>> > tabledescriptor
>> > > config now ...
>> > > 226 [2014-02-18
>> > >
>> >
>> 01:18:52,584][RS_OPEN_REGION-aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776-0][INFO
>> > > ][org.apache.hadoop.hbase.coprocessor.CoprocessorHost]: System
>> > coprocessor
>> > > org.apache.hadoop.hbase.security.token.TokenProvider was loaded
>> > > successfully with priority (536870911).
>> > > 227 [2014-02-18
>> > >
>> >
>> 01:18:52,591][RS_OPEN_REGION-aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776-0][INFO
>> > > ][org.apache.hadoop.hbase.coprocessor.CoprocessorHost]: System
>> > coprocessor
>> > > org.apache.hadoop.hbase.security.access.AccessController was
>> > loaded
>> > > successfully with priority (536870912).
>> > > 228 [2014-02-18
>> > >
>> >
>> 01:18:52,946][StoreOpenerThread-usertable,,1392704274325.e8be76322f79998d58750d87f268e0ae.-1][INFO
>> > > ][org.apache.hadoop.hbase.regionserver.Store]: time to purge deletes
>> set
>> > to
>> > > 0ms in store family
>> > > 229 [2014-02-18
>> > >
>> >
>> 01:18:52,974][StoreFileOpenerThread-family-1][ERROR][org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics]:
>> > > Inconsistent configuration. Previous configuration for using table
>> name
>> > > in metrics: true, new configuration: false
>> > > 230 [2014-02-18 01:18:52,983][StoreFileOpenerThread-family-1][INFO
>> > > ][org.apache.hadoop.hbase.util.ChecksumType]: Checksum using
>> > > org.apache.hadoop.util.PureJavaCrc32
>> > > 231 [2014-02-18 01:18:52,983][StoreFileOpenerThread-family-1][INFO
>> > > ][org.apache.hadoop.hbase.util.ChecksumType]: Checksum can use
>> > > org.apache.hadoop.util.PureJavaCrc32C
>> > > 232 [2014-02-18
>> > >
>> >
>> 01:18:53,305][RS_OPEN_REGION-aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776-0][INFO
>> > > ][org.apache.hadoop.hbase.regionserver.HRegion]: Onlined
>> > > usertable,,1392704274325.e8be76322f79998d58750d87f268e0ae.;
>> next
>> > > sequenceid=343
>> > > ...
>> > > 241 [2014-02-18
>> > >
>> 01:18:53,644][PostOpenDeployTasks:e8be76322f79998d58750d87f268e0ae][INFO
>> > > ][org.apache.hadoop.hbase.catalog.MetaEditor]: Updated row
>> > > usertable,,1392704274325.e8be76322f79998d58750d87f268e0ae. with
>> > >
>> >
>> server=aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776
>> > > 242 [2014-02-18
>> > >
>> 01:18:53,644][PostOpenDeployTasks:e8be76322f79998d58750d87f268e0ae][INFO
>> > > ][org.apache.hadoop.hbase.regionserver.HRegionServer]: Done with post
>> > open
>> > > deploy task for region=usertable,,1392704274325.e8be76322f79998d58750d87f268e0ae., daughter=false
>> > > 243 [2014-02-18 01:19:10,142][Thread-11][WARN
>> > > ][org.apache.hadoop.hdfs.DFSClient]: DataStreamer Exception
>> > > 244
>> > >
>> >
>> org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
>> > > No lease on
>> > >
>> >
>> /user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714
>> > >
>> >
>> 776/aws-scottm-tmh6-1.ap-southeast-1.compute.internal%2C8120%2C1392703714776.1392703718212
>> > > File does not exist. Holder DFSClient_NONMAPREDUCE_-2013292460_25
>> does
>> > not
>> > > have any open files.
>> > > 245         at
>> > >
>> >
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2419)
>> > > 246         at
>> > >
>> >
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2410)
>> > > 247         at
>> > >
>> >
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2203)
>> > > 248         at
>> > >
>> >
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:480)
>> > > 249         at
>> > >
>> >
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
>> > > 250         at
>> > >
>> >
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
>> > > 251         at
>> > >
>> >
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> > > 252         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
>> > > 253         at
>> > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
>> > > 254         at
>> > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
>> > > 255         at java.security.AccessController.doPrivileged(Native
>> Method)
>> > > 256         at javax.security.auth.Subject.doAs(Subject.java:396)
>> > > 257         at
>> > >
>> >
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> > > 258         at
>> org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
>> > > 259
>> > > 260         at org.apache.hadoop.ipc.Client.call(Client.java:1225)
>> > > 261         at
>> > >
>> >
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)
>> > > 262         at $Proxy11.addBlock(Unknown Source)
>> > > 263         at
>> > >
>> >
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:290)
>> > > 264         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native
>> > Method)
>> > > 265         at
>> > >
>> >
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>> > > 266         at
>> > >
>> >
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>> > > 267         at java.lang.reflect.Method.invoke(Method.java:597)
>> > > 268         at
>> > >
>> >
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)
>> > > 269         at
>> > >
>> >
>> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)
>> > > 270         at $Proxy14.addBlock(Unknown Source)
>> > > 271         at
>> > >
>> >
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1176)
>> > > 272         at
>> > >
>> >
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1029)
>> > > 273         at
>> > >
>> >
>> org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:487)
>> > > {code}
>> > >
>> > >
>> > >
>> > > Best regards
>> > >
>> > > takeshi
>> > >
>> > >
>> > > 2014-02-19 11:26 GMT+08:00 Ted Yu <yuzhihong@gmail.com>:
>> > >
>> > > Was region server running on 10.147.1.168 ?
>> > >>
>> > >> If so, can you pastebin region server log around 2014-02-18 01:17:31
>> ?
>> > >>
>> > >> Thanks
>> > >>
>> > >>
>> > >> On Tue, Feb 18, 2014 at 9:17 PM, takeshi <takeshi.miao@gmail.com>
>> > wrote:
>> > >>
>> > >> > Hi, all
>> > >> >
>> > >> > We also encountered this issue on our POC cluster, and it looks like
>> > >> > the issue you were talking about. We also have not found the root cause
>> > >> > yet, but would like to share some findings here which may help others.
>> > >> >
>> > >> > Environment:
>> > >> > We are using cdh-4.2.1 (hadoop-2.0.0, hbase-0.94.2) as our base distro,
>> > >> > with some bugfixes ported from other Apache versions, running on 14
>> > >> > machines with CentOS-5.3.
>> > >> >
>> > >> > The error messages we found for this issue:
>> > >> >
>> > >> > HBase master reports "Parent directory doesn't exist" error
>> > >> > {code:title=hbase-master.log}
>> > >> > 334 [2014-02-18
>> > >> >
>> > >> >
>> > >>
>> >
>> 01:20:30,550][master-aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8100,1392703701863.archivedHFileCleaner][WARN
>> > >> > ][org.apache.hadoop.hbase.master.snapshot.SnapshotFileCache]:
>> Snapshot
>> > >> > di     rectory: hdfs://cluster1/user/SPN-hbase/.snapshot doesn't
>> exist
>> > >> > 335 [2014-02-18 01:20:36,597][IPC Server handler 47 on
>> > >> > 8100][ERROR][org.apache.hadoop.hbase.master.HMaster]: Region server
>> > >> >
>> > ^@^@aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776
>> > >> > reported a      fatal error:
>> > >> > 336 ABORTING region server
>> > >> >
>> aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776:
>> > >> IOE
>> > >> > in log roller
>> > >> > 337 Cause:
>> > >> > 338 java.io.IOException: Exception in createWriter
>> > >> > 339         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.regionserver.wal.HLogFileSystem.createWriter(HLogFileSystem.java:66)
>> > >> > 340         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:694)
>> > >> > 341         at
>> > >> >
>> > org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:631)
>> > >> > 342         at
>> > >> >
>> org.apache.hadoop.hbase.regionserver.LogRoller.run(LogRoller.java:94)
>> > >> > 343         at java.lang.Thread.run(Thread.java:662)
>> > >> > 344 Caused by: java.io.IOException: cannot get log writer
>> > >> > 345         at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:750)
>> > >> > 346         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.regionserver.wal.HLogFileSystem.createWriter(HLogFileSystem.java:60)
>> > >> > 347         ... 4 more
>> > >> > 348 Caused by: java.io.IOException: java.io.FileNotFoundException:
>> > >> Parent
>> > >> > directory doesn't exist:
>> > >> >
>> > >> >
>> > >>
>> >
>> /user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776
>> > >> > 349         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.verifyParentDir(FSNamesystem.java:1726)
>> > >> > 350         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1848)
>> > >> > 351         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:1770)
>> > >> > 352         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1747)
>> > >> > 353         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:418)
>> > >> > 354         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:205)
>> > >> > 355         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44068)
>> > >> > 356         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> > >> > 357         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
>> > >> > 358         at
>> > >> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
>> > >> > 359         at
>> > >> org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
>> > >> > 360         at java.security.AccessController.doPrivileged(Native
>> > >> Method)
>> > >> > 361         at javax.security.auth.Subject.doAs(Subject.java:396)
>> > >> > 362         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> > >> > 363         at
>> > >> org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
>> > >> > 364
>> > >> > 365         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:173)
>> > >> > 366         at
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:747)
>> > >> > 367         ... 5 more
>> > >> > {code}
>> > >> >
>> > >> > The namenode mkdirs for "/<hbase-root>/.logs/<regionserver>" was
>> > >> > requested by principal "hbase/<fqdn>@LAB", but the later removal of
>> > >> > "/<hbase-root>/.logs" was requested by "hdfs/<fqdn>@LAB"!
>> > >> > {code:title=namenode.log (with hdfs audit enabled)}
>> > >> > 624 2014-02-18 01:08:38,203 INFO
>> > >> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
>> > allowed=true
>> > >> > ugi=hbase/aws-scottm-tmh6-1.ap-southeast-1.compute.internal@LAB
>> > >> > (auth:KERBEROS)
>> > >> > ip=/
>> > >> > 10.147.1.168              cmd=mkdirs
>> > >> >
>> > >> >
>> > >>
>> >
>> src=/user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776
>> > >> > dst=null        perm=hbase:supergroup:rwxr-xr-x
>> > >> > ...
>> > >> > 1068 2014-02-18 01:17:31,189 INFO
>> > >> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
>> > allowed=true
>> > >> > ugi=hdfs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal@LAB
>> > >> > (auth:KERBEROS)
>> > >> >  ip=/
>> > >> > 10.147.1.168              cmd=delete      src=/user/SPN-hbase/.logs
>> > >> > dst=null        perm=null
>> > >> > ...
>> > >> > 1343 2014-02-18 01:19:10,132 ERROR
>> > >> > org.apache.hadoop.security.UserGroupInformation:
>> > >> PriviledgedActionException
>> > >> > as:hbase/aws-scottm-tmh6-1.ap-southeast-1.compute.internal@LAB
>> > >> > (auth:KERBEROS)
>> > >> > cause:org.apache
>> > >> > .hadoop.hdfs.server.namenode.LeaseExpiredException: No lease on
>> > >> >
>> > >> >
>> > >>
>> >
>> /user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776/aws-scottm-tmh6-1.ap-southeast-1.compute.inter
>> > >> > nal%2C8120%2C1392703714776.1392703718212 File does not exist.
>> Holder
>> > >> > DFSClient_NONMAPREDUCE_-2013292460_25 does not have any open
>> files.
>> > >> > 1344 2014-02-18 01:19:10,132 INFO org.apache.hadoop.ipc.Server:
IPC
>> > >> Server
>> > >> > handler 83 on 8020, call
>> > >> > org.apache.hadoop.hdfs.protocol.ClientProtocol.addBlock from
>> > >> > 10.147.1.168:35703: error: org.apache.hadoop.
>> > >> > hdfs.server.namenode.LeaseExpiredException: No lease on
>> > >> >
>> > >> >
>> > >>
>> >
>> /user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776/aws-scottm-tmh6-1.ap-southeast-1.compute.internal%2C81
>> > >> > 20%2C1392703714776.1392703718212 File does not exist. Holder
>> > >> > DFSClient_NONMAPREDUCE_-2013292460_25 does not have any open
>> files.
>> > >> > 1345 org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException:
>> No
>> > >> lease
>> > >> > on
>> > >> >
>> > >> >
>> > >>
>> >
>> /user/SPN-hbase/.logs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal,8120,1392703714776/
>> > >> > aws-scottm-tmh6-1.ap-southeast-1.com
>> > >> > pute.internal%2C8120%2C1392703714776.1392703718212 File does not
>> > exist.
>> > >> > Holder DFSClient_NONMAPREDUCE_-2013292460_25 does not have any
>> open
>> > >> files.
>> > >> > 1346         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2419)
>> > >> > 1347         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2410)
>> > >> > 1348         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2203)
>> > >> > 1349         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:480)
>> > >> > 1350         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:297)
>> > >> > 1351         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:44080)
>> > >> > 1352         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:453)
>> > >> > 1353         at
>> org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1002)
>> > >> > 1354         at
>> > >> > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1695)
>> > >> > 1355         at
>> > >> > org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1691)
>> > >> > 1356         at java.security.AccessController.doPrivileged(Native
>> > >> Method)
>> > >> > 1357         at javax.security.auth.Subject.doAs(Subject.java:396)
>> > >> > 1358         at
>> > >> >
>> > >> >
>> > >>
>> >
>> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>> > >> > 1359         at
>> > >> org.apache.hadoop.ipc.Server$Handler.run(Server.java:1689)
>> > >> > {code}
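>> > >> >
>> > >> > (If anyone wants to trace this the same way: with HDFS audit logging
>> > >> > enabled, a grep like the one below over the active namenode's audit
>> > >> > output shows who created and who deleted the WAL directory. The log
>> > >> > path is only a typical default and may differ on your cluster.)
>> > >> >
>> > >> > {code}
>> > >> > AUDIT=/var/log/hadoop-hdfs/hdfs-audit.log   # or the namenode log itself
>> > >> > grep -E 'cmd=(mkdirs|delete)' "$AUDIT" | grep '/user/SPN-hbase/.logs'
>> > >> > {code}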
>> > >> >
>> > >> > Way to reproduce:
>> > >> > Currently we have a way to reproduce this issue: we run ycsb
>> > >> > workload[a-f] against HBase, and on the other hand we simply execute
>> > >> > 'service <hadoop,hbase-daemons> start'. We keep running this 10 times
>> > >> > in a loop on both sides for one round, and it can usually be reproduced
>> > >> > in the first or second round. But I am not sure whether it is related
>> > >> > to the issue the beginning of this discussion thread talked about.
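>> > >> >
>> > >> > (Very roughly, the driver is something like the loop below; the ycsb
>> > >> > invocation, host names and service names are placeholders rather than
>> > >> > our exact scripts.)
>> > >> >
>> > >> > {code}
>> > >> > #!/bin/bash
>> > >> > # illustrative reproduce loop: YCSB load on one side, repeated daemon
>> > >> > # start commands on the other, 10 iterations per round
>> > >> > for i in $(seq 1 10); do
>> > >> >   for w in a b c d e f; do
>> > >> >     bin/ycsb run hbase -P workloads/workload${w} -p table=usertable
>> > >> >   done &
>> > >> >   for host in aws-scottm-tmh6-1 aws-scottm-tmh6-2 aws-scottm-tmh6-3; do
>> > >> >     ssh ${host} 'service hbase-regionserver start'
>> > >> >   done
>> > >> >   wait   # let this round's YCSB runs finish before the next round
>> > >> > done
>> > >> > {code}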
>> > >> >
>> > >> >
>> > >> > Follow-ups:
>> > >> > Now I am following up on the weird message that appeared in the
>> > >> > namenode.log
>> > >> > {code}
>> > >> > 2014-02-18 01:17:31,189 INFO
>> > >> > org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit:
>> > allowed=true
>> > >> > ugi=hdfs/aws-scottm-tmh6-1.ap-southeast-1.compute.internal@LAB
>> > >> > (auth:KERBEROS)
>> > >> >  ip=/
>> > >> > 10.147.1.168              cmd=delete      src=/user/SPN-hbase/.logs
>> > >> > dst=null        perm=null
>> > >> > {code}
>> > >> > to see what more I can dig out...
>> > >> >
>> > >> >
>> > >> > I'll keep updating this thread with any findings, and welcome any
>> > >> > opinions. Thanks.
>> > >> >
>> > >> >
>> > >> > Best regards
>> > >> >
>> > >> > takeshi
>> > >> >
>> > >>
>> > >
>> > >
>> >
>>
>
>
