hadoop-common-user mailing list archives

From Chris Nauroth <cnaur...@hortonworks.com>
Subject Re: HDFS short-circuit tokens expiring
Date Wed, 20 Jan 2016 19:01:49 GMT
Hi Nick,

This is something the client recovers from internally.  I expect the exception would
not propagate to the caller, so the reading process would continue successfully.  There is,
however, a potential latency hit from the extra NameNode RPC required for recovery.

On opening a file, the client issues an RPC to the NameNode to determine block locations for
reading the blocks from DataNodes.  In a secured cluster, this RPC response also includes
a block access token for each block, which the client needs for subsequent authorization to
the DataNodes hosting replicas of those blocks.  In a typical configuration, this information
(block locations and access tokens) is prefetched for up to 10 blocks, capped by the file's
size of course.  Block access tokens have an expiration.  Suppose, for example, that
the application does not read all the way through to the tenth block before the block access
token expires.  At that point, the client recovers internally by issuing another RPC
to the NameNode to fetch a fresh block access token with a new expiration.  Then, the read
continues as normal.
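
To make that concrete, here is a minimal sketch of a client read loop (the file path is
invented for illustration).  The token recovery described above happens entirely beneath
this API, so the loop itself never sees the InvalidToken:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LongRead {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();  // picks up hdfs-site.xml, core-site.xml
        try (FileSystem fs = FileSystem.get(conf);
             // open() triggers the NameNode RPC that returns block locations
             // plus a block access token per prefetched block.
             FSDataInputStream in = fs.open(new Path("/data/example.bin"))) {
          byte[] buf = new byte[64 * 1024];
          // If a token expires before the read reaches its block, the client
          // refetches a fresh one from the NameNode inside read(); the caller
          // only sees the extra RPC as latency, never as an exception.
          for (int n = in.read(buf); n != -1; n = in.read(buf)) {
            // process buf[0..n) ...
          }
        }
      }
    }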

Does this explanation jibe with what you're seeing?  Are the reads successful, but experiencing
undesired latency (the "responseTooSlow" warning you mentioned)?  If so, then maybe this
hints at a feature request for HDFS.  For example, it might be desirable to detect "almost
expired" block access tokens in a background thread and refetch them proactively, so that
the caller wouldn't absorb the latency hit of the extra RPC as part of its read call.  Please
feel free to file an HDFS JIRA if this makes sense, or if you have something else like it
in mind.
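
Just to sketch the shape of such a feature (hypothetical code, not the current HDFS client;
CachedBlockToken and refreshBlockToken are invented names for internal state that is not
exposed this way today):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Hypothetical sketch only: 'CachedBlockToken' stands in for internal
    // client state that HDFS does not currently expose like this.
    class ProactiveTokenRefresher {
      private static final long ALMOST_EXPIRED_MS = 60_000;  // refresh 1 minute early
      private final ScheduledExecutorService scheduler =
          Executors.newSingleThreadScheduledExecutor();

      void start(Iterable<CachedBlockToken> tokens) {
        scheduler.scheduleWithFixedDelay(() -> {
          long now = System.currentTimeMillis();
          for (CachedBlockToken t : tokens) {
            if (t.expiryDate() - now < ALMOST_EXPIRED_MS) {
              // Would issue the NameNode RPC off the read path.
              t.refreshBlockToken();
            }
          }
        }, 30, 30, TimeUnit.SECONDS);
      }

      interface CachedBlockToken {
        long expiryDate();
        void refreshBlockToken();
      }
    }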

--Chris Nauroth

From: Dejan Menges <dejan.menges@gmail.com>
Date: Wednesday, January 20, 2016 at 3:26 AM
To: Nick Dimiduk <ndimiduk@apache.org>, "user@hadoop.apache.org" <user@hadoop.apache.org>
Cc: hbase-user <user@hbase.apache.org>
Subject: Re: HDFS short-circuit tokens expiring

Hi Nick,

We hit exactly the same case; for us, the tokens were expiring too quickly.
What we increased was dfs.client.read.shortcircuit.streams.cache.size and dfs.client.read.shortcircuit.streams.cache.expiry.ms.
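
For illustration, these can be raised programmatically like this (the values here are
arbitrary examples, not recommendations; if I remember correctly, the defaults are 256
entries and 300000 ms):

    import org.apache.hadoop.conf.Configuration;

    public class ShortCircuitTuning {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Cache more short-circuit stream file descriptors...
        conf.setInt("dfs.client.read.shortcircuit.streams.cache.size", 1024);
        // ...and keep them cached longer before expiry (10 minutes here).
        conf.setLong("dfs.client.read.shortcircuit.streams.cache.expiry.ms", 600_000L);
        // Pass 'conf' to FileSystem.get(conf) for the client doing the reads.
      }
    }

The same keys can of course go into the client's hdfs-site.xml instead.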

Hope this helps.

Best,
Dejan

On Wed, Jan 20, 2016 at 12:15 AM Nick Dimiduk <ndimiduk@apache.org> wrote:
Hi folks,

This looks like it sits at the intersection of HDFS and HBase. My region server logs
are flooded with messages like "SecretManager$InvalidToken: access control error while attempting
to set up short-circuit access to <hdfs-file-path> ... is expired" [0].

These logs correspond with responseTooSlow WARNings from the region server.

Maybe I have misconfigured short-circuit reads? Such an expiration seems like something the
client, or the client's consumer, should handle by re-negotiating.

Thanks a lot,
-n

[0]

2016-01-19 22:10:14,432 INFO  [B.defaultRpcServer.handler=4,queue=1,port=16020] shortcircuit.ShortCircuitCache:
ShortCircuitCache(0x71bdc547): could not load 1074037633_BP-1145309065-XXX-1448053136416 due
to InvalidToken exception.
org.apache.hadoop.security.token.SecretManager$InvalidToken: access control error while attempting
to set up short-circuit access to <hfile path> token with block_token_identifier (expiryDate=1453194430724,
keyId=1508822027, userId=hbase, blockPoolId=BP-1145309065-XXX-1448053136416, blockId=1074037633,
access modes=[READ]) is expired.
at org.apache.hadoop.hdfs.BlockReaderFactory.requestFileDescriptors(BlockReaderFactory.java:591)
at org.apache.hadoop.hdfs.BlockReaderFactory.createShortCircuitReplicaInfo(BlockReaderFactory.java:490)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.create(ShortCircuitCache.java:782)
at org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache.fetchOrCreate(ShortCircuitCache.java:716)
at org.apache.hadoop.hdfs.BlockReaderFactory.getBlockReaderLocal(BlockReaderFactory.java:422)
at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:333)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:618)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:844)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:896)
at java.io.DataInputStream.read(DataInputStream.java:149)
at org.apache.hadoop.hbase.io.hfile.HFileBlock.readWithExtra(HFileBlock.java:678)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$AbstractFSReader.readAtOffset(HFileBlock.java:1372)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockDataInternal(HFileBlock.java:1591)
at org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.readBlockData(HFileBlock.java:1470)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:437)
at org.apache.hadoop.hbase.io.hfile.HFileBlockIndex$BlockIndexReader.loadDataBlockWithScanInfo(HFileBlockIndex.java:259)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.seekTo(HFileReaderV2.java:634)
at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$AbstractScannerV2.reseekTo(HFileReaderV2.java:614)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseekAtOrAfter(StoreFileScanner.java:267)
at org.apache.hadoop.hbase.regionserver.StoreFileScanner.reseek(StoreFileScanner.java:181)
at org.apache.hadoop.hbase.regionserver.NonLazyKeyValueScanner.doRealSeek(NonLazyKeyValueScanner.java:55)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.generalizedSeek(KeyValueHeap.java:312)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.reseek(KeyValueHeap.java:256)
at org.apache.hadoop.hbase.regionserver.StoreScanner.reseek(StoreScanner.java:817)
at org.apache.hadoop.hbase.regionserver.StoreScanner.seekToNextRow(StoreScanner.java:792)
at org.apache.hadoop.hbase.regionserver.StoreScanner.next(StoreScanner.java:621)
at org.apache.hadoop.hbase.regionserver.KeyValueHeap.next(KeyValueHeap.java:147)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.populateResult(HRegion.java:5486)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextInternal(HRegion.java:5637)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5424)
at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.nextRaw(HRegion.java:5410)
at org.apache.phoenix.coprocessor.GroupedAggregateRegionObserver$1.next(GroupedAggregateRegionObserver.java:510)
at org.apache.phoenix.coprocessor.BaseRegionScanner.next(BaseRegionScanner.java:40)
at org.apache.phoenix.coprocessor.BaseRegionScanner.nextRaw(BaseRegionScanner.java:60)
at org.apache.phoenix.coprocessor.DelegateRegionScanner.nextRaw(DelegateRegionScanner.java:77)
at org.apache.hadoop.hbase.regionserver.RSRpcServices.scan(RSRpcServices.java:2395)
at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32205)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
at java.lang.Thread.run(Thread.java:745)

