hadoop-hdfs-user mailing list archives

From Robert Schmidtke <ro.schmid...@gmail.com>
Subject Re: LeaseExpiredException during TestDFSIO on HDFS
Date Wed, 11 Nov 2015 08:12:42 GMT
I should add then I've been running TestDFSIO on the same hardware on
XtreemFS (a distributed file system that supports replication, striping
across nodes, locality for file splits etc., much like HDFS) using the same
configuration (32M block size, replication factor of 1, 21 files of 1G
each), and I'm not seeing any exceptions. The measured IO rates are lower
than HDFS's, however with almost no standard deviation and very consistent
running times, as well as 20 out of 21 data local placements. I'm telling
you this because I think this rules out hardware problems and it may give
you a hint about which part of the system might be at fault here.

Thanks
Robert

On Wed, Nov 11, 2015 at 9:03 AM, Robert Schmidtke <ro.schmidtke@gmail.com>
wrote:

> Hi everyone,
>
> I've been running the TestDFSIO benchmark on HDFS using the following
> setup: 8 nodes (1 namenode with a co-located resource manager, 7 data
> nodes with co-located node managers), an HDFS block size of 32M, a
> replication factor of 1, and 21 files of 1G each (i.e. 3 mappers per
> data node). I am running TestDFSIO
> ten times in a row (as a cycle of write, read and cleanup operations), and
> in some of the runs I'm getting a LeaseExpiredException (not the first run
> though). Following is a stack trace with some context. I was hoping that
> maybe you could point me to where I might have gone wrong in my
> configuration. My HDFS config files are pretty vanilla, I am using Hadoop
> 2.7.1.
>
> ...
> 15/11/10 11:44:15 INFO mapreduce.Job: Running job: job_1447152143064_0003
> 15/11/10 11:44:21 INFO mapreduce.Job: Job job_1447152143064_0003 running
> in uber mode : false
> 15/11/10 11:44:21 INFO mapreduce.Job:  map 0% reduce 0%
> 15/11/10 11:44:27 INFO mapreduce.Job:  map 5% reduce 0%
> 15/11/10 11:44:28 INFO mapreduce.Job:  map 38% reduce 0%
> 15/11/10 11:44:29 INFO mapreduce.Job:  map 48% reduce 0%
> 15/11/10 11:44:30 INFO mapreduce.Job:  map 57% reduce 0%
> 15/11/10 11:44:35 INFO mapreduce.Job:  map 73% reduce 0%
> 15/11/10 11:44:37 INFO mapreduce.Job:  map 86% reduce 0%
> 15/11/10 11:44:38 INFO mapreduce.Job:  map 86% reduce 19%
> 15/11/10 11:44:47 INFO mapreduce.Job: Task Id :
> attempt_1447152143064_0003_m_000008_0, Status : FAILED
> Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException):
> No lease on /benchmarks/TestDFSIO/io_data/test_io_18 (inode 16554): File
> does not exist. Holder DFSClient_attempt_1447152143064_0003_m_000008_0_690388761_1
> does not have any open files.
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3431)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.analyzeFileState(FSNamesystem.java:3236)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getNewBlockTargets(FSNamesystem.java:3074)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3034)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:723)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:492)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
>
> at org.apache.hadoop.ipc.Client.call(Client.java:1476)
> at org.apache.hadoop.ipc.Client.call(Client.java:1407)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
> at com.sun.proxy.$Proxy12.addBlock(Unknown Source)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
> at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
> at com.sun.proxy.$Proxy13.addBlock(Unknown Source)
> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1430)
> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1226)
> at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)
>
> 15/11/10 11:44:48 INFO mapreduce.Job:  map 83% reduce 19%
> 15/11/10 11:44:50 INFO mapreduce.Job:  map 89% reduce 22%
> 15/11/10 11:44:51 INFO mapreduce.Job:  map 100% reduce 22%
> 15/11/10 11:44:52 INFO mapreduce.Job:  map 100% reduce 100%
> 15/11/10 11:44:53 INFO mapreduce.Job: Job job_1447152143064_0003 completed
> successfully
> 15/11/10 11:44:53 INFO mapreduce.Job: Counters: 51
> ...
>
> I am also seeing an extremely high standard deviation for the read rate
> (up to almost 100%) and for the read running times (between 20s and
> 160s). Data-local placement is also only about 15 out of 21 tasks.
> Could this be related to the above exception(s)? Thanks a lot in
> advance; I'm happy to supply more information if you need it.
>
> Robert
>
> --
> My GPG Key ID: 336E2680
>



-- 
My GPG Key ID: 336E2680
