hadoop-common-user mailing list archives

From Ken Krugler <kkrugler_li...@transpac.com>
Subject Re: LeaseExpiredException Exception
Date Tue, 08 Dec 2009 19:43:35 GMT
Hi Jason,

Thanks for the info - it's good to hear from somebody else who's run
into this :)

I tried again with a bigger box for the master, and wound up with the
same results.

I guess the framework could be killing it - but I have no idea why.
This is during a very simple "write out the results" phase, so very
high I/O but not much computation, and nothing should be hung.

Any particular configuration values you had to tweak? I'm running this
in Elastic MapReduce (EMR), so most settings are whatever they provide
by default. I override a few things in my JobConf, but (for example)
anything related to HDFS or the MR framework will be locked & loaded by
the time my job is executing.
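
For reference, the overrides are along these lines (a minimal sketch,
not my actual job setup - the class name and values are illustrative):

    import org.apache.hadoop.mapred.JobConf;

    // Per-job overrides only; cluster-level HDFS/tasktracker settings
    // are already fixed by the time this runs on EMR.
    JobConf conf = new JobConf(MyJob.class);   // MyJob is hypothetical
    conf.setJobName("write-results");
    conf.setNumReduceTasks(50);
    // Illustrative knob: task timeout, in milliseconds.
    conf.set("mapred.task.timeout", "1200000");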

Thanks!

-- Ken

On Dec 8, 2009, at 9:34am, Jason Venner wrote:

> Is it possible that this is occurring in a task that is being killed
> by the framework? Sometimes there is a little lag between the time the
> tracker kills a task and the time the task fully dies; you could be
> getting into a situation like that, where the task is in the process
> of dying but the last write is still in progress.
>
> I see this situation happen when the task tracker machine is heavily
> loaded. In one case there was a 15-minute lag between the timestamp in
> the tracker for killing task XYZ and the task actually going away.
>
> It took me a while to work this out, as I had to merge the tracker and
> task logs by time to actually see the pattern.
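>
> Roughly, the merge was along these lines (an untested sketch; the file
> names, and the assumption that every interesting line starts with a
> "yyyy-MM-dd HH:mm:ss" timestamp, are mine):
>
>     import java.io.*;
>     import java.util.*;
>
>     // Sketch: pool the tracker and task log lines, then sort them by
>     // the leading timestamp so kill/death events line up in time.
>     public class MergeLogs {
>         public static void main(String[] args) throws IOException {
>             List<String> lines = new ArrayList<String>();
>             for (String name : new String[] { "tracker.log", "task.log" }) {
>                 BufferedReader in = new BufferedReader(new FileReader(name));
>                 for (String line = in.readLine(); line != null; line = in.readLine()) {
>                     // Keep only lines that begin with a timestamp; this
>                     // drops wrapped continuation lines.
>                     if (line.length() >= 19 && Character.isDigit(line.charAt(0))) {
>                         lines.add(line);
>                     }
>                 }
>                 in.close();
>             }
>             // The timestamp leads each line, so comparing the first 19
>             // characters as strings orders the lines chronologically.
>             Collections.sort(lines, new Comparator<String>() {
>                 public int compare(String a, String b) {
>                     return a.substring(0, 19).compareTo(b.substring(0, 19));
>                 }
>             });
>             for (String line : lines) {
>                 System.out.println(line);
>             }
>         }
>     }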
> The host machines were under very heavy I/O pressure, and may have
> been paging also. The code and configuration issues that triggered
> this have been resolved, so I don't see it anymore.
>
> On Tue, Dec 8, 2009 at 8:32 AM, Ken Krugler <kkrugler_lists@transpac.com> wrote:
>
>> Hi all,
>>
>> In searching the mail/web archives, I see occasional questions from
>> people (like me) who run into the LeaseExpiredException (in my case,
>> on 0.18.3, while running a 50-server cluster in EMR).
>>
>> Unfortunately I don't see any responses, other than Dennis Kubes
>> saying that he thought some work had been done in this area of Hadoop
>> "a while back". And that was in 2007, so it hopefully doesn't apply
>> to my situation.
>>
>> I see these LeaseExpiredException errors showing up in the logs
>> around the same time as IOException errors, e.g.:
>>
>> java.io.IOException: Stream closed.
>>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.isClosed(DFSClient.java:2245)
>>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2481)
>>       at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
>>       at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>>       at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
>>       at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
>>       at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>>       at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>>       at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>       at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1260)
>>       at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1277)
>>       at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1295)
>>       at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:73)
>>       at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:276)
>>       at org.apache.hadoop.mapred.MapTask.run(MapTask.java:238)
>>       at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2216)
>>
>> This issue seemed related, but would have been fixed in the 0.18.3
>> release:
>>
>> http://issues.apache.org/jira/browse/HADOOP-3760
>>
>> I saw a similar HBase issue -
>> https://issues.apache.org/jira/browse/HBASE-529 - but they "fixed" it
>> by retrying a failure case.
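>>
>> (Their workaround has roughly this shape - a sketch of the general
>> retry-on-failure idea, not the actual HBase patch; doWrite() is a
>> hypothetical stand-in for the failing write:)
>>
>>     // Re-attempt the write a few times, with a simple backoff,
>>     // before giving up and rethrowing the last failure.
>>     static void writeWithRetries(int maxAttempts) throws java.io.IOException {
>>         java.io.IOException last = null;
>>         for (int attempt = 1; attempt <= maxAttempts; attempt++) {
>>             try {
>>                 doWrite();                      // hypothetical failing write
>>                 return;
>>             } catch (java.io.IOException e) {
>>                 last = e;
>>                 try {
>>                     Thread.sleep(1000L * attempt);  // linear backoff
>>                 } catch (InterruptedException ie) {
>>                     break;                      // give up early if interrupted
>>                 }
>>             }
>>         }
>>         throw last;
>>     }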
>>
>> These exceptions occur during "write storms", where lots of files are
>> being written out. Though "lots" is relative, e.g. 10-20M.
>>
>> It's repeatable, in that it fails on the same step of a series of
>> chained MR jobs.
>>
>> Is it possible I need to be running a bigger box for my namenode
>> server? Any other ideas?
>>
>> Thanks,
>>
>> -- Ken
>>
>>
>> On May 25, 2009, at 7:37am, Stas Oskin wrote:
>>
>>> Hi.
>>>
>>> I have a process that writes to a file on DFS from time to time,
>>> using an OutputStream. After some time of writing, I start getting
>>> the exception below, and the write fails. The DFSClient retries
>>> several times, and then fails.
>>>
>>> Copying the file from local disk to DFS via CopyLocalFile() works
>>> fine.
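>>>
>>> The writing side looks roughly like this (a trimmed-down sketch of
>>> such a process against the 0.18 API; the path and payload here are
>>> made up):
>>>
>>>     import org.apache.hadoop.conf.Configuration;
>>>     import org.apache.hadoop.fs.FSDataOutputStream;
>>>     import org.apache.hadoop.fs.FileSystem;
>>>     import org.apache.hadoop.fs.Path;
>>>
>>>     // Open a stream on DFS and write to it periodically; the
>>>     // client's lease on the file must stay valid until close().
>>>     Configuration conf = new Configuration();
>>>     FileSystem fs = FileSystem.get(conf);
>>>     FSDataOutputStream out = fs.create(new Path("/test/test.bin"));
>>>     byte[] payload = "some data".getBytes();  // illustrative payload
>>>     out.write(payload);                       // repeated over a long run
>>>     out.close();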
>>>
>>> Can anyone advise on the matter?
>>>
>>> I'm using Hadoop 0.18.3.
>>>
>>> Thanks in advance.
>>>
>>>
>>> 09/05/25 15:35:35 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException:
>>> org.apache.hadoop.dfs.LeaseExpiredException: No lease on /test/test.bin
>>> File does not exist. Holder DFSClient_-951664265 does not have any open files.
>>>       at org.apache.hadoop.dfs.FSNamesystem.checkLease(FSNamesystem.java:1172)
>>>       at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1103)
>>>       at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
>>>       at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>>       at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
>>>       at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)
>>>
>>>       at org.apache.hadoop.ipc.Client.call(Client.java:716)
>>>       at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>>>       at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
>>>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>       at java.lang.reflect.Method.invoke(Method.java:597)
>>>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>>       at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>>       at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
>>>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2450)
>>>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2333)
>>>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
>>>       at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)
>>>
>> --------------------------------------------
>> Ken Krugler
>> +1 530-210-6378
>> http://bixolabs.com
>> e l a s t i c   w e b   m i n i n g
>>
>
>
> -- 
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals

--------------------------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g




