hadoop-common-user mailing list archives

From Mehul Sutariya <mehulgsutar...@gmail.com>
Subject Re: LeaseExpiredException Exception
Date Sat, 12 Dec 2009 01:46:23 GMT
Hey Jason,

I use Hadoop 0.20.1, and I have seen the lease expired exception when the
RecordWriter was closed manually; in my case that happened because I had a
customized OutputFormat. After I closed the writer myself, the framework
tried to close it again and failed. My best guess here is that somewhere in
your job you are closing the writer yourself rather than letting the
framework do so.
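
For what it's worth, here is a minimal sketch of the pattern I mean (class
and paths are made up, old "mapred" API): the underlying stream should be
closed only inside close(Reporter), which the framework calls, and never
again from your own map/reduce code.

import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Progressable;

// Sketch only: a custom OutputFormat whose RecordWriter wraps a
// SequenceFile.Writer. The framework calls close(Reporter) exactly once;
// if user code also closes the writer (or its stream) earlier, the second
// close runs against an already-closed stream.
public class MyOutputFormat extends FileOutputFormat<Text, Text> {

  @Override
  public RecordWriter<Text, Text> getRecordWriter(FileSystem fs, JobConf job,
      String name, Progressable progress) throws IOException {
    Path file = FileOutputFormat.getTaskOutputPath(job, name);
    final SequenceFile.Writer out =
        SequenceFile.createWriter(fs, job, file, Text.class, Text.class);

    return new RecordWriter<Text, Text>() {
      public void write(Text key, Text value) throws IOException {
        out.append(key, value);
      }
      public void close(Reporter reporter) throws IOException {
        // Close only here -- do NOT also close from your mapper/reducer.
        out.close();
      }
    };
  }
}

If anything else in the job also closes that SequenceFile.Writer (or the
FSDataOutputStream under it), the framework's close() then hits a dead
stream, which is the kind of failure I saw.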

Mehul.

On Tue, Dec 8, 2009 at 11:43 AM, Ken Krugler <kkrugler_lists@transpac.com> wrote:

> Hi Jason,
>
> Thanks for the info - it's good to hear from somebody else who's run into
> this :)
>
> I tried again with a bigger box for the master, and wound up with the same
> results.
>
> I guess the framework could be killing it - but no idea why. This is during
> a very simple "write out the results" phase, so very high I/O but not much
> computation, and nothing should be hung.
>
> Any particular configuration values you had to tweak? I'm running this in
> Elastic MapReduce (EMR) so most settings are whatever they provide by
> default. I override a few things in my JobConf, but (for example) anything
> related to HDFS/MR framework will be locked & loaded by the time my job is
> executing.
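>
> For context, the kind of override I mean (a sketch with made-up values, not
> my actual job setup; cluster-level HDFS settings still come from EMR):
>
> import org.apache.hadoop.mapred.JobConf;
>
> public class JobSetupSketch {
>   public static void main(String[] args) {
>     // Per-job knobs I can still set from the driver:
>     JobConf conf = new JobConf(JobSetupSketch.class);
>     conf.setNumReduceTasks(50);                           // made-up value
>     conf.setInt("io.sort.mb", 200);                       // made-up value
>     conf.setLong("mapred.task.timeout", 20 * 60 * 1000L); // 20 minutes, in ms
>   }
> }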
>
> Thanks!
>
> -- Ken
>
>
> On Dec 8, 2009, at 9:34am, Jason Venner wrote:
>
>  Is it possible that this is occurring in a task that is being killed by
>> the framework?
>> Sometimes there is a little lag between the time the tracker 'kills a task'
>> and the time the task fully dies; you could be getting into a situation
>> where the task is in the process of dying but the last write is still in
>> progress.
>> I see this situation happen when the task tracker machine is heavily
>> loaded.
>> In one case there was a 15 minute lag between the timestamp in the
>> tracker for killing task XYZ and the task actually going away.
>>
>> It took me a while to work this out as I had to merge the tracker and task
>> logs by time to actually see the pattern.
>> The host machines were under very heavy I/O pressure, and may have been
>> paging as well. The code and configuration issues that triggered this have
>> been resolved, so I don't see it anymore.
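>>
>> Roughly what I mean by that merge, as a sketch (file names are made up, and
>> it assumes every log line starts with a sortable timestamp such as
>> "2009-12-08 09:34:12,345"):
>>
>> import java.io.IOException;
>> import java.nio.file.Files;
>> import java.nio.file.Paths;
>> import java.util.ArrayList;
>> import java.util.Collections;
>> import java.util.List;
>>
>> // Sketch only: concatenate the tracker log and the task log, then sort the
>> // combined lines by their leading timestamp so kill/exit events interleave.
>> public class MergeLogsByTime {
>>   public static void main(String[] args) throws IOException {
>>     List<String> lines = new ArrayList<String>();
>>     lines.addAll(Files.readAllLines(Paths.get("tasktracker.log")));  // made-up path
>>     lines.addAll(Files.readAllLines(Paths.get("task_attempt.log"))); // made-up path
>>     // A "yyyy-MM-dd HH:mm:ss,SSS" prefix sorts correctly as plain text.
>>     Collections.sort(lines);
>>     for (String line : lines) {
>>       System.out.println(line);
>>     }
>>   }
>> }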
>>
>> On Tue, Dec 8, 2009 at 8:32 AM, Ken Krugler <kkrugler_lists@transpac.com> wrote:
>>
>>  Hi all,
>>>
>>> In searching the mail/web archives, I occasionally see questions from
>>> people (like me) who run into the LeaseExpiredException (in my case, on
>>> 0.18.3 while running a 50-server cluster in EMR).
>>>
>>> Unfortunately I don't see any responses, other than Dennis Kubes saying
>>> that he thought some work had been done in this area of Hadoop "a while
>>> back". And this was in 2007, so it hopefully doesn't apply to my
>>> situation.
>>>
>>> java.io.IOException: Stream closed.
>>>      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.isClosed(DFSClient.java:2245)
>>>      at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.writeChunk(DFSClient.java:2481)
>>>      at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunk(FSOutputSummer.java:155)
>>>      at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:132)
>>>      at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:121)
>>>      at org.apache.hadoop.fs.FSOutputSummer.write1(FSOutputSummer.java:112)
>>>      at org.apache.hadoop.fs.FSOutputSummer.write(FSOutputSummer.java:86)
>>>      at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:49)
>>>      at java.io.DataOutputStream.write(DataOutputStream.java:90)
>>>      at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.writeBuffer(SequenceFile.java:1260)
>>>      at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.sync(SequenceFile.java:1277)
>>>      at org.apache.hadoop.io.SequenceFile$BlockCompressWriter.close(SequenceFile.java:1295)
>>>      at org.apache.hadoop.mapred.SequenceFileOutputFormat$1.close(SequenceFileOutputFormat.java:73)
>>>      at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.close(MapTask.java:276)
>>>      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:238)
>>>      at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2216)
>>>
>>> This issue seemed related, but its fix should already be in the 0.18.3
>>> release.
>>>
>>> http://issues.apache.org/jira/browse/HADOOP-3760
>>>
>>> I saw a similar HBase issue -
>>> https://issues.apache.org/jira/browse/HBASE-529 - but they "fixed" it by
>>> retrying a failure case.
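>>>
>>> Not the actual HBASE-529 patch, just a sketch of that style of workaround
>>> (hypothetical callback type), retrying the failing call a few times before
>>> giving up:
>>>
>>> import java.io.IOException;
>>>
>>> public class RetrySketch {
>>>   /** Hypothetical callback wrapping whatever write is failing. */
>>>   interface Attempt { void run() throws IOException; }
>>>
>>>   // Assumes maxTries >= 1; sleeps a bit longer after each failure.
>>>   static void withRetries(Attempt attempt, int maxTries) throws IOException {
>>>     IOException last = null;
>>>     for (int i = 0; i < maxTries; i++) {
>>>       try {
>>>         attempt.run();
>>>         return;
>>>       } catch (IOException e) {
>>>         last = e;
>>>         try {
>>>           Thread.sleep(1000L * (i + 1)); // simple linear backoff
>>>         } catch (InterruptedException ie) {
>>>           Thread.currentThread().interrupt();
>>>           break;
>>>         }
>>>       }
>>>     }
>>>     throw last;
>>>   }
>>> }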
>>>
>>> These exceptions occur during "write storms", where lots of files are
>>> being
>>> written out. Though "lots" is relative, e.g. 10-20M.
>>>
>>> It's repeatable, in that it fails on the same step of a series of chained
>>> MR jobs.
>>>
>>> Is it possible I need to be running a bigger box for my namenode server?
>>> Any other ideas?
>>>
>>> Thanks,
>>>
>>> -- Ken
>>>
>>>
>>> On May 25, 2009, at 7:37am, Stas Oskin wrote:
>>>
>>> Hi.
>>>
>>>>
>>>> I have a process that writes to a file on DFS from time to time, using an
>>>> OutputStream.
>>>> After some time of writing, I start getting the exception below, and the
>>>> write fails. The DFSClient retries several times, and then fails.
>>>>
>>>> Copying the file from local disk to DFS via CopyLocalFile() works fine.
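>>>>
>>>> For clarity, roughly what the two code paths look like (a sketch with
>>>> made-up paths, plain FileSystem API):
>>>>
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.fs.FSDataOutputStream;
>>>> import org.apache.hadoop.fs.FileSystem;
>>>> import org.apache.hadoop.fs.Path;
>>>>
>>>> public class DfsWriteSketch {
>>>>   public static void main(String[] args) throws Exception {
>>>>     Configuration conf = new Configuration();
>>>>     FileSystem fs = FileSystem.get(conf);
>>>>
>>>>     // 1) The failing path: keep a DFS output stream open and write to it
>>>>     //    from time to time.
>>>>     FSDataOutputStream out = fs.create(new Path("/test/test.bin"));
>>>>     out.write(new byte[] {1, 2, 3});
>>>>     out.close();
>>>>
>>>>     // 2) The working path: copy a finished local file into DFS in one shot.
>>>>     fs.copyFromLocalFile(new Path("/tmp/test.bin"), new Path("/test/test.bin"));
>>>>   }
>>>> }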
>>>>
>>>> Can anyone advise on the matter?
>>>>
>>>> I'm using Hadoop 0.18.3.
>>>>
>>>> Thanks in advance.
>>>>
>>>>
>>>> 09/05/25 15:35:35 INFO dfs.DFSClient: org.apache.hadoop.ipc.RemoteException:
>>>> org.apache.hadoop.dfs.LeaseExpiredException: No lease on /test/test.bin
>>>> File does not exist. Holder DFSClient_-951664265 does not have any open files.
>>>>         at org.apache.hadoop.dfs.FSNamesystem.checkLease(FSNamesystem.java:1172)
>>>>         at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1103)
>>>>         at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:330)
>>>>         at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:481)
>>>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:890)
>>>>
>>>>         at org.apache.hadoop.ipc.Client.call(Client.java:716)
>>>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>>>>         at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>>>         at org.apache.hadoop.dfs.$Proxy0.addBlock(Unknown Source)
>>>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:2450)
>>>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2333)
>>>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream.access$1800(DFSClient.java:1745)
>>>>         at org.apache.hadoop.dfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:1922)
>>>>
>>>>  --------------------------------------------
>>> Ken Krugler
>>> +1 530-210-6378
>>> http://bixolabs.com
>>> e l a s t i c   w e b   m i n i n g
>>>
>>>
>>>
>>
>> --
>> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
>> http://www.amazon.com/dp/1430219424?tag=jewlerymall
>> www.prohadoopbook.com a community for Hadoop Professionals
>>
>
> --------------------------------------------
> Ken Krugler
> +1 530-210-6378
> http://bixolabs.com
> e l a s t i c   w e b   m i n i n g
>
>
>
>
>
