hadoop-common-dev mailing list archives

From Eric Baldeschwieler <eri...@yahoo-inc.com>
Subject Re: [jira] Commented: (HADOOP-286) copyFromLocal throws LeaseExpiredException
Date Fri, 21 Jul 2006 08:52:09 GMT
humm...

But this needs to be addressed for speculative execution anyway, so
this argument doesn't apply to well-designed code.

That doesn't prove we need to make the change, though...

On Jul 20, 2006, at 3:22 PM, Konstantin Shvachko wrote:

> The problem with increasing the lease period is that in case of a
> task failure the task will retry and start creating the file it
> needs from scratch, which won't be possible since the file under
> this name is still locked. So the task will need to wait 5 minutes
> instead of 1 to start the retry (HADOOP-157). This is a slowdown
> for map/reduce.
>
>
> Eric Baldeschwieler wrote:
>
>> Why not significantly extend the lease period as well, to say 5
>> minutes, and have well-behaved clients release the lease explicitly
>> as soon as they can?
>>
>> Clients could then try to renew starting at, say, 2.5 minutes and
>> retry every 30 seconds until 4.5 minutes have elapsed...
>>
>> Seems like this would reduce overhead and have zero cost, since in
>> general there is no conflict for these leases, right?
>>
>> On Jul 18, 2006, at 6:23 PM, Konstantin Shvachko (JIRA) wrote:
>>
>>>     [ http://issues.apache.org/jira/browse/HADOOP-286?page=comments#action_12422012 ]
>>>
>>> Konstantin Shvachko commented on HADOOP-286:
>>> --------------------------------------------
>>>
>>> It looks like the following scenario leads to this exception.
>>> LEASE_PERIOD = 60 sec is a global constant defining how long a
>>> lease is issued for.
>>> DFSClient.LeaseChecker renews this client's leases every 30 sec =
>>> LEASE_PERIOD/2.
>>> If renewLease() fails, the client retries the renewal every
>>> second.
>>> One of the most common reasons renewLease() fails is that it
>>> times out with a SocketTimeoutException.
>>> This happens when the namenode is busy, which is not unusual since
>>> we lock it for each operation.
>>> The socket timeout is defined by the config parameter
>>> "ipc.client.timeout", which is set to 60 sec in
>>> hadoop-default.xml. That means that renewLease() can last up to
>>> 60 seconds, and the lease will have expired by the next time the
>>> client tries to renew it, which could be up to 90 seconds after
>>> the lease was created or last renewed.
>>> So there are 2 simple solutions to the problem:
>>> 1) to increase LEASE_PERIOD
>>> 2) to decrease ipc.client.timeout
>>>
>>> A related problem is that DFSClient sends lease renewal requests
>>> every 30 seconds or less, no matter what. It looks like the
>>> DFSClient has enough information to send renewal messages only
>>> when it really holds a lease. A simple solution would be to avoid
>>> calling renewLease() when DFSClient.pendingCreates is empty.
>>> This could substantially decrease overall network traffic for
>>> map/reduce.
>>>
>>>
>>>
>>>> copyFromLocal throws LeaseExpiredException
>>>> ------------------------------------------
>>>>
>>>>                 Key: HADOOP-286
>>>>                 URL: http://issues.apache.org/jira/browse/HADOOP-286
>>>>             Project: Hadoop
>>>>          Issue Type: Bug
>>>>          Components: dfs
>>>>    Affects Versions: 0.3.0
>>>>         Environment: Red Hat Linux
>>>>            Reporter: Runping Qi
>>>>
>>>> Loading local files to dfs through hadoop dfs -copyFromLocal
>>>> failed due to the following exception:
>>>> copyFromLocal: org.apache.hadoop.dfs.LeaseExpiredException: No lease on output_crawled.1.txt
>>>>         at org.apache.hadoop.dfs.FSNamesystem.getAdditionalBlock(FSNamesystem.java:414)
>>>>         at org.apache.hadoop.dfs.NameNode.addBlock(NameNode.java:190)
>>>>         at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>>>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>         at java.lang.reflect.Method.invoke(Method.java:585)
>>>>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:243)
>>>>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:231)
>>>
>>>
>>> -- 
>>> This message is automatically generated by JIRA.
>>> -
>>> If you think it was sent incorrectly contact one of the
>>> administrators: http://issues.apache.org/jira/secure/Administrators.jspa
>>> -
>>> For more information on JIRA, see: http://www.atlassian.com/software/jira
>>>
>>>
>>
>>
>>
>
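
The timing argument in the quoted JIRA comment can be checked with a
little arithmetic. What follows is a minimal sketch in Java, not code
from Hadoop: the class LeaseTimingSketch and its methods are
hypothetical names made up for illustration, and the 20-second timeout
used for option 2 is an assumed value that does not appear in the
thread. The 60/30/60-second figures and the 5-minute lease with
renewal starting at 2.5 minutes are taken from the messages above. The
model only considers a single renewLease() call that blocks until the
socket timeout.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of the HADOOP-286 timing discussion.
// None of these names are Hadoop APIs.
final class LeaseTimingSketch {

    // Worst-case seconds between the last successful renewal and the
    // client's next renewal attempt, assuming one renewLease() call is
    // issued on schedule and then blocks for the full socket timeout.
    static long worstCaseGapSec(long renewIntervalSec, long ipcTimeoutSec) {
        return renewIntervalSec + ipcTimeoutSec;
    }

    // True if that worst-case gap is longer than the lease itself, i.e.
    // the lease can lapse before the client gets to retry.
    static boolean leaseCanExpire(long leasePeriodSec,
                                  long renewIntervalSec,
                                  long ipcTimeoutSec) {
        return worstCaseGapSec(renewIntervalSec, ipcTimeoutSec) > leasePeriodSec;
    }

    // The "related problem" guard: only bother renewing when there is
    // at least one pending create. The map is a stand-in for
    // DFSClient.pendingCreates; its real type is not given in the thread.
    static boolean shouldRenew(Map<String, ?> pendingCreates) {
        return !pendingCreates.isEmpty();
    }

    public static void main(String[] args) {
        // Settings described in the JIRA comment: 60 s lease, 30 s renew, 60 s timeout.
        System.out.println("gap with current settings (sec): " + worstCaseGapSec(30, 60));
        System.out.println("current settings can expire: " + leaseCanExpire(60, 30, 60));

        // Option 1 as proposed in the thread: 5 min lease, renew from 2.5 min.
        System.out.println("5 min lease can expire: " + leaseCanExpire(300, 150, 60));

        // Option 2 with an ASSUMED 20 s ipc.client.timeout (value not from the thread).
        System.out.println("20 s timeout can expire: " + leaseCanExpire(60, 30, 20));

        // Renewal guard: no pending creates means no renewal request.
        Map<String, Object> pending = new HashMap<>();
        System.out.println("renew with no pending creates: " + shouldRenew(pending));
        pending.put("output_crawled.1.txt", new Object());
        System.out.println("renew with one pending create: " + shouldRenew(pending));
    }
}
```

Under this simple model either option closes the gap, which matches
the two solutions listed in the comment. It says nothing about the
retry delay after a task failure that was raised as the cost of a
longer lease.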

