hadoop-common-dev mailing list archives

From "Owen O'Malley (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-563) DFS client should try to re-new lease if it gets a lease expiration exception when it adds a block to a file
Date Wed, 27 Sep 2006 23:07:51 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-563?page=comments#action_12438246 ] 
Owen O'Malley commented on HADOOP-563:

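One minute is an awfully short time for a lease to expire when losing it kills a day's worth of work.
However, if we make the leases longer, that will interact badly with replacement reduce tasks that
start up and try to write the same file. One approach that might be reasonable is to have two time limits: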
One approach that might be reasonable is to have two time limits:

lease becomes losable: 1 minute
lease is lost: 1 hour

A losable lease is lost when someone tries to create the same file. We also need a forced
timeout to handle clients that disappear and whose file name is never written again.

You need to separate the handling of losable and lost leases on the namenode, because once
a lease is declared lost on the namenode, its blocks will be deleted.
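The two-limit scheme above can be sketched as a small lease table. This is a hypothetical Java illustration, not the namenode's actual code: the class and method names (`TwoLimitLeases`, `tryCreate`, `expireHardLimit`) are invented here, and times are passed in explicitly as milliseconds so the behavior is easy to check. A lease between 1 minute and 1 hour old is taken away only when another client creates the same file; only a lease past the 1-hour limit is swept, which is the point at which the blocks would be deleted.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch of the two-limit lease policy; names are illustrative only.
class TwoLimitLeases {
    static final long SOFT_LIMIT_MS = 60_000L;     // lease becomes losable: 1 minute
    static final long HARD_LIMIT_MS = 3_600_000L;  // lease is lost: 1 hour

    private final Map<String, String> holders = new HashMap<>();    // file -> lease holder
    private final Map<String, Long> lastRenewed = new HashMap<>();  // file -> last renewal time

    /** The holder creates or renews its lease on a file at time now. */
    public void renew(String file, String holder, long now) {
        holders.put(file, holder);
        lastRenewed.put(file, now);
    }

    /**
     * Another client tries to create the same file. A lease held by someone else
     * is taken away only if it is past the soft limit; otherwise creation fails.
     */
    public boolean tryCreate(String file, String newHolder, long now) {
        Long t = lastRenewed.get(file);
        if (t != null && !holders.get(file).equals(newHolder) && now - t < SOFT_LIMIT_MS) {
            return false; // lease still fresh: the original writer keeps the file
        }
        renew(file, newHolder, now); // absent or losable lease passes to the new writer
        return true;
    }

    /**
     * Periodic sweep: only leases past the hard limit are declared lost.
     * In a real namenode this is where the file's blocks would be deleted.
     */
    public int expireHardLimit(long now) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = lastRenewed.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> e = it.next();
            if (now - e.getValue() >= HARD_LIMIT_MS) {
                holders.remove(e.getKey());
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public boolean holds(String file, String holder) {
        return holder.equals(holders.get(file));
    }
}
```

Keeping `tryCreate` and `expireHardLimit` as separate paths mirrors the point about the namenode: a lease that is merely losable never triggers block deletion by itself, so a slow but live writer keeps its partial output unless a replacement task actually claims the file.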

> DFS client should try to re-new lease if it gets a lease expiration exception when it adds a block to a file
> ------------------------------------------------------------------------------------------------------------
>                 Key: HADOOP-563
>                 URL: http://issues.apache.org/jira/browse/HADOOP-563
>             Project: Hadoop
>          Issue Type: Bug
>            Reporter: Runping Qi
> In the current DFS client implementation, a single thread is responsible for renewing leases. If, for whatever reason, that thread falls behind, the lease may expire, causing the client to get a lease expiration exception when writing a block. The consequence is devastating: the client can no longer write to the file, and all partial results up to that point are lost. This is especially costly for MapReduce jobs where a reducer may take hours or even days to sort the intermediate results before the actual reduce work can start.
> The problem would be solved if the flush method of the DFS client could renew the lease on demand; that is, it should try to renew the lease when it catches a lease expiration exception. That way, even when the system is under heavy load and the lease-renewing thread falls behind, the reduce task (or whatever task uses that client) can proceed. That would be a huge saving in some cases (where sorting intermediate results takes a long time to finish). We can set a limit on the number of retries, and could even make it configurable (or changeable at runtime).
> The namenode can use a separate expiration time, much longer than the current 1-minute lease expiration time, for cleaning up abandoned unclosed files.
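The on-demand renewal proposed in the issue might look roughly like the following Java sketch. It assumes a hypothetical `NameNodeStub` interface and `LeaseExpiredException` type standing in for the real client-to-namenode calls; neither is taken from the Hadoop code base. The retry limit is a constructor argument so that it could be made configurable, as the description suggests.

```java
import java.io.IOException;

// Hypothetical exception standing in for the namenode's lease-expired error.
class LeaseExpiredException extends IOException {
    LeaseExpiredException(String msg) { super(msg); }
}

// Hypothetical client-side view of the two namenode calls involved.
interface NameNodeStub {
    void addBlock(String file, String client) throws IOException;
    void renewLease(String client) throws IOException;
}

class RetryingBlockAdder {
    private final NameNodeStub namenode;
    private final int maxRetries; // hypothetical, configurable limit on renew-and-retry cycles

    RetryingBlockAdder(NameNodeStub namenode, int maxRetries) {
        this.namenode = namenode;
        this.maxRetries = maxRetries;
    }

    /**
     * Adds a block; if the lease has expired, renews it on demand and retries,
     * giving up after maxRetries renewals.
     */
    void addBlock(String file, String client) throws IOException {
        for (int attempt = 0; ; attempt++) {
            try {
                namenode.addBlock(file, client);
                return;
            } catch (LeaseExpiredException e) {
                if (attempt >= maxRetries) {
                    throw e; // retries exhausted: surface the original failure
                }
                namenode.renewLease(client); // renew on demand, then try again
            }
        }
    }
}
```

Only the lease-expiration failure triggers a renewal; any other `IOException` propagates immediately, so the retry loop cannot mask unrelated errors.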

This message is automatically generated by JIRA.

