hadoop-common-issues mailing list archives

From "Gaurav Kanade (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12508) delete fails with exception when lease is held on blob
Date Mon, 26 Oct 2015 17:00:28 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974543#comment-14974543 ]

Gaurav Kanade commented on HADOOP-12508:

Hey [~cnauroth]

Thanks for the review! acquireLease makes a call to obtain a SelfRenewingLease. If you look
at the SelfRenewingLease class, it keeps waiting until it finally acquires the lease. The
concurrent process that held the lease is dead, so once its existing hold on the lease expires
the lease will not be renewed (the self-renewing thread is dead as well). This patch seeks to
address this particular condition. The expected behavior is therefore that the process that
died while holding the now-dangling lease will not attempt to renew it, and the process
attempting the delete will keep trying to acquire the lease until it gets it, which will be
once the existing lease expires (the default lease duration is 60 seconds, so the worst case
is a 60-second wait). At a bare minimum this patch will not break anything that isn't already
broken, and it will expose a deeper issue if one exists.
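
To make the behavior described above concrete, here is a minimal, self-contained sketch of the
retry pattern. It does not use the real Azure storage client or the actual SelfRenewingLease
class; FakeBlob, tryAcquireLease, acquireLeaseBlocking and deleteWithLeaseFallback are
hypothetical names invented for illustration, and the lease durations are shortened so the
example finishes quickly.

```java
import java.util.UUID;

public class DanglingLeaseDeleteSketch {

    /** Hypothetical in-memory stand-in for a blob with a time-limited lease. */
    static final class FakeBlob {
        private String leaseId;          // null when no lease was ever taken
        private long leaseExpiryMillis;  // wall-clock expiry of the current lease
        private boolean deleted;

        private boolean leaseActive(long now) {
            return leaseId != null && now < leaseExpiryMillis;
        }

        /** Returns a new lease ID, or null if another lease is still active. */
        synchronized String tryAcquireLease(long durationMillis) {
            long now = System.currentTimeMillis();
            if (leaseActive(now)) {
                return null;
            }
            leaseId = UUID.randomUUID().toString();
            leaseExpiryMillis = now + durationMillis;
            return leaseId;
        }

        /** Fails if an active lease exists and the caller does not hold it. */
        synchronized void delete(String callerLeaseId) {
            if (leaseActive(System.currentTimeMillis()) && !leaseId.equals(callerLeaseId)) {
                throw new IllegalStateException("blob is leased");
            }
            deleted = true;
        }

        synchronized boolean isDeleted() {
            return deleted;
        }
    }

    /** Polls until the lease is acquired (assumed behavior, modeled on the comment above). */
    static String acquireLeaseBlocking(FakeBlob blob, long durationMillis, long pollMillis)
            throws InterruptedException {
        String id;
        while ((id = blob.tryAcquireLease(durationMillis)) == null) {
            Thread.sleep(pollMillis); // the dead holder never renews, so this ends at expiry
        }
        return id;
    }

    /** Tries a plain delete first; on a lease conflict, acquires the lease and retries. */
    static void deleteWithLeaseFallback(FakeBlob blob, long durationMillis, long pollMillis)
            throws InterruptedException {
        try {
            blob.delete(null);
        } catch (IllegalStateException leased) {
            String id = acquireLeaseBlocking(blob, durationMillis, pollMillis);
            blob.delete(id);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FakeBlob blob = new FakeBlob();
        blob.tryAcquireLease(200); // dangling lease left by a "dead" process, never renewed
        deleteWithLeaseFallback(blob, 1000, 20);
        System.out.println("deleted = " + blob.isDeleted());
    }
}
```

In this sketch the wait is bounded by the remaining lifetime of the dangling lease, which
mirrors the 60-second worst case mentioned above.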

As for testing, we are working on designing a framework that can test error conditions
caused by concurrent processes exiting unexpectedly (this seems to be the class of issues
we are hitting, exposed by the new HBase test introduced in HDP 2.3; they seem to occur
rarely in practice, as no customer seems to have hit them yet). In the meantime, if you
have ideas about the kind of testing that can be done quickly in the short term, I would
love to hear them.

> delete fails with exception when lease is held on blob
> ------------------------------------------------------
>                 Key: HADOOP-12508
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12508
>             Project: Hadoop Common
>          Issue Type: Bug
>            Reporter: Gaurav Kanade
>            Assignee: Gaurav Kanade
>            Priority: Blocker
>         Attachments: HADOOP-12508.01.patch, HADOOP-12508.02.patch
> The delete function as implemented by AzureNativeFileSystem store attempts delete without
a lease. In most cases this works, but in the case of a dangling lease, resulting from, say,
a process being killed and leaving a lease dangling for a short period, a delete attempted during
this period simply crashes. This fix addresses the situation by re-attempting the delete after
a lease acquisition in this case.

This message was sent by Atlassian JIRA