hadoop-hdfs-issues mailing list archives

From "Daryn Sharp (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-12639) BPOfferService lock may stall all service actors
Date Thu, 12 Oct 2017 15:00:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-12639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16202067#comment-16202067 ]

Daryn Sharp commented on HDFS-12639:

Sure, go ahead and assign to yourself.  I don't assign to myself unless I have free cycles.
Even though my responses are often delayed, please let me review your patch.

> BPOfferService lock may stall all service actors
> ------------------------------------------------
>                 Key: HDFS-12639
>                 URL: https://issues.apache.org/jira/browse/HDFS-12639
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.8.0
>            Reporter: Daryn Sharp
> {{BPOfferService}} manages {{BPServiceActor}} instances for the active and standby.
> It uses a RW lock primarily to protect registration information while determining the
> active/standby from heartbeats.
> Unfortunately the write lock is held during command processing.  If an actor is experiencing
> high latency processing commands, the other actor will neither be able to register (blocked
> in createRegistration, setNamespaceInfo, verifyAndSetNamespaceInfo) nor process heartbeats
> (blocked in updateActorStatesFromHeartbeat).
> The worst case scenario for processing commands while holding the lock is re-registration.
> The actor will loop, catching and logging exceptions, leaving the other actor blocked for
> a non-deterministic (possibly infinite) amount of time.
> The lock must not be held during command processing.
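
For illustration only, the requirement above might be sketched as follows. This is not the Hadoop code and not a proposed patch: the class LockScopeSketch and the methods onHeartbeat, processCommand and otherActorCanProgress are hypothetical names, and actors are reduced to plain strings. The sketch assumes the lock only needs to cover the check of which actor is active, so the slow command itself runs with no lock held.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for a service that tracks which actor is active.
class LockScopeSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private String activeActor; // guarded by lock

    // Short critical section: only the shared state is touched under the write lock.
    void onHeartbeat(String actor) {
        lock.writeLock().lock();
        try {
            activeActor = actor;
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Check under the read lock whether the sender is active, release, then run the command.
    boolean processCommand(String actor, Runnable command) {
        boolean fromActive;
        lock.readLock().lock();
        try {
            fromActive = actor.equals(activeActor);
        } finally {
            lock.readLock().unlock();
        }
        if (!fromActive) {
            return false; // commands from a non-active actor are ignored in this sketch
        }
        command.run(); // potentially slow; no lock is held here
        return true;
    }

    // Returns true if a second actor can take the write lock while a command is still running.
    static boolean otherActorCanProgress() {
        LockScopeSketch service = new LockScopeSketch();
        service.onHeartbeat("nn1");
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch updated = new CountDownLatch(1);

        // Simulates an actor stuck in a slow command until release is counted down.
        Thread slow = new Thread(() -> service.processCommand("nn1", () -> {
            started.countDown();
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));
        // Simulates the other actor reporting a heartbeat, which needs the write lock.
        Thread other = new Thread(() -> {
            service.onHeartbeat("nn2");
            updated.countDown();
        });

        try {
            slow.start();
            started.await();
            other.start();
            boolean progressed = updated.await(2, TimeUnit.SECONDS);
            release.countDown();
            slow.join();
            other.join();
            return progressed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("other actor progressed: " + otherActorCanProgress());
    }
}
```

If command.run() were instead called while the write lock was held, the second thread in otherActorCanProgress would stay blocked in onHeartbeat until the command finished, which is the stall described in this issue.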

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org
