hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Manoj Govindassamy (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10830) FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being removed is in use
Date Fri, 02 Sep 2016 21:25:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15459633#comment-15459633
] 

Manoj Govindassamy commented on HDFS-10830:
-------------------------------------------

[~arpitagarwal],

{quote}
  void waitVolumeRemoved(int sleepMillis, Condition condition) {
      .. .. ..
      try {
        condition.wait(sleepMillis);  <==
      } catch (InterruptedException e) {
        FsDatasetImpl.LOG.info("Thread interrupted when waiting for "
            + "volume reference to be released.");
        Thread.currentThread().interrupt();
      }
{quote}

# By calling Object monitor method wait() on {{Condition}} variable, it will be treated like
a normal Object and will try to use its intrinsic monitor as against the Condition's wait-set
instance (here {{datasetLockCondition}}). Copying below the note I read from java doc on Condition.

[Ref|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/locks/Condition.html]
{color:blue}
??Note that Condition instances are just normal objects and can themselves be used as the
target in a synchronized statement, and can have their own monitor wait and notification methods
invoked. Acquiring the monitor lock of a Condition instance, or using its monitor methods,
has no specified relationship with acquiring the Lock associated with that Condition or the
use of its waiting and signalling methods. It is recommended that to avoid confusion you never
use Condition instances in this way, except perhaps within their own implementation.??
{color}

# So, the usage of condition.wait() here expects the Object monitor to be locked using synchronized()
and since it is not locked, we will again get IllegalMonitorStateException.  Please let me
know if my understanding is wrong.

> FsDatasetImpl#removeVolumes() crashes with IllegalMonitorStateException when vol being
removed is in use
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-10830
>                 URL: https://issues.apache.org/jira/browse/HDFS-10830
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Manoj Govindassamy
>            Assignee: Arpit Agarwal
>         Attachments: HDFS-10830.01.patch
>
>
> {{FsDatasetImpl#removeVolumes()}} operation crashes abruptly with IllegalMonitorStateException
whenever the volume being removed is in use concurrently.
> Looks like {{removeVolumes()}} is waiting on a monitor object "this" (that is FsDatasetImpl)
which it has never locked, leading to  IllegalMonitorStateException. This monitor wait happens
only the volume being removed is in use (referencecount > 0). The thread performing this
remove volume operation thus crashes abruptly and block invalidations for the remove volumes
are totally skipped. 
> {code:title=FsDatasetImpl.java|borderStyle=solid}
> @Override
> public void removeVolumes(Set<File> volumesToRemove, boolean clearFailure) {
> ..
> ..
> try (AutoCloseableLock lock = datasetLock.acquire()) {   <== LOCK acquire datasetLock
> for (int idx = 0; idx < dataStorage.getNumStorageDirs(); idx++) {
>   .. .. ..
>   asyncDiskService.removeVolume(sd.getCurrentDir());     <== volume SD1 remove
>   volumes.removeVolume(absRoot, clearFailure);
>   volumes.waitVolumeRemoved(5000, this);                 <== WAIT on "this" ?? But,
we haven't locked it yet.
>                                                              This will cause IllegalMonitorStateException
>                                                              and crash getBlockReports()/FBR
thread!
>   for (String bpid : volumeMap.getBlockPoolList()) {
>     List<ReplicaInfo> blocks = new ArrayList<>();
>     for (Iterator<ReplicaInfo> it = volumeMap.replicas(bpid).iterator();
>          it.hasNext(); ) {
>         .. .. .. 
>         it.remove();                                     <== volumeMap removal
>       }
>     blkToInvalidate.put(bpid, blocks);
>   }
>  .. ..
> }                                                        <== LOCK release datasetLock
  
> // Call this outside the lock.
> for (Map.Entry<String, List<ReplicaInfo>> entry :
> blkToInvalidate.entrySet()) {
>  ..
>  for (ReplicaInfo block : blocks) {
>   invalidate(bpid, block);                               <== Notify NN of Block removal
>  }
> }
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org


Mime
View raw message