hadoop-hdfs-issues mailing list archives

From "Manjunath Anand (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-11277) Implement equals and hashcode in FsVolumeSpi implementations
Date Wed, 28 Dec 2016 10:34:58 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782603#comment-15782603 ]

Manjunath Anand edited comment on HDFS-11277 at 12/28/16 10:33 AM:
-------------------------------------------------------------------

Hi [~arpitagarwal], could you please review this subtask? My analysis is below.

The object identity check in a Java collection that I am referring to is in the following
code in ThrottledAsyncChecker:
{code}
if (checksInProgress.containsKey(target)) {
  return checksInProgress.get(target);
}

if (completedChecks.containsKey(target)) {
{code}

Although the above approach is not incorrect, I thought of a case (possibly very remote, but
better to handle) in which two or more threads call checkAllVolumesAsync simultaneously on
the same DatasetVolumeChecker object. Due to a race condition, both threads get past the
condition {code}(gap < minDiskCheckGapMs){code} in this method. Then, during scheduling in
{code}delegateChecker.schedule{code}, since it is synchronized, the threads add their
FsVolumeSpi implementations as targets to the checksInProgress hashmap one after the other.
However, because the FsVolumeSpi implementations do not override equals and hashCode (unlike,
say, StorageLocation), object identity matching is used. It may therefore happen that the two
competing threads hold different FsVolumeSpi implementation objects that refer to the same
underlying StorageLocation or have the same StorageID. The same StorageLocation would then be
scheduled for a check while one is already running, violating the minMsBetweenChecks condition.
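
To illustrate the concern, here is a minimal sketch. It does not use the real Hadoop classes: IdentityVolume, ValueVolume and scheduleTwice are hypothetical names made up for this example, and the map only mimics how checksInProgress is keyed by the target. It shows that two distinct objects with the same storage ID are treated as different keys unless equals and hashCode are overridden.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class VolumeKeyDemo {

  /** Hypothetical volume that inherits identity-based equals/hashCode from Object. */
  static class IdentityVolume {
    final String storageId;

    IdentityVolume(String storageId) {
      this.storageId = storageId;
    }
  }

  /** Hypothetical volume that compares by storage ID (an assumed choice of key). */
  static class ValueVolume {
    final String storageId;

    ValueVolume(String storageId) {
      this.storageId = storageId;
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof ValueVolume)) {
        return false;
      }
      return Objects.equals(storageId, ((ValueVolume) o).storageId);
    }

    @Override
    public int hashCode() {
      return Objects.hashCode(storageId);
    }
  }

  /**
   * Registers the first target as "in progress" and reports whether the
   * second target would pass the containsKey guard and be scheduled again.
   */
  static <K> boolean scheduleTwice(K first, K second) {
    Map<K, String> checksInProgress = new HashMap<>();
    checksInProgress.put(first, "check-1");
    return !checksInProgress.containsKey(second);
  }

  public static void main(String[] args) {
    // Distinct objects, same storage ID, identity semantics: scheduled twice.
    System.out.println(
        scheduleTwice(new IdentityVolume("DS-1"), new IdentityVolume("DS-1"))); // true
    // Distinct objects, same storage ID, value semantics: duplicate detected.
    System.out.println(
        scheduleTwice(new ValueVolume("DS-1"), new ValueVolume("DS-1"))); // false
  }
}
```

Whether the storage ID, the StorageLocation, or some combination is the right basis for equality in the actual FsVolumeSpi implementations is an open question for this subtask; the sketch only shows the effect on map lookups.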

Please let me know your thoughts on this.



> Implement equals and hashcode in FsVolumeSpi implementations
> ------------------------------------------------------------
>
>                 Key: HDFS-11277
>                 URL: https://issues.apache.org/jira/browse/HDFS-11277
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: datanode
>            Reporter: Manjunath Anand
>
> Certain implementations of FsVolumeSpi, for example FsVolumeImpl, can implement equals and
> hashCode. This avoids relying on an object identity check during disk check scheduling in
> ThrottledAsyncChecker and instead uses other means to determine whether a disk check is
> already in progress for an FsVolumeImpl object.


