hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10267) Extra "synchronized" on FsDatasetImpl#recoverAppend and FsDatasetImpl#recoverClose
Date Thu, 07 Apr 2016 00:37:25 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15229424#comment-15229424 ]

Colin Patrick McCabe commented on HDFS-10267:
---------------------------------------------

At a high level, the test works like this:

1. Create the {{slowWriterThread}} thread and make it the {{Writer}} for {{recoveringBlock}}
by calling {{FsDatasetImpl#createRbw}}.  {{FsDatasetImpl}} grabs the {{Thread}}
object and stores it in {{ReplicaInPipeline}}.
2. Create the {{stopWriterThread}} thread and have it call some operation that will call {{ReplicaInPipeline#stopWriter}}
on {{recoveringBlock}}.  This delivers an {{InterruptedException}} to {{slowWriterThread}}.
3. {{slowWriterThread}} receives the {{InterruptedException}} and sets an {{AtomicBoolean}},
but it doesn't exit, meaning that {{stopWriterThread}} will hang waiting for it.
4. Meanwhile, the main thread waits to see the {{AtomicBoolean}} set in step 3.
5. The main thread calls some operation on {{FsDatasetImpl}} that needs to take the lock.
If {{stopWriterThread}} failed to drop the lock when calling {{stopWriter}}, the test will
deadlock here and hit its timeout.  Otherwise, the test succeeds.
6. The main thread tells {{slowWriterThread}} to exit, then joins all threads and ensures
that no thread exited uncleanly.
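
The steps above can be sketched roughly as follows.  This is only an illustration of the structure, not the test in the patch: a plain {{ReentrantLock}} stands in for the {{FsDatasetImpl}} lock, a bounded {{tryLock}} stands in for the test timeout, and the names {{StopWriterSketch}}, {{lockFreeDuringStop}} and {{holdLockWhileStopping}} are made up for this example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

public class StopWriterSketch {
  /**
   * Returns true if the main thread could take the (stand-in) dataset lock
   * while the stopper thread was still blocked waiting for the writer.
   * holdLockWhileStopping=true models the bug; false models the fix.
   */
  public static boolean lockFreeDuringStop(boolean holdLockWhileStopping) {
    final ReentrantLock datasetLock = new ReentrantLock(); // hypothetical stand-in
    final AtomicBoolean sawInterrupt = new AtomicBoolean(false);
    final CountDownLatch writerRunning = new CountDownLatch(1);
    final CountDownLatch interrupted = new CountDownLatch(1);
    final CountDownLatch mayExit = new CountDownLatch(1);

    // Steps 1 and 3: the slow writer records the interrupt but refuses to exit.
    final Thread slowWriter = new Thread(() -> {
      writerRunning.countDown();
      try {
        Thread.sleep(Long.MAX_VALUE);
      } catch (InterruptedException e) {
        sawInterrupt.set(true);
        interrupted.countDown();
      }
      boolean released = false;
      while (!released) {
        try {
          mayExit.await(); // keep running until the main thread allows exit
          released = true;
        } catch (InterruptedException ignored) {
          // ignore further interrupts; only mayExit ends this loop
        }
      }
    });

    // Step 2: the stopper interrupts the writer and waits for it to die.
    final Thread stopWriter = new Thread(() -> {
      if (holdLockWhileStopping) {
        datasetLock.lock();
      }
      try {
        slowWriter.interrupt();
        slowWriter.join(); // hangs until the writer really exits
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        if (holdLockWhileStopping) {
          datasetLock.unlock();
        }
      }
    });

    try {
      slowWriter.start();
      writerRunning.await();
      stopWriter.start();

      // Step 4: wait until the writer has seen the interrupt.
      interrupted.await();
      if (!sawInterrupt.get()) {
        throw new IllegalStateException("writer was not interrupted");
      }

      // Step 5: try an operation that needs the lock, with a bounded wait.
      boolean acquired = datasetLock.tryLock(500, TimeUnit.MILLISECONDS);
      if (acquired) {
        datasetLock.unlock();
      }

      // Step 6: let the writer exit, then join everything.
      mayExit.countDown();
      slowWriter.join();
      stopWriter.join();
      return acquired;
    } catch (InterruptedException e) {
      throw new IllegalStateException("main thread interrupted", e);
    }
  }

  public static void main(String[] args) {
    System.out.println("lock dropped during stop: " + lockFreeDuringStop(false));
    System.out.println("lock held during stop:    " + lockFreeDuringStop(true));
  }
}
```

In this sketch the "deadlock" of step 5 shows up as a failed {{tryLock}} rather than a hung test, which keeps the example terminating in both modes.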

> Extra "synchronized" on FsDatasetImpl#recoverAppend and FsDatasetImpl#recoverClose
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-10267
>                 URL: https://issues.apache.org/jira/browse/HDFS-10267
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.8.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>         Attachments: HDFS-10267.001.patch, HDFS-10267.002.patch, HDFS-10267.003.patch, HDFS-10267.004.patch
>
>
> There is an extra "synchronized" on FsDatasetImpl#recoverAppend and FsDatasetImpl#recoverClose
> that prevents the HDFS-8496 fix from working as intended.  This should be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
