hadoop-hdfs-issues mailing list archives

From "GAO Rui (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-10201) Implement undo log in parity datanode for hflush operations
Date Wed, 27 Apr 2016 01:32:12 GMT

    [ https://issues.apache.org/jira/browse/HDFS-10201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15259341#comment-15259341 ]

GAO Rui commented on HDFS-10201:
--------------------------------

[~liuml07], I have attached a demo patch of the new UndoLog design we discussed last
week. The point is to keep all undo-log handling inside FlushUndoLogManager. For the
undo log of each internal parity block file, both the writer and the reader access it
through the same FlushUndoLog instance. This way the writer and reader are minimally
affected: almost all writing and reading of the undo log is handled on the datanode side
and is nearly transparent to both.
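
The shared-instance idea described above can be sketched roughly as follows. This is only an illustration and not the code in the attached demo patch: the class names FlushUndoLogManager and FlushUndoLog come from the comment, while every method name (getUndoLog, save, read, remove), the block-id key, and the in-memory storage are assumptions made for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: only the two class names appear in the comment above;
// all members are assumptions for illustration.
final class FlushUndoLog {
    private final long blockId;
    private byte[] savedCell; // old cell preserved before an overwrite

    FlushUndoLog(long blockId) {
        this.blockId = blockId;
    }

    long getBlockId() {
        return blockId;
    }

    // Writer side: preserve a copy of the old cell before it is overwritten.
    synchronized void save(byte[] oldCell) {
        savedCell = oldCell.clone();
    }

    // Reader side: return a copy of the preserved cell, or null if none exists.
    synchronized byte[] read() {
        return savedCell == null ? null : savedCell.clone();
    }
}

final class FlushUndoLogManager {
    // One undo log per internal parity block, keyed by an assumed block id.
    private final Map<Long, FlushUndoLog> logs = new ConcurrentHashMap<>();

    // Writer and reader both call this and receive the same instance.
    FlushUndoLog getUndoLog(long blockId) {
        return logs.computeIfAbsent(blockId, id -> new FlushUndoLog(id));
    }

    // Drop the log once it is no longer needed (e.g. the block is finalized).
    void remove(long blockId) {
        logs.remove(blockId);
    }
}
```

Because both sides obtain the log through the manager, synchronization between writer and reader stays inside FlushUndoLog and neither side needs to know where or how the log is stored.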
 

> Implement undo log in parity datanode for hflush operations
> -----------------------------------------------------------
>
>                 Key: HDFS-10201
>                 URL: https://issues.apache.org/jira/browse/HDFS-10201
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: erasure-coding
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>         Attachments: HDFS-10201-demo.patch
>
>
> According to the current design doc for hflush support in erasure coding (see [HDFS-7661]),
the parity datanode (DN) needs an undo log for flush operations. After hflush/hsync, the last
cell will be overwritten when 1) the current stripe is full, 2) the file is closed, or 3)
hflush/hsync is called again for the current non-full stripe. To serve new reader clients and
to tolerate failures between a successful hflush/hsync and the overwrite operation, the parity DN
should preserve the old cell in the undo log before overwriting it.
> As parities correspond to the block group (BG) length, and parity data for different BG lengths
may have the same block length, the undo log should also save the respective BG length
for the flushed data.
> This jira is to track the effort of designing and implementing an undo log in parity
DN to support hflush/hsync operations.
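
To make the quoted description concrete, here is a minimal in-memory sketch of the "preserve the old cell, together with its BG length, before overwriting" step. It is not taken from the design doc or from any patch: the class ParityCellUndoSketch, its fields and methods, and the use of a byte array in place of the on-disk parity block file are all assumptions for illustration.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical in-memory model of a parity block with an undo log.
final class ParityCellUndoSketch {
    // One undo entry: the overwritten bytes plus the BG length they matched.
    static final class Entry {
        final long bgLength;
        final int offset;
        final byte[] oldCell;

        Entry(long bgLength, int offset, byte[] oldCell) {
            this.bgLength = bgLength;
            this.offset = offset;
            this.oldCell = oldCell;
        }
    }

    private final byte[] parityBlock;          // stands in for the block file
    private long bgLength;                     // BG length of the current parity data
    private final Deque<Entry> undoLog = new ArrayDeque<>();

    ParityCellUndoSketch(int blockSize) {
        this.parityBlock = new byte[blockSize];
    }

    // Log the old cell and its BG length first, then overwrite in place.
    void overwrite(int offset, byte[] newCell, long newBgLength) {
        byte[] old = Arrays.copyOfRange(parityBlock, offset, offset + newCell.length);
        undoLog.push(new Entry(bgLength, offset, old));
        System.arraycopy(newCell, 0, parityBlock, offset, newCell.length);
        bgLength = newBgLength;
    }

    // Restore the most recently preserved cell; returns false if the log is empty.
    boolean undo() {
        Entry e = undoLog.poll();
        if (e == null) {
            return false;
        }
        System.arraycopy(e.oldCell, 0, parityBlock, e.offset, e.oldCell.length);
        bgLength = e.bgLength;
        return true;
    }

    long getBgLength() {
        return bgLength;
    }

    byte[] read(int offset, int length) {
        return Arrays.copyOfRange(parityBlock, offset, offset + length);
    }
}
```

Storing the BG length in each entry is what lets the old parity bytes be matched to the data they were computed from, since the parity block length alone cannot distinguish two flushes of the same stripe.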



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
