hadoop-hdfs-issues mailing list archives

From "Mingliang Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7661) Erasure coding: support hflush and hsync
Date Thu, 25 Feb 2016 02:39:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15166613#comment-15166613 ]

Mingliang Liu commented on HDFS-7661:
-------------------------------------

Thanks for your comments, [~drankye].

1. Augmenting the crc file, i.e. the meta file, is possible. However, it becomes too complicated
if we interleave the checksum and BG length records. If we place them in two segments of the
.meta file, as | header | crc | bglen records |, space for the CRC section must be reserved
up front, which leads to holes in the file.
Meanwhile, the {{.bglen}} file is treated as a redo/undo log whose records are to:
  * indicate the state of the parity block data file (i.e. the last cell): complete or incomplete.
Incomplete means a partial parity cell.
  * roll back the last cell to the previous healthy data if the state is incomplete. If the last
cell is being overwritten, we need to roll back to the state before the overwrite happened;
otherwise, the last cell is simply abandoned.
We don't need these records for the original data blocks. I'll update the design doc in detail
to show how we can roll back safely using the {{bglen}} records.
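To make the undo semantics concrete, here is a minimal sketch of how such a {{.bglen}} record could drive recovery. The record layout, class name, and method names are my own illustration, not the actual patch; only the "abandon the partial cell" case is shown, since rolling back an overwritten cell additionally requires the previous cell data to be preserved in the log.

```java
// Hypothetical sketch of a .bglen undo record; names are illustrative only.
public class BgLenRecord {
    final long blockGroupLen;   // block group length when the record was written
    final boolean cellComplete; // was the last parity cell complete?

    BgLenRecord(long blockGroupLen, boolean cellComplete) {
        this.blockGroupLen = blockGroupLen;
        this.cellComplete = cellComplete;
    }

    /**
     * Length considered healthy on recovery: keep a complete last cell as-is,
     * otherwise truncate back to the previous cell boundary (abandon the
     * partial parity cell).
     */
    long recoveredLength(long cellSize) {
        if (cellComplete) {
            return blockGroupLen;
        }
        return (blockGroupLen / cellSize) * cellSize;
    }
}
```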

2. I totally agree we should document the definitions of {{offsetInBlock, packetLen, blockGroupLen}}
and why we need them in the first place. Based on an offline discussion with [~demongaorui] yesterday,
we're refining the design doc with more detailed design motivations, which will show the challenging
scenarios and why we need advanced techniques to address them. [~demongaorui] and I will share
the design doc later this week. I appreciate your further review and comments.

3. The intention of the example was that we should not make any assumptions about the packet
size and cell size; it was not assuming that they're naturally different. The fact is that they
could be different and unaligned. Actually, the current default sizes are not aligned: the
packet data size is 63 KB and the cell size is 64 KB (just as the example showed). The cell
size is EC-policy dependent, while we have different constraints on the packet data size;
refer to [HDFS-7308]. The best we can do is to forcefully make them aligned, in which case
we still need to deal with scenarios where one cell may need multiple transmission packets
or one packet contains multiple cells.
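The misalignment with the current defaults can be checked with a little arithmetic. The class and method names below are my own sketch (not HDFS code); it assumes the defaults mentioned above, 63 KB of packet data and 64 KB cells, and counts how many distinct cells a given packet's bytes fall into:

```java
// Sketch showing packet/cell boundary drift with the default sizes;
// names are illustrative, not actual HDFS classes.
public class PacketCellAlignment {
    static final long PACKET_DATA_SIZE = 63 * 1024; // default packet data size
    static final long CELL_SIZE = 64 * 1024;        // default EC cell size

    /** Number of distinct cells whose bytes appear in 0-based packet i. */
    static int cellsTouchedByPacket(long i) {
        long first = i * PACKET_DATA_SIZE;        // offset of packet's first byte
        long last = first + PACKET_DATA_SIZE - 1; // offset of packet's last byte
        return (int) (last / CELL_SIZE - first / CELL_SIZE) + 1;
    }
}
```

Packet 0 (bytes 0..63K-1) fits entirely in cell 0, but packet 1 (bytes 63K..126K-1) straddles cells 0 and 1; conversely, cell 0 spans packets 0 and 1. The boundaries only realign every 64 packets (64 x 63 KB = 63 x 64 KB).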

Ping [~demongaorui] for discussion.

> Erasure coding: support hflush and hsync
> ----------------------------------------
>
>                 Key: HDFS-7661
>                 URL: https://issues.apache.org/jira/browse/HDFS-7661
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: GAO Rui
>         Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, HDFS-7661-unitTest-wip-trunk.patch,
HDFS-7661-wip.01.patch, HDFS-EC-file-flush-sync-design-version1.1.pdf, HDFS-EC-file-flush-sync-design-version2.0.pdf
>
>
> We also need to support hflush/hsync and visible length. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
