hadoop-hdfs-issues mailing list archives

From "GAO Rui (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7661) Erasure coding: support hflush and hsync
Date Mon, 22 Feb 2016 09:04:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15156653#comment-15156653 ]

GAO Rui commented on HDFS-7661:

[~drankye], I have been investigating how HBase uses hflush/hsync. According
to these [slides | http://www.slideshare.net/enissoz/hbase-and-hdfs-understanding-filesystem-usage]
shared by [~enis] (pages 12 and 15), HBase uses {{hflush()}} to write the WAL (Write
Ahead Log), and a WAL sync ({{hflush()}}) happens hundreds of times per second. So I think creating a new bg for
each flush call is not a practical option.
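
To put a rough number on this, here is a minimal sketch of the arithmetic. The class and method names are hypothetical, and the rate of 200 syncs per second is only an assumed value inside the "hundreds of times per sec" range from the slides.

```java
/** Hypothetical helper: counts block groups if every flush opened a new one. */
final class BgPerFlushEstimate {
    /** One new block group per flush call, so the count is simply rate * time. */
    static long blockGroupsCreated(long flushesPerSecond, long seconds) {
        return flushesPerSecond * seconds;
    }
}
```

Under that assumed rate, {{BgPerFlushEstimate.blockGroupsCreated(200, 3600)}} gives 720000 block groups per hour for a single WAL writer.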

Maybe we could continue discussing the previous option:
I was planning to truncate the overwritten data at the end of both the data file and the .meta
file on the parity datanode, then store the overwritten data at the end of the .meta file. To
keep the data written before the first flush safe even if the second flush fails, maybe we could
add a mechanism similar to the {{upgrade/rollback}} mechanism of {{DataStorage}} to the data/checksum files of
parity datanodes.

However, if dn failures cause the write itself to fail, we cannot guarantee the safety of the data
written before the first flush. The same holds for replicated files: we flush at some point and then continue to write
the file to 3 dns. If we flush again in the same block and the write fails because of dn failures,
we cannot guarantee the safety of the data before the first flush either, I think.  [~walter.k.su],
does this make sense to you?

Based on an {{upgrade/rollback}} mechanism for the data/checksum files of parity datanodes, we could
recover the data before the first flush only in scenarios like the one below:
  1. The first flush succeeds.
  2. Parity dn0 dies.
  3. Data dn4, dn5 and parity dn1 fail during the second flush, but parity dn2 succeeds.
At this point, if parity dn0 comes back, we could roll dn2 back to its state before the
second flush.
This might be the only kind of scenario that would use the {{upgrade/rollback}} mechanism for the data/checksum
files of parity datanodes.
Guys, do we need to implement the {{upgrade/rollback}} mechanism for this kind of scenario?
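
The rollback idea in the scenario above could be sketched roughly as follows. This is only an in-memory illustration under my own assumptions, not the real {{DataStorage}} code: the class {{ParityReplicaSketch}} and its methods are hypothetical, and a real datanode would keep the pre-flush bytes on disk (for example at the end of the .meta file, as proposed above).

```java
import java.util.Arrays;

/** Hypothetical in-memory stand-in for the parity data of one parity datanode. */
final class ParityReplicaSketch {
    private byte[] current = new byte[0]; // parity bytes after the latest flush
    private byte[] previous = null;       // pre-flush snapshot, kept until finalized

    /** Start a flush: remember the pre-flush bytes, then apply the new parity. */
    void beginFlush(byte[] newParity) {
        previous = current;
        current = Arrays.copyOf(newParity, newParity.length);
    }

    /** The flush succeeded on enough datanodes: the snapshot is no longer needed. */
    void finalizeFlush() {
        previous = null;
    }

    /** The flush failed elsewhere: restore the pre-flush bytes if we still have them. */
    boolean rollback() {
        if (previous == null) {
            return false; // nothing to roll back to
        }
        current = previous;
        previous = null;
        return true;
    }

    /** Defensive copy of the current parity bytes. */
    byte[] contents() {
        return Arrays.copyOf(current, current.length);
    }
}
```

In the scenario, dn2 would call {{beginFlush}} for the second flush and, once dn0 is back and the second flush is known to have failed on dn1, dn4 and dn5, call {{rollback}} so that its parity matches the first flush again.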

[~liuml07], [~jingzhao], on the data consistency issue: if we do not implement a lock in
the NN, maybe the read client could check the bg data length in the .meta files of the 3 parity
dns to see whether they are at the same version, as [~zhz] suggested. If the read client
finds that the bg data lengths differ, it could read the .meta file again
from the parity dns that reported the smaller bg data length.  But the bg data length could change several
times, so the read client might never get a consistent bg data length.   Am
I missing something?
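
To make the concern concrete, here is a minimal sketch of such a read-side check. Everything in it is an assumption for illustration: {{BgLengthCheckSketch}} and {{agreedLength}} are hypothetical names, each {{LongSupplier}} stands in for reading the bg data length from one parity dn's .meta file, and for simplicity it re-reads all parity dns on each attempt and bounds the number of attempts.

```java
import java.util.OptionalLong;
import java.util.function.LongSupplier;

/** Hypothetical read-side check: do all parity dns report the same bg data length? */
final class BgLengthCheckSketch {
    /**
     * Reads the bg data length from every parity dn, up to maxAttempts rounds.
     * Returns the agreed length, or empty if the lengths never matched.
     */
    static OptionalLong agreedLength(LongSupplier[] parityDns, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            long first = parityDns[0].getAsLong();
            boolean same = true;
            for (int i = 1; i < parityDns.length; i++) {
                if (parityDns[i].getAsLong() != first) {
                    same = false; // at least one dn is at a different version
                }
            }
            if (same) {
                return OptionalLong.of(first);
            }
        }
        return OptionalLong.empty(); // the writer kept moving; no consistent view
    }
}
```

If the writer keeps flushing while the reader polls, the loop can exhaust its attempts and return empty, which is exactly the case I am worried about.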

> Erasure coding: support hflush and hsync
> ----------------------------------------
>                 Key: HDFS-7661
>                 URL: https://issues.apache.org/jira/browse/HDFS-7661
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: GAO Rui
>         Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, HDFS-7661-unitTest-wip-trunk.patch,
HDFS-7661-wip.01.patch, HDFS-EC-file-flush-sync-design-version1.1.pdf, HDFS-EC-file-flush-sync-design-version2.0.pdf
> We also need to support hflush/hsync and visible length. 

This message was sent by Atlassian JIRA