hadoop-hdfs-issues mailing list archives

From "GAO Rui (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7661) Support read when a EC file is being written
Date Thu, 10 Dec 2015 03:24:10 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049962#comment-15049962
] 

GAO Rui commented on HDFS-7661:
-------------------------------

[~szetszwo], [~jingzhao], thank you very much for the enlightening discussion in the video
meeting. I have walked through the EC file reading part of the source code.

In DFSInputStream#getFileLength():
{code}
public long getFileLength() {
  synchronized(infoLock) {
    return locatedBlocks == null? 0:
        locatedBlocks.getFileLength() + lastBlockBeingWrittenLength;
  }
}
{code}

I have three questions.
The first one: for an EC file that is being written, we should make {{locatedBlocks.getFileLength()}}
cover up to the last completed block group, right?
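
To make the first question concrete, here is a small worked example of how I read that split between completed block groups and the block group still being written. The block size, group layout and class name below are my own assumptions for illustration, not existing HDFS code:
{code}
// Illustrative sketch only: numbers and names are assumptions, not HDFS code.
// With RS-6-3 and an assumed internal block size of 128MB, one completed block
// group stores 6 * 128MB = 768MB of user data. locatedBlocks.getFileLength()
// would then cover the file up to the last completed block group boundary,
// and lastBlockBeingWrittenLength would account for the block group still open.
public class EcFileLengthExample {
  public static void main(String[] args) {
    final long internalBlockSize = 128L * 1024 * 1024; // assumed 128MB
    final int numDataUnits = 6;                        // RS-6-3
    final long blockGroupDataSize = internalBlockSize * numDataUnits; // 768MB

    long fileBytesWritten = 2_000_000_000L;            // example file size
    long completedGroupsLength =
        (fileBytesWritten / blockGroupDataSize) * blockGroupDataSize;
    long lastGroupBytes = fileBytesWritten - completedGroupsLength;

    System.out.println("completed block groups cover : " + completedGroupsLength);
    System.out.println("bytes in the open block group: " + lastGroupBytes);
  }
}
{code}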

The second question is about {{lastBlockBeingWrittenLength}}.
I think for EC files, {{lastBlockBeingWrittenLength}} should be advanced to the end of the last
completely written stripe. By a completely written stripe (in RS-6-3), I mean a stripe whose
internal cells (6 data cells and 3 parity cells) have all been written. According to the current
writing-path code, StripedDataStreamer waits for acks once a stripe has all of its data cells
filled and its parity cells calculated. So it should be OK to keep advancing {{lastBlockBeingWrittenLength}}
to the last completely written stripe. Does that make sense to you?
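
To illustrate the stripe-boundary rounding I have in mind, here is a minimal sketch; the helper name and the 64KB cell size are assumptions for the example, not existing code:
{code}
// A minimal sketch of rounding down to the last completely written stripe.
public class StripeBoundarySketch {
  /**
   * Round the user bytes written in the block group being written down to the
   * end of the last complete stripe, so that lastBlockBeingWrittenLength only
   * ever advances in whole stripes.
   *
   * @param bytesWritten user bytes written into the current block group
   * @param cellSize     EC cell size in bytes (assumed 64 * 1024 here)
   * @param numDataUnits data units per stripe (6 for RS-6-3)
   */
  static long lastCompleteStripeLength(long bytesWritten, int cellSize, int numDataUnits) {
    long stripeDataSize = (long) cellSize * numDataUnits; // user data per stripe
    return (bytesWritten / stripeDataSize) * stripeDataSize;
  }

  public static void main(String[] args) {
    // With RS-6-3 and 64KB cells, one stripe holds 393,216 bytes of user data.
    // 1,000,000 written bytes round down to 786,432 (two complete stripes).
    System.out.println(lastCompleteStripeLength(1_000_000L, 64 * 1024, 6));
  }
}
{code}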

The last question is about updating {{lastBlockBeingWrittenLength}} when hflush/hsync is invoked.
I will upload a document and try to cover all possible scenarios in it.

I have tried to trace {{lastBlockBeingWrittenLength}}, and found that we get its value
from the datanode side via ReplicaBeingWritten#getVisibleLength():
{code}
@Override
public long getVisibleLength() {
  return getBytesAcked();  // all acked bytes are visible
}
{code}

For EC files, it is not appropriate to simply take bytesAcked as the visible length in scenarios
where hflush/hsync is involved. I will cover overriding this method in the document, too.
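
As a rough illustration of the kind of override I mean, here is a hypothetical sketch that limits the visible length of one internal block to its last fully written cell. The class and field names are made up for this example, and a real change would also have to coordinate visibility across the whole block group:
{code}
// Hypothetical sketch, not existing HDFS code: one possible shape for an
// EC-aware replica that reports visible length rounded down to a full cell,
// so a reader never sees a partially written cell of this internal block.
public class StripedReplicaVisibleLengthSketch {
  private final int cellSize;   // assumed EC cell size, e.g. 64 * 1024
  private long bytesAcked;      // bytes acked so far on this internal block

  public StripedReplicaVisibleLengthSketch(int cellSize) {
    this.cellSize = cellSize;
  }

  public void setBytesAcked(long bytesAcked) {
    this.bytesAcked = bytesAcked;
  }

  // Unlike ReplicaBeingWritten#getVisibleLength(), which returns all acked
  // bytes, this limits visibility to the last fully acked cell boundary.
  public long getVisibleLength() {
    return (bytesAcked / cellSize) * cellSize;
  }
}
{code}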


> Support read when a EC file is being written
> --------------------------------------------
>
>                 Key: HDFS-7661
>                 URL: https://issues.apache.org/jira/browse/HDFS-7661
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: GAO Rui
>         Attachments: EC-file-flush-and-sync-steps-plan-2015-12-01.png, HDFS-7661-unitTest-wip-trunk.patch
>
>
> We also need to support hflush/hsync and visible length. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
