hadoop-hdfs-issues mailing list archives

From "Arpit Agarwal (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-11435) NameNode should track open for write files lengths more frequent than on newer block allocations
Date Tue, 21 Feb 2017 23:03:44 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15876934#comment-15876934 ]

Arpit Agarwal edited comment on HDFS-11435 at 2/21/17 11:03 PM:
----------------------------------------------------------------

bq. Yiqun Lin, Yes, so far the thinking is along the lines of Jing Zhao's proposal of enhancing
the heartbeat protocol to let the NameNode know about open-for-write file lengths.
-Hi [~manojg], do you have a pointer to this proposal/discussion?-
Never mind, you were probably referring to this comment.
https://issues.apache.org/jira/browse/HDFS-11402?focusedCommentId=15872739&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15872739


was (Author: arpitagarwal):
bq. Yiqun Lin, Yes, so far the thinking is along the lines of Jing Zhao's proposal of enhancing
the heartbeat protocol to let the NameNode know about open-for-write file lengths.
-Hi [~manojg], do you have a pointer to this proposal/discussion?-
Never mind, you were probably referring to this comment.
https://issues.apache.org/jira/secure/EditComment!default.jspa?id=13041860&commentId=15872739

> NameNode should track open for write files lengths more frequent than on newer block
allocations
> ------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-11435
>                 URL: https://issues.apache.org/jira/browse/HDFS-11435
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Manoj Govindassamy
>            Assignee: Manoj Govindassamy
>
> *Problem:*
> Currently the length of an open-for-write (under construction) file is updated on the
NameNode only in the following cases:
> # Block boundary: When a block boundary is reached and a new Block is allocated, the NameNode
learns of the file growth and its recorded file length catches up.
> # hsync(SyncFlag.UPDATE_LENGTH): When a client application invokes hsync on the write stream
with this special flag, DataNodes send an incremental block report with the latest file length,
which the NameNode uses to update its metadata.
> # First hflush() on a new Block: When a client application calls hflush() for the first time
on each new Block, DataNodes notify the NameNode of the latest file length.
> # Output stream close: Forces DataNodes to update the NameNode with the file length after data
persistence and proper acknowledgements in the pipeline.
> So, the lengths the NameNode records for open-for-write files are usually much smaller than
the lengths seen by the DN/client. It is highly preferable that the NameNode not lag in file
lengths by the order of a Block size for under-construction files, and that there be a more
frequent, scalable update mechanism for these open file lengths.
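
The four update points listed in the problem description can be illustrated with a toy model.
This is only a sketch, not Hadoop code: the class OpenFileLengthLag, its methods, and the
128 MB block size are all hypothetical names and values chosen for illustration. It assumes,
as the description states, that the NameNode's recorded length changes only at a block
boundary, on hsync with UPDATE_LENGTH, on the first hflush() in a block, and on close.

```java
/**
 * Toy model (NOT Hadoop code) of how the length recorded by the NameNode
 * can lag behind the bytes a client has written to an open file.
 * Every name and constant here is hypothetical.
 */
public class OpenFileLengthLag {
    // Assumed block size for the illustration.
    static final long BLOCK_SIZE = 128L * 1024 * 1024;

    long clientLength = 0;      // bytes the client has written so far
    long nameNodeLength = 0;    // length the NameNode currently records
    boolean flushedInCurrentBlock = false;

    /** Client writes bytes; the NameNode catches up only when a block boundary is crossed. */
    void write(long bytes) {
        long blocksBefore = clientLength / BLOCK_SIZE;
        clientLength += bytes;
        long blocksAfter = clientLength / BLOCK_SIZE;
        if (blocksAfter > blocksBefore) {
            // Simplification: on new block allocation the NameNode learns
            // the length up to the last completed block boundary.
            nameNodeLength = blocksAfter * BLOCK_SIZE;
            flushedInCurrentBlock = false;
        }
    }

    /** Only the first hflush() in each block updates the NameNode in this model. */
    void hflush() {
        if (!flushedInCurrentBlock) {
            nameNodeLength = clientLength;
            flushedInCurrentBlock = true;
        }
    }

    /** Models hsync with the UPDATE_LENGTH flag: the NameNode always catches up. */
    void hsyncUpdateLength() {
        nameNodeLength = clientLength;
    }

    /** Closing the stream brings the NameNode fully up to date. */
    void close() {
        nameNodeLength = clientLength;
    }

    /** Bytes written by the client that the NameNode does not yet know about. */
    long lag() {
        return clientLength - nameNodeLength;
    }

    public static void main(String[] args) {
        OpenFileLengthLag file = new OpenFileLengthLag();
        file.write(100);
        System.out.println("after write:           lag = " + file.lag());
        file.hflush();
        System.out.println("after first hflush:    lag = " + file.lag());
        file.write(50);
        file.hflush();
        System.out.println("after second hflush:   lag = " + file.lag());
        file.hsyncUpdateLength();
        System.out.println("after hsync(UPDATE):   lag = " + file.lag());
        file.write(BLOCK_SIZE);
        System.out.println("after crossing block:  lag = " + file.lag());
        file.close();
        System.out.println("after close:           lag = " + file.lag());
    }
}
```

In this model a writer that never calls hflush() or hsync leaves the NameNode behind by up to
almost a full block, which is the gap the issue asks to narrow.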



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

