hadoop-hdfs-issues mailing list archives

From "Jonathan Hsieh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6633) Support reading new data in a being written file until the file is closed
Date Thu, 10 Jul 2014 08:08:05 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14057253#comment-14057253 ]

Jonathan Hsieh commented on HDFS-6633:

Here's another use case, for HBase: HBase replication uses log shipping and today usually
relies on a "user level tail".  A "user level tail" periodically closes and reopens a file to
get the new file length.  This imposes extra, unnecessary operations on the NameNode and also
incurs extra latency (since we don't want to do this too frequently).  Having an HDFS tail
would allow the replication mechanism to read data and ship it more frequently and more
cheaply (fewer NN ops, and it would pick up new data more quickly when it arrives).
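
For illustration only, the "user level tail" pattern described above could be sketched roughly
as follows.  This is a local-file analogy, not HBase or HDFS code: the class name UserLevelTail
and its poll() method are hypothetical, and java.io.RandomAccessFile stands in for an HDFS
input stream.  The point is that every poll has to reopen the file just to learn its current
length, which on HDFS means another round trip to the NameNode.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of a "user level tail" over a local file. */
public class UserLevelTail {
    private final Path path;
    private long offset = 0; // number of bytes already read and shipped

    public UserLevelTail(Path path) {
        this.path = path;
    }

    /**
     * One poll cycle: reopen the file, fetch its current length, and
     * return whatever was appended since the previous poll. The file
     * is closed again before returning.
     */
    public byte[] poll() throws IOException {
        try (RandomAccessFile in = new RandomAccessFile(path.toFile(), "r")) {
            long length = in.length(); // fresh length, only visible after reopening
            if (length <= offset) {
                return new byte[0];    // nothing new since the last poll
            }
            byte[] buf = new byte[(int) (length - offset)];
            in.seek(offset);
            in.readFully(buf);
            offset = length;
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("wal", ".log");
        UserLevelTail tail = new UserLevelTail(log);

        // A writer appends one record; the next poll picks it up.
        Files.write(log, "edit-1\n".getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.APPEND);
        System.out.print(new String(tail.poll(), StandardCharsets.UTF_8));

        // With no new data, the poll still pays for the reopen.
        System.out.println("new bytes: " + tail.poll().length);

        Files.delete(log);
    }
}
```

The polling interval is the trade-off mentioned above: polling more often lowers latency but
multiplies the reopen cost, while polling less often delays shipping.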

> Support reading new data in a being written file until the file is closed
> -------------------------------------------------------------------------
>                 Key: HDFS-6633
>                 URL: https://issues.apache.org/jira/browse/HDFS-6633
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>          Components: hdfs-client
>            Reporter: Tsz Wo Nicholas Sze
>            Assignee: Tsz Wo Nicholas Sze
>         Attachments: h6633_20140707.patch, h6633_20140708.patch
> When a file is being written, the file length keeps increasing.  If the file is opened
> for read, the reader first gets the file length and then reads only up to that length.  The
> reader will not be able to read the new data written afterward.
> We propose adding a new feature so that readers will be able to read all the data until
> the writer closes the file.

