hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-6450) Enhance FSDataOutputStream to allow retrieving the current number of replicas of current block
Date Wed, 16 Dec 2009 23:59:18 GMT
Enhance FSDataOutputStream to allow retrieving the current number of replicas of current block
----------------------------------------------------------------------------------------------

                 Key: HADOOP-6450
                 URL: https://issues.apache.org/jira/browse/HADOOP-6450
             Project: Hadoop Common
          Issue Type: Improvement
          Components: fs
            Reporter: dhruba borthakur
            Assignee: dhruba borthakur


The current HDFS implementation does not replicate the last partial block of a file that is
being written until the file is closed. Some long-running applications (e.g. HBase) write
transaction logs into HDFS. If datanodes in the write pipeline die, the application has no
knowledge of it until all of the datanodes in the pipeline have failed and the application
gets an IO error.

These applications would benefit greatly if they could determine the number of live replicas
of the current block to which they are writing. For example, an application could decide that
when one of the datanodes in the write pipeline fails, it will close the file and start
writing to a new file.
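
As a rough illustration of the usage pattern described above, here is a minimal, self-contained
Java sketch. It does not use the real Hadoop API: ReplicaAwareStream, currentReplicaCount(),
RollingLogWriter, and FakeStream are all hypothetical names invented for this example, and the
in-memory FakeStream merely stands in for an HDFS write pipeline. The actual method name and
signature on FSDataOutputStream would be decided by the patch for this issue.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical view of an output stream that can report its live replicas. */
interface ReplicaAwareStream {
    void write(byte[] data);
    int currentReplicaCount();   // assumed name, not an existing Hadoop method
    void close();
}

/** Closes the current file and opens a new one when live replicas drop too low. */
final class RollingLogWriter {
    private final Supplier<ReplicaAwareStream> opener;
    private final int minReplicas;
    private ReplicaAwareStream current;
    private int rolls = 0;

    RollingLogWriter(Supplier<ReplicaAwareStream> opener, int minReplicas) {
        this.opener = opener;
        this.minReplicas = minReplicas;
        this.current = opener.get();
    }

    void append(byte[] record) {
        current.write(record);
        // After each write, check whether the pipeline has lost a datanode.
        if (current.currentReplicaCount() < minReplicas) {
            current.close();          // closing lets the file be fully replicated
            current = opener.get();   // continue in a new file with a new pipeline
            rolls++;
        }
    }

    int rollCount() { return rolls; }
}

/** In-memory stand-in for a write pipeline; replicas can be lowered by a test. */
final class FakeStream implements ReplicaAwareStream {
    int replicas;
    boolean closed = false;
    final List<byte[]> records = new ArrayList<>();

    FakeStream(int replicas) { this.replicas = replicas; }

    @Override public void write(byte[] data) { records.add(data); }
    @Override public int currentReplicaCount() { return replicas; }
    @Override public void close() { closed = true; }
}

final class RollingLogDemo {
    public static void main(String[] args) {
        FakeStream first = new FakeStream(3);
        FakeStream second = new FakeStream(3);
        Iterator<FakeStream> files = List.of(first, second).iterator();
        RollingLogWriter writer = new RollingLogWriter(files::next, 3);

        writer.append(new byte[] {1});   // all replicas alive, no roll
        first.replicas = 2;              // simulate one datanode failing
        writer.append(new byte[] {2});   // triggers a roll to the second file
        System.out.println("rolls = " + writer.rollCount());
    }
}
```

In this sketch the replica check runs after every append; a real application would more likely
check periodically or on sync, since the cost of the query is not known here.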

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

