hadoop-common-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6307) Support reading on un-closed SequenceFile
Date Sun, 11 Oct 2009 09:54:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764444#action_12764444

dhruba borthakur commented on HADOOP-6307:

Isn't it true that fs.getFileStatus(file).getLen() requires read access on the parent directory,
whereas fs.open(file).available() requires read access on the file itself?

Many map-reduce programs use SequenceFiles to store data, and they do not need the facility
to process files that are currently being written. In that common case, isn't the additional overhead
of fetching block locations via fs.open(file) rather wasteful?
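The trade-off under discussion can be sketched with a local-filesystem analogy using plain java.io (the names `metaLen` and `streamLen` below are illustrative, and this is not the HDFS API): one path asks the filesystem's metadata for the length without touching the file, the other must actually open the file and probe the stream.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class LengthProbe {
    public static void main(String[] args) throws IOException {
        // Create a scratch file and write 1024 bytes to it.
        File f = File.createTempFile("probe", ".dat");
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(new byte[1024]);
        }

        // Metadata path: ask the filesystem for the recorded length,
        // without opening the file. Analogous to
        // fs.getFileStatus(file).getLen() in HDFS.
        long metaLen = f.length();

        // Open-and-probe path: open a stream and ask how many bytes are
        // readable. Analogous to fs.open(file).available(), which on HDFS
        // additionally pays the cost of fetching block locations.
        long streamLen;
        try (FileInputStream in = new FileInputStream(f)) {
            streamLen = in.available();
        }

        System.out.println(metaLen + " " + streamLen);
    }
}
```

For a closed file both probes agree; the issue arises only for an un-closed HDFS file, where the metadata length lags behind the hflushed data.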

> Support reading on un-closed SequenceFile
> -----------------------------------------
>                 Key: HADOOP-6307
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6307
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>            Reporter: Tsz Wo (Nicholas), SZE
> When a SequenceFile.Reader is constructed, it calls fs.getFileStatus(file).getLen().
> However, fs.getFileStatus(file).getLen() does not return the hflushed length for an un-closed
> file, since the Namenode does not know the hflushed length. The DFSClient has to ask a datanode
> for the length of the last block, which is being written; see also HDFS-570.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
