hadoop-common-issues mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-6307) Support reading on un-closed SequenceFile
Date Mon, 12 Oct 2009 17:20:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764752#action_12764752 ]

Tsz Wo (Nicholas), SZE commented on HADOOP-6307:
------------------------------------------------

> Isn't it true that fs.getFileStatus(file).getLen() requires read access on the parent directory whereas fs.open(file).available() requires read access on the file itself?

Actually, fs.getFileStatus(file).getLen() requires only "x" access on the parent directory,
not "r".

SequenceFile.Reader opens the file for read, so we must have (and already have) "InputStream
in = fs.open(file);" in the code.  My previous suggestion was to call "in.available()" to
get the number of available bytes.  Replacing "fs.open(file).available()" with "in.available()"
indeed eliminates an RPC to the NameNode and does not introduce any additional overhead.
(However, it currently does not work because of HDFS-691.)
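As a minimal illustration of why an already-open stream needs no second open() call, here is a plain java.io sketch (AvailableDemo and lengthFromOpenStream are illustrative names, not Hadoop API): the stream itself can report how many bytes are readable, so no extra metadata lookup is required.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class AvailableDemo {
    // Sketch of the idea behind the suggestion: reuse the stream we already
    // hold instead of opening the file a second time just to learn its length.
    // Note: available() returns the bytes readable without blocking, which is
    // exactly the semantics HDFS-691 has to get right for un-closed files.
    static long lengthFromOpenStream(InputStream in) throws IOException {
        return in.available();
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        System.out.println(lengthFromOpenStream(in));  // prints 5
        in.close();
    }
}
```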

FYR, I have copied the related code segment, which includes all the SequenceFile.Reader constructors, below.
{code}
//line 1438, SequenceFile.java
    /** Open the named file. */
    public Reader(FileSystem fs, Path file, Configuration conf)
      throws IOException {
      this(fs, file, conf.getInt("io.file.buffer.size", 4096), conf, false);
    }

    private Reader(FileSystem fs, Path file, int bufferSize,
                   Configuration conf, boolean tempReader) throws IOException {
      this(fs, file, bufferSize, 0, fs.getFileStatus(file).getLen(), conf, tempReader);
    }
    
    private Reader(FileSystem fs, Path file, int bufferSize, long start,
                   long length, Configuration conf, boolean tempReader) 
    throws IOException {
      this.file = file;
      this.in = openFile(fs, file, bufferSize, length);
      this.conf = conf;
      boolean succeeded = false;
      try {
        seek(start);
        this.end = in.getPos() + length;
        init(tempReader);
        succeeded = true;
      } finally {
        if (!succeeded) {
          IOUtils.cleanup(LOG, in);
        }
      }
    }

    /**
     * Override this method to specialize the type of
     * {@link FSDataInputStream} returned.
     */
    protected FSDataInputStream openFile(FileSystem fs, Path file,
        int bufferSize, long length) throws IOException {
      return fs.open(file, bufferSize);
    }
{code}

> Support reading on un-closed SequenceFile
> -----------------------------------------
>
>                 Key: HADOOP-6307
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6307
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>            Reporter: Tsz Wo (Nicholas), SZE
>
> When a SequenceFile.Reader is constructed, it calls fs.getFileStatus(file).getLen().
> However, fs.getFileStatus(file).getLen() does not return the hflushed length for an un-closed
> file, since the Namenode does not know the hflushed length.  The DFSClient has to ask a datanode
> for the length of the last block, which is still being written; see also HDFS-570.
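The description above can be sketched as a toy calculation (class and method names, and the block sizes, are illustrative, not Hadoop API): the Namenode's metadata covers only completed blocks, so the visible length of an un-closed file is that sum plus whatever the datanode reports for the in-progress last block.

```java
public class VisibleLengthSketch {
    // Hypothetical helper: completedBlockLengths is what the NameNode knows;
    // lastBlockLengthFromDatanode must be fetched from a datanode, because
    // the last block is still being written (see HDFS-570).
    static long visibleLength(long[] completedBlockLengths,
                              long lastBlockLengthFromDatanode) {
        long total = 0;
        for (long len : completedBlockLengths) {
            total += len;  // metadata the NameNode already has
        }
        return total + lastBlockLengthFromDatanode;  // datanode-reported tail
    }

    public static void main(String[] args) {
        // Two completed 64 MB blocks plus 10 bytes hflushed into the third.
        long mb = 64L * 1024 * 1024;
        System.out.println(visibleLength(new long[] {mb, mb}, 10));  // prints 134217738
    }
}
```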

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

