From: "Tsz Wo (Nicholas), SZE (JIRA)"
To: common-issues@hadoop.apache.org
Reply-To: common-issues@hadoop.apache.org
Date: Mon, 12 Oct 2009 10:20:31 -0700 (PDT)
Subject: [jira] Commented: (HADOOP-6307) Support reading on un-closed SequenceFile
Message-ID: <310416628.1255368031315.JavaMail.jira@brutus>
In-Reply-To: <1155449625.1255127551381.JavaMail.jira@brutus>

[
https://issues.apache.org/jira/browse/HADOOP-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764752#action_12764752 ]

Tsz Wo (Nicholas), SZE commented on HADOOP-6307:
------------------------------------------------

> Isn't it true that fs.getFileStatus(file).getLen() requires read access on the parent directory, whereas fs.open(file).available() requires read access on the file itself?

Actually, fs.getFileStatus(file).getLen() requires only "x" access on the parent directory, not "r".

SequenceFile.Reader opens the file for read, so we must have (and already have) "InputStream in = fs.open(file);" in the code. My previous suggestion was to call "in.available()" to get the number of available bytes. Replacing "fs.open(file).available()" with "in.available()" saves an RPC to the NameNode and introduces no additional overhead. (However, it currently does not work because of HDFS-691.)

For reference, the related code segment, which includes all the SequenceFile.Reader constructors, is copied below.

{code}
// line 1438, SequenceFile.java

/** Open the named file. */
public Reader(FileSystem fs, Path file, Configuration conf) throws IOException {
  this(fs, file, conf.getInt("io.file.buffer.size", 4096), conf, false);
}

private Reader(FileSystem fs, Path file, int bufferSize,
    Configuration conf, boolean tempReader) throws IOException {
  this(fs, file, bufferSize, 0, fs.getFileStatus(file).getLen(), conf, tempReader);
}

private Reader(FileSystem fs, Path file, int bufferSize, long start,
    long length, Configuration conf, boolean tempReader) throws IOException {
  this.file = file;
  this.in = openFile(fs, file, bufferSize, length);
  this.conf = conf;
  boolean succeeded = false;
  try {
    seek(start);
    this.end = in.getPos() + length;
    init(tempReader);
    succeeded = true;
  } finally {
    if (!succeeded) {
      IOUtils.cleanup(LOG, in);
    }
  }
}

/**
 * Override this method to specialize the type of
 * {@link FSDataInputStream} returned.
 */
protected FSDataInputStream openFile(FileSystem fs, Path file,
    int bufferSize, long length) throws IOException {
  return fs.open(file, bufferSize);
}
{code}

> Support reading on un-closed SequenceFile
> -----------------------------------------
>
>                 Key: HADOOP-6307
>                 URL: https://issues.apache.org/jira/browse/HADOOP-6307
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: io
>            Reporter: Tsz Wo (Nicholas), SZE
>
> When a SequenceFile.Reader is constructed, it calls fs.getFileStatus(file).getLen(). However, fs.getFileStatus(file).getLen() does not return the hflushed length for an un-closed file, since the NameNode does not know the hflushed length. The DFSClient has to ask a datanode for the length of the last block, which is being written; see also HDFS-570.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
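The core of the suggestion above is to reuse the stream the reader has already opened and ask it how many bytes are available, rather than re-opening the file (which, for HDFS, costs an extra RPC to the NameNode). The same pattern can be illustrated with plain java.io streams; this is a minimal sketch, not Hadoop code — the class and method names are hypothetical, and note that available() on a real FSDataInputStream is currently affected by HDFS-691 as mentioned in the comment.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class AvailableDemo {
    // Mirrors the suggestion in the comment: given an already-open stream,
    // ask it for the number of bytes that can still be read, instead of
    // opening the file a second time just to learn its length.
    static long remainingBytes(InputStream in) throws IOException {
        return in.available();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[4096];
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println(remainingBytes(in)); // 4096 before any read
            long skipped = in.read(new byte[1000]); // consume 1000 bytes
            System.out.println(remainingBytes(in)); // 3096 after the read
        }
    }
}
```

For an in-memory stream, available() is exact; for file and network streams it is only an estimate of the bytes readable without blocking, which is precisely why its semantics on un-closed HDFS files matter to this issue.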