Date: Wed, 10 Jun 2015 06:16:01 +0000 (UTC)
From: "Kevin Beyer (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-196) File length not reported correctly after application crash

    [ https://issues.apache.org/jira/browse/HDFS-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14580084#comment-14580084 ]

Kevin Beyer commented on HDFS-196:
----------------------------------

I've learned about the soft and hard limits on the write lease. After the hard limit expired, the file length was corrected to the same number of bytes found by reading. So this is not a bug. However, I have a few ideas that might help:

1. The file stats could be updated when the soft limit expires. This would reduce the window of inconsistency to 1 minute instead of 1 hour.

2. Allow the writing application to control the "safe" length and limit readers to the safe length. Readers could set an option to read the unsafe bytes (or the default could read the full length, which is backwards compatible with current behavior but seems more dangerous). If the lease is not recovered before the hard limit expires, the unsafe bytes are discarded (a writer option could control this as well). This would allow applications to avoid partial record reads.

3. A simple way for readers to detect that there is an active soft/hard lease on a file, probably exposed in the FileStatus.

4. The hard limit duration should be an option when opening a file for write. The default should be zero.

5. A simple way to terminate a hard lease.

> File length not reported correctly after application crash
> ----------------------------------------------------------
>
>                 Key: HDFS-196
>                 URL: https://issues.apache.org/jira/browse/HDFS-196
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Doug Judd
>
> Our application (Hypertable) creates a transaction log in HDFS. This log is written with the following pattern:
>
> out_stream.write(header, 0, 7);
> out_stream.sync();
> out_stream.write(data, 0, amount);
> out_stream.sync();
> [...]
>
> However, if the application crashes and then comes back up again, the following statement
>
> length = mFilesystem.getFileStatus(new Path(fileName)).getLen();
>
> returns the wrong length. Apparently this is because this method fetches the length information from the NameNode, which is stale.
> Ideally, a call to getFileStatus() would return the accurate file length by fetching the size of the last block from the primary datanode.
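
A minimal sketch of the two workarounds discussed above, written against the public Hadoop FileSystem/DistributedFileSystem APIs and assuming a release new enough to have DistributedFileSystem.recoverLease() (the class name, buffer size, and taking the log path from args[0] are illustrative assumptions, not anything prescribed by this issue). It counts the bytes actually readable from the datanodes, which can exceed the stale length the NameNode reports while the crashed writer's lease is still open, and then asks the NameNode to recover that lease instead of waiting out the hard limit:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    public class CrashedLogLength {

        // Count the bytes actually readable from the datanodes; this can be larger
        // than the (stale) length the NameNode reports for an unclosed file.
        static long readableLength(FileSystem fs, Path path) throws IOException {
            long total = 0;
            byte[] buf = new byte[64 * 1024];
            try (FSDataInputStream in = fs.open(path)) {
                int n;
                while ((n = in.read(buf)) > 0) {
                    total += n;
                }
            }
            return total;
        }

        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path log = new Path(args[0]);

            long reported = fs.getFileStatus(log).getLen(); // may be stale after a crash
            long readable = readableLength(fs, log);        // what the datanodes actually hold
            System.out.printf("reported=%d readable=%d%n", reported, readable);

            // Rather than waiting for the hard lease limit to expire, a restarted
            // application can ask the NameNode to recover the old writer's lease.
            if (fs instanceof DistributedFileSystem) {
                boolean closed = ((DistributedFileSystem) fs).recoverLease(log);
                System.out.println("lease recovery " + (closed ? "complete" : "still in progress"));
            }
        }
    }

recoverLease() amounts to the "simple way to terminate a hard lease" asked for in item 5 above, although it requires casting to DistributedFileSystem and its availability depends on the Hadoop version in use.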