Date: Mon, 26 Jan 2009 14:14:00 -0800 (PST)
From: "Doug Judd (JIRA)"
To: core-dev@hadoop.apache.org
Subject: [jira] Commented: (HADOOP-4379) In HDFS, sync() not yet guarantees data available to the new readers

[ https://issues.apache.org/jira/browse/HADOOP-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667462#action_12667462 ]

Doug Judd commented on HADOOP-4379:
-----------------------------------

I tried the test again and still no luck. To recap, here's how the log file is created:

    out_stream.write(header, 0, 7);
    out_stream.sync();
    out_stream.write(data, 0, amount);
    out_stream.sync();
    [...]

After the test finished, I shut down the Hypertable servers. This time the listing shows the files to be 0 bytes in length (as opposed to 7 bytes with the previous patch):

    [doug@motherlode000 aol-basic]$ hadoop fs -ls /hypertable/servers/10.0.30.1*_38060/log/range_txn
    -rw-r--r--   3 doug supergroup   0 2009-01-26 11:40 /hypertable/servers/10.0.30.102_38060/log/range_txn/0.log
    -rw-r--r--   3 doug supergroup   0 2009-01-26 11:40 /hypertable/servers/10.0.30.104_38060/log/range_txn/0.log
    -rw-r--r--   3 doug supergroup   0 2009-01-26 11:40 /hypertable/servers/10.0.30.106_38060/log/range_txn/0.log
    -rw-r--r--   3 doug supergroup   0 2009-01-26 11:40 /hypertable/servers/10.0.30.108_38060/log/range_txn/0.log
    -rw-r--r--   3 doug supergroup   0 2009-01-26 11:40 /hypertable/servers/10.0.30.110_38060/log/range_txn/0.log
    -rw-r--r--   3 doug supergroup   0 2009-01-26 11:40 /hypertable/servers/10.0.30.112_38060/log/range_txn/0.log
    -rw-r--r--   3 doug supergroup   0 2009-01-26 11:40 /hypertable/servers/10.0.30.114_38060/log/range_txn/0.log
    -rw-r--r--   3 doug supergroup   0 2009-01-26 11:40 /hypertable/servers/10.0.30.116_38060/log/range_txn/0.log

When the RangeServer starts up again, it discovers that the log file (range_txn/0.log) does exist, so it starts the recovery process. However, it only sees the 7-byte header. None of the subsequent log appends appear in the log file, so the system starts up without recovering any of the data. BTW, in this particular circumstance, there is no other writer writing to the file when the RangeServer comes up and reads it.
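To make the expected semantics concrete, here is a minimal sketch of the write-then-sync pattern using plain local-file I/O. This is an analogy, not the Hadoop API: the class name `LogSyncSketch`, the temp-file path, and the use of `FileChannel.force()` are all illustrative assumptions, with `force(true)` standing in for the durability/visibility guarantee that `FSDataOutputStream.sync()` is expected to provide — after it returns, a fresh reader should see both the header and the appended data, not just the 7-byte header.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical local-filesystem analogy for the RangeServer commit-log pattern.
// FileChannel.force() plays the role HDFS sync() is expected to play.
public class LogSyncSketch {
    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("range_txn", ".log");
        byte[] header = new byte[7];               // 7-byte log header, as in the report
        byte[] data = "entry-1".getBytes();        // one log append (7 bytes here)

        try (RandomAccessFile raf = new RandomAccessFile(log.toFile(), "rw")) {
            FileChannel ch = raf.getChannel();
            ch.write(ByteBuffer.wrap(header));
            ch.force(true);                        // analogous to out_stream.sync()
            ch.write(ByteBuffer.wrap(data));
            ch.force(true);                        // analogous to out_stream.sync()
        }

        // A "new reader" (here: a second look at the file) should observe
        // header + data = 14 bytes, not just the 7-byte header.
        System.out.println(Files.size(log));       // prints 14
        Files.delete(log);
    }
}
```

The bug being reported is precisely that HDFS at this point does not honor the analogous guarantee: after a `kill -9` of the writer, a new reader sees only the data visible at the last block-boundary, not everything that was sync()'d.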
Here's the high level of what's going on:

1. RangeServer opens an FSDataOutputStream to the log and starts appending to it
2. RangeServer is killed with 'kill -9'
3. RangeServer comes up again and reads the log

In your above note you said, "A reader checks to see if the file is being written to by another writer. If so, it fetches the size of the last block from the primary datanode." This is not the case with our test; there is no writer writing to the log when we try to read it.

- Doug

> In HDFS, sync() not yet guarantees data available to the new readers
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4379
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4379
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: dhruba borthakur
>             Fix For: 0.19.1
>
>         Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, fsyncConcurrentReaders3.patch, Reader.java, Writer.java
>
>
> In the append design doc (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it says
> * A reader is guaranteed to be able to read data that was 'flushed' before the reader opened the file
> However, this feature is not yet implemented. Note that the operation 'flushed' is now called "sync".

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.