From: "Konstantin Shvachko (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Fri, 19 Oct 2007 12:16:51 -0700 (PDT)
Subject: [jira] Updated: (HADOOP-2073) Datanode corruption if machine dies while writing VERSION file
Message-ID: <7908961.1192821411186.JavaMail.jira@brutus>
In-Reply-To: <413500.1192667570995.JavaMail.jira@brutus>

     [ https://issues.apache.org/jira/browse/HADOOP-2073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-2073:
----------------------------------------

    Attachment: versionFileSize.patch

Killing data-nodes immediately after they started turned out to be a good crash test. Thanks, Michael.

I am attaching a patch that sets the file size after writing the data rather than before. That way VERSION never gets emptied. This will solve Michael's current problem.

In general, I agree with Raghu that we should check our code for the inconsistent states the file system can get into as a result of different crash scenarios.

I think this patch should go into 0.15.
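The patch itself is only attached, not inlined. A minimal sketch of the idea, with the truncation moved to after the write instead of before it, might look like the following (an assumed reconstruction based on the description above, not necessarily what versionFileSize.patch contains):

{{{
// Sketch only: reconstructed from the comment above, not the attached patch.
void write(File to) throws IOException {
  Properties props = new Properties();
  setFields(props, this);
  RandomAccessFile file = new RandomAccessFile(to, "rws");
  FileOutputStream out = null;
  try {
    file.seek(0);
    out = new FileOutputStream(file.getFD());
    props.store(out, null);
    // Truncate only AFTER the new properties are written. A crash during
    // props.store() may leave a partially overwritten VERSION, but never
    // the zero-length file produced by truncating up front.
    // The stream shares the RandomAccessFile's descriptor, so the file
    // pointer now sits just past the new data; cutting the file there
    // drops any stale tail left over from a longer previous VERSION.
    file.setLength(file.getFilePointer());
  } finally {
    if (out != null) {
      out.close();
    }
    file.close();
  }
}
}}}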
> Datanode corruption if machine dies while writing VERSION file
> --------------------------------------------------------------
>
>                 Key: HADOOP-2073
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2073
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.14.0
>            Reporter: Michael Bieniosek
>            Assignee: Raghu Angadi
>         Attachments: versionFileSize.patch
>
>
> Yesterday, due to a bad mapreduce job, some of my machines went on OOM-killing sprees and killed a bunch of datanodes, among other processes. Since my monitoring software kept trying to bring the datanodes back up, only to have the kernel kill them off again, each machine's datanode was probably killed many times. A large percentage of these datanodes will not come up now, and they write this message to the logs:
>
> 2007-10-18 00:23:28,076 ERROR org.apache.hadoop.dfs.DataNode: org.apache.hadoop.dfs.InconsistentFSStateException: Directory /hadoop/dfs/data is in an inconsistent state: file VERSION is invalid.
>
> When I check, /hadoop/dfs/data/current/VERSION is an empty file. Consequently, I have to delete all the blocks on the datanode and start over. Since the OOM-killing sprees happened simultaneously on several datanodes, this could have crippled my DFS cluster.
>
> I checked the Hadoop code, and in org.apache.hadoop.dfs.Storage I see this:
>
> {{{
> /**
>  * Write version file.
>  *
>  * @throws IOException
>  */
> void write() throws IOException {
>   corruptPreUpgradeStorage(root);
>   write(getVersionFile());
> }
>
> void write(File to) throws IOException {
>   Properties props = new Properties();
>   setFields(props, this);
>   RandomAccessFile file = new RandomAccessFile(to, "rws");
>   FileOutputStream out = null;
>   try {
>     file.setLength(0);
>     file.seek(0);
>     out = new FileOutputStream(file.getFD());
>     props.store(out, null);
>   } finally {
>     if (out != null) {
>       out.close();
>     }
>     file.close();
>   }
> }
> }}}
>
> So if the datanode dies after file.setLength(0) but before props.store(out, null) completes, VERSION is left empty, which is exactly the corrupt state I see. Maybe it would be better if this method created a temporary file VERSION.tmp, copied it to VERSION, and then deleted VERSION.tmp? That way, if VERSION were found to be corrupt, the datanode could look at VERSION.tmp to recover the data.
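For concreteness, the temporary-file scheme suggested above might look roughly like this (a hypothetical sketch only, assuming java.io.* imports and the existing setFields() helper; this code appears nowhere in the issue or in the attached patch):

{{{
// Hypothetical sketch of the VERSION.tmp idea described above;
// not code from Hadoop or from versionFileSize.patch.
void write(File version) throws IOException {
  File tmp = new File(version.getParentFile(), version.getName() + ".tmp");
  Properties props = new Properties();
  setFields(props, this);

  // 1. Write the complete new contents to VERSION.tmp and force it to disk.
  FileOutputStream tmpOut = new FileOutputStream(tmp);
  try {
    props.store(tmpOut, null);
    tmpOut.getFD().sync();
  } finally {
    tmpOut.close();
  }

  // 2. Copy VERSION.tmp over VERSION. A crash in this window can still
  //    leave VERSION empty or partial, but VERSION.tmp now holds a complete
  //    copy that startup code could use to recover.
  FileInputStream in = new FileInputStream(tmp);
  FileOutputStream out = new FileOutputStream(version);
  try {
    byte[] buf = new byte[4096];
    for (int n = in.read(buf); n > 0; n = in.read(buf)) {
      out.write(buf, 0, n);
    }
    out.getFD().sync();
  } finally {
    in.close();
    out.close();
  }

  // 3. Discard the temporary copy only once VERSION is known to be good.
  tmp.delete();
}
}}}

A File.renameTo() from VERSION.tmp to VERSION, which is atomic on POSIX file systems, would close the remaining window entirely; Michael's copy-then-delete variant instead keeps a readable VERSION.tmp around for explicit recovery if the copy is interrupted.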