From: "Milind Bhandarkar (JIRA)"
To: hadoop-dev@lucene.apache.org
Date: Sun, 29 Oct 2006 14:26:18 -0800 (PST)
Subject: [jira] Commented: (HADOOP-646) name node server does not load large (> 2^31 bytes) edits file
Message-ID: <24661873.1162160778360.JavaMail.root@brutus>
In-Reply-To: <6602315.1161900496769.JavaMail.root@brutus>

    [ http://issues.apache.org/jira/browse/HADOOP-646?page=comments#action_12445466 ]

Milind Bhandarkar commented on HADOOP-646:
------------------------------------------

Dhruba,

Finding out the number of entries from the size of
the edits file is not possible, since these are not fixed-size entries. However, trying to read the next byte and stopping when we get an EOFException will allow us to avoid the call to available() completely. I am attaching a patch (untested for files > 2G) which does exactly that.

Christian, can you try this patch out on the backed-up edits file (6.5 G) that you have and let me know if it worked? (It should be safe to try this patch if you back up your image first.)

- milind

> name node server does not load large (> 2^31 bytes) edits file
> --------------------------------------------------------------
>
>                 Key: HADOOP-646
>                 URL: http://issues.apache.org/jira/browse/HADOOP-646
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.7.1
>            Reporter: Christian Kunz
>            Priority: Critical
>
> FileInputStream.available() returns negative values when reading a large file (> 2^31 bytes) -- this is a known (unresolved) Java bug:
> http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6402006
> Consequence: a large edits file is not loaded and is deleted without any warning. The system reverts to the old fsimage.
> This happens in jdk1.6 as well, i.e. the bug has not yet been fixed.
> In addition, when I was finally able to load my big cron-backed-up edits file (6.5 GB) with a kludgy work-around, the blocks no longer existed on the data node servers, probably deleted during the previous attempts, when the name node server did not know about the changed situation.
> Moral, until this is fixed or worked around: don't wait too long to restart the name node server. Otherwise this is a way to lose the entire dfs.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
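[Editorial note: the read-until-EOFException approach Milind describes can be sketched roughly as below. The class name EditsReader, the countRecords method, and the int-opcode-plus-UTF-string record layout are illustrative assumptions, not the actual FSEditLog code or edits-file format.]

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

public class EditsReader {
    // Count records by reading until EOFException, rather than looping
    // on available(), which returns negative values for streams over
    // 2^31 bytes (Sun bug 6402006). The record layout here (an int
    // opcode followed by a UTF string payload) is hypothetical.
    public static int countRecords(DataInputStream in) throws IOException {
        int count = 0;
        while (true) {
            try {
                in.readInt();      // EOFException here means a clean end of file
            } catch (EOFException eof) {
                break;
            }
            in.readUTF();          // EOFException here would mean a truncated record
            count++;
        }
        return count;
    }
}
```

The key point is that EOFException between records is the normal termination signal, so no byte count from available() (or the file size) is ever needed.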