Mailing-List: contact hadoop-dev-help@lucene.apache.org; run by ezmlm
Reply-To: hadoop-dev@lucene.apache.org
Message-ID: <16861632.1197321403095.JavaMail.jira@brutus>
Date: Mon, 10 Dec 2007 13:16:43 -0800 (PST)
From: "stack (JIRA)"
To: hadoop-dev@lucene.apache.org
Subject: [jira] Updated: (HADOOP-2283) [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)
In-Reply-To: <4116281.1196113483221.JavaMail.jira@brutus>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

     [ https://issues.apache.org/jira/browse/HADOOP-2283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

stack updated HADOOP-2283:
--------------------------

    Attachment: 2283.patch

> [hbase] AlreadyBeingCreatedException (Was: Stuck replay of failed regionserver edits)
> -------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2283
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2283
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: stack
>            Assignee: stack
>            Priority: Minor
>             Fix For: 0.16.0
>
>         Attachments: 2283.patch, 2283.patch, compaction.patch, OP_READ.patch
>
>
> Looking at the master on a cluster of ~90 regionservers: the regionserver carrying the ROOT region went down (because it hadn't talked to the master in 30 seconds).
> The master notices the downed regionserver because its lease times out. It then runs the server shutdown sequence, but while splitting the regionserver's edit log it gets stuck trying to split the second of three log files. Eventually, after ~5 minutes, the second log split throws:
> 2007-11-26 01:21:23,999 WARN  hbase.HMaster - Processing pending operations: ProcessServerShutdown of XX.XX.XX.XX:60020
> org.apache.hadoop.dfs.AlreadyBeingCreatedException: org.apache.hadoop.dfs.AlreadyBeingCreatedException: failed to create file /hbase/hregion_-1194436719/oldlogfile.log for DFSClient_610028837 on client XX.XX.XX.XX because current leaseholder is trying to recreate file.
>     at org.apache.hadoop.dfs.FSNamesystem.startFileInternal(FSNamesystem.java:848)
>     at org.apache.hadoop.dfs.FSNamesystem.startFile(FSNamesystem.java:804)
>     at org.apache.hadoop.dfs.NameNode.create(NameNode.java:276)
>     at sun.reflect.GeneratedMethodAccessor13.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:379)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)
>
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
>     at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:82)
>     at org.apache.hadoop.hbase.HMaster.run(HMaster.java:1094)
> And so on, every 5 minutes.
> Because the regionserver that went down had the ROOT region, and because we are stuck in this eternal loop, ROOT never gets reallocated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.