Date: Tue, 17 Apr 2012 15:28:19 +0000 (UTC)
From: "ramkrishna.s.vasudevan (Commented) (JIRA)"
To: issues@hbase.apache.org
Message-ID: <547602125.33237.1334676499628.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1054145780.42025.1331259477068.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-5545) region can't be opened for a long time. Because the creating File failed.

    [ https://issues.apache.org/jira/browse/HBASE-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13255642#comment-13255642 ]

ramkrishna.s.vasudevan commented on HBASE-5545:
-----------------------------------------------

@Lars
Do you mind taking this into 0.94.0? Though it exists in previous versions!!

> region can't be opened for a long time. Because the creating File failed.
> -------------------------------------------------------------------------
>
> Key: HBASE-5545
> URL: https://issues.apache.org/jira/browse/HBASE-5545
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 0.90.6
> Reporter: gaojinchao
> Assignee: gaojinchao
> Fix For: 0.90.7, 0.92.2, 0.94.1
>
>
> Scenario:
> ------------
> 1. File is created
> 2. But while writing data, all datanodes might have crashed. So writing data will fail.
> 3. Now even if close is called in finally block, close also will fail and throw the Exception because writing data failed.
> 4. After this if RS try to create the same file again, then AlreadyBeingCreatedException will come.
> Suggestion to handle this scenario.
> ---------------------------
> 1. Check for the existence of the file, if exists delete the file and create new file.
> Here delete call for the file will not check whether the file is open or closed.
> Overwrite Option:
> ----------------
> 1. Overwrite option will be applicable if you are trying to overwrite a closed file.
> 2. If the file is not closed, then even with overwrite option Same AlreadyBeingCreatedException will be thrown.
> This is the expected behaviour to avoid the Multiple clients writing to same file.
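The scenario and the suggested handling quoted above can be sketched with a toy model. This is a minimal illustration under stated assumptions, not the HBASE-5545 patch and not HDFS code: `LeaseSketch`, `FakeNameNode`, `AlreadyBeingCreated`, `recreate`, and the example path are all hypothetical names introduced here. The model only encodes the three behaviours the description states: creating over a file that is still open fails even with overwrite, overwrite works on a closed file, and delete does not check whether the file is open.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Toy model of the behaviour described in the issue; not HDFS or HBase code. */
final class LeaseSketch {

    /** Hypothetical stand-in for HDFS's AlreadyBeingCreatedException. */
    static final class AlreadyBeingCreated extends IOException {
        AlreadyBeingCreated(String path) {
            super("failed to create file " + path + ": file is still open for writing");
        }
    }

    /** Hypothetical in-memory namespace: path -> true while the file is open. */
    static final class FakeNameNode {
        private final Map<String, Boolean> openByPath = new HashMap<>();

        /** Creating over an open file fails, with or without the overwrite flag. */
        void create(String path, boolean overwrite) throws IOException {
            Boolean open = openByPath.get(path);
            if (open != null && open) {
                throw new AlreadyBeingCreated(path);
            }
            if (open != null && !overwrite) {
                throw new IOException("file already exists: " + path);
            }
            openByPath.put(path, true);
        }

        /** Marks an existing file as closed. */
        void close(String path) {
            if (openByPath.containsKey(path)) {
                openByPath.put(path, false);
            }
        }

        boolean exists(String path) {
            return openByPath.containsKey(path);
        }

        /** Delete does not check whether the file is open or closed. */
        boolean delete(String path) {
            return openByPath.remove(path) != null;
        }
    }

    /** The suggested handling: delete any leftover file, then create it afresh. */
    static void recreate(FakeNameNode nn, String path) throws IOException {
        if (nn.exists(path)) {
            nn.delete(path);
        }
        nn.create(path, true);
    }

    public static void main(String[] args) throws IOException {
        FakeNameNode nn = new FakeNameNode();
        String path = "/hbase/test1/.tmp/.regioninfo"; // illustrative path only

        nn.create(path, true); // step 1: file is created; assume write and close then fail
        try {
            nn.create(path, true); // step 4: retrying the create fails, even with overwrite
        } catch (AlreadyBeingCreated e) {
            System.out.println("retry failed: " + e.getMessage());
        }
        recreate(nn, path); // delete-then-create gets past the stale open file
        System.out.println("file exists after recreate: " + nn.exists(path));
    }
}
```

In a real region server the equivalent calls would go through the Hadoop `FileSystem` API (`exists`, `delete`, `create`); whether delete-then-create is safe there depends on lease semantics that this toy model does not capture.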
> Region server logs:
> org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: failed to create file /hbase/test1/12c01902324218d14b17a5880f24f64b/.tmp/.regioninfo for DFSClient_hb_rs_158-1-131-48,20020,1331107668635_1331107669061_-252463556_25 on client 158.1.132.19 because current leaseholder is trying to recreate file.
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:1570)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1440)
>     at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1382)
>     at org.apache.hadoop.hdfs.server.namenode.NameNode.create(NameNode.java:658)
>     at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:547)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1137)
>     at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1133)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1131)
>     at org.apache.hadoop.ipc.Client.call(Client.java:961)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:245)
>     at $Proxy6.create(Unknown Source)
>     at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at $Proxy6.create(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.<init>(DFSClient.java:3643)
>     at org.apache.hadoop.hdfs.DFSClient.create(DFSClient.java:778)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.create(DistributedFileSystem.java:364)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:630)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:611)
>     at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:518)
>     at org.apache.hadoop.hbase.regionserver.HRegion.checkRegioninfoOnFilesystem(HRegion.java:424)
>     at org.apache.hadoop.hbase.regionserver.HRegion.initialize(HRegion.java:340)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2672)
>     at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:2658)
>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:330)
>     at org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:116)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:158)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> [2012-03-07 20:51:45,858] [WARN ] [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-23] [com.huawei.isap.ump.ha.client.RPCRetryAndSwitchInvoker 131] Retrying the method call: public abstract void org.apache.hadoop.hdfs.protocol.ClientProtocol.create(java.lang.String,org.apache.hadoop.fs.permission.FsPermission,java.lang.String,boolean,boolean,short,long) throws java.io.IOException with arguments of length: 7. The exisiting ActiveServerConnection is:
> ActiveServerConnectionInfo:
> Metadata:158-1-131-48/158.1.132.19:9000
> Version:145720623220907
> [2012-03-07 20:51:45,872] [DEBUG] [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-20] [org.apache.hadoop.hbase.zookeeper.ZKAssign 849] regionserver:20020-0x135ec32b39e0002-0x135ec32b39e0002 Successfully transitioned node 91bf3e6f8adb2e4b335f061036353126 from M_ZK_REGION_OFFLINE to RS_ZK_REGION_OPENING
> [2012-03-07 20:51:45,873] [DEBUG] [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-20] [org.apache.hadoop.hbase.regionserver.HRegion 2649] Opening region: REGION => {NAME => 'test1,00088613810,1331112369360.91bf3e6f8adb2e4b335f061036353126.', STARTKEY => '00088613810', ENDKEY => '00088613815', ENCODED => 91bf3e6f8adb2e4b335f061036353126, TABLE => {{NAME => 'test1', FAMILIES => [{NAME => 'value', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'GZ', TTL => '86400', BLOCKSIZE => '655360', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
> [2012-03-07 20:51:45,873] [DEBUG] [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-20] [org.apache.hadoop.hbase.regionserver.HRegion 316] Instantiated test1,00088613810,1331112369360.91bf3e6f8adb2e4b335f061036353126.
> [2012-03-07 20:51:45,874] [ERROR] [RS_OPEN_REGION-158-1-131-48,20020,1331107668635-20] [

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira