Date: Thu, 7 Apr 2011 08:46:05 +0000 (UTC)
From: "gaojinchao (JIRA)"
To: issues@hbase.apache.org
Message-ID: <1323951284.40247.1302165965790.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <647415556.26438.1301620086193.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (HBASE-3722) A lot of data is lost when name node crashed

    [ https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016768#comment-13016768 ]

gaojinchao commented on HBASE-3722:
-----------------------------------

Yes, this is important for me. Thanks.

Some explanation of our application:

1. I have a babysitter process that controls the start and stop of all HBase processes. When the NN crashes, HBase can protect itself, and when the NN recovers, I hope HBase can recover service automatically. If the HMaster does not shut itself down, it skips the log split and waits to assign the meta table or root table. When the NN recovers and the region servers start up, a lot of data is lost, especially in the meta table.

2. HBase + hadoop-append should ensure that no data is lost unless Hadoop itself loses data. Reliability is important for my application. I read the HLog code and ran some DFX tests. The issue is serious, although an NN crash is a low-probability event. I found that the region servers also restart when the NN crashes.

Please review the modification. I am afraid of making a mistake.

> A lot of data is lost when name node crashed
> ---------------------------------------------
>
>                 Key: HBASE-3722
>                 URL: https://issues.apache.org/jira/browse/HBASE-3722
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.1
>            Reporter: gaojinchao
>         Attachments: HmasterFilesystem_PatchV1.patch
>
>
> I'm not sure exactly what caused it. There are some failed log-split entries in the logs.
> The master should shut itself down when HDFS has crashed.
> The logs are:
> 2011-03-22 13:21:55,056 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs
> java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>     at org.apache.hadoop.ipc.Client.call(Client.java:820)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>     at $Proxy5.getListing(Unknown Source)
>     at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy5.getListing(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:614)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:252)
>     at org.apache.hadoop.hbase.master.LogCleaner.chore(LogCleaner.java:121)
>     at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
>     at org.apache.hadoop.hbase.master.LogCleaner.run(LogCleaner.java:154)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
>     at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
>     at org.apache.hadoop.ipc.Client.call(Client.java:788)
>     ... 13 more
> 2011-03-22 13:21:56,056 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
> 2011-03-22 13:21:57,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
> 2011-03-22 13:21:58,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
> 2011-03-22 13:21:59,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
> 2011-03-22 13:22:00,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
> 2011-03-22 13:22:01,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
> 2011-03-22 13:22:02,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
> 2011-03-22 13:22:03,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
> 2011-03-22 13:22:04,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
> 2011-03-22 13:22:05,060 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
> 2011-03-22 13:22:05,060 ERROR org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting hdfs://C4C1:9000/hbase/.logs/C4C9.site,60020,1300767633398
> java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>     at org.apache.hadoop.ipc.Client.call(Client.java:820)
>     at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>     at $Proxy5.getFileInfo(Unknown Source)
>     at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     at $Proxy5.getFileInfo(Unknown Source)
>     at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:623)
>     at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:461)
>     at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:690)
>     at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:177)
>     at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196)
>     at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:95)
>     at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
>     at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
>     at org.apache.hadoop.ipc.Client.call(Client.java:788)
>     ... 18 more
> 2011-03-22 13:22:45,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
> 2011-03-22 13:22:46,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
> 2011-03-22 13:22:47,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
> 2011-03-22 13:22:48,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
> 2011-03-22 13:22:49,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
> 2011-03-22 13:22:50,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
> 2011-03-22 13:22:51,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
> 2011-03-22 13:22:52,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
> 2011-03-22 13:22:53,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
> 2011-03-22 13:22:54,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
> 2011-03-22 13:22:54,603 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs
> java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>     at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>     at org.apache.hadoop.ipc.Client.call(Client.java:820)
>     at org.apache.hadoop.ipc.RPC$Invok

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
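
For readers following the issue, the behaviour requested in the comment (when a log split fails because HDFS is unreachable, the master should stop instead of carrying on) can be sketched roughly as below. This is only an illustration under stated assumptions, not the attached patch and not HBase code: MasterFsGuardSketch, FsTask, FsProbe, Abortable and runOrAbort are hypothetical names, and the lambdas in main merely simulate a refused connection to the name node.

```java
import java.io.IOException;
import java.net.ConnectException;

// Hypothetical sketch: none of these types are HBase or Hadoop classes.
class MasterFsGuardSketch {

    /** Stand-in for work that needs HDFS, such as splitting a dead server's logs. */
    interface FsTask {
        void run() throws IOException;
    }

    /** Stand-in for a cheap filesystem health probe. */
    interface FsProbe {
        void check() throws IOException;
    }

    /** Stand-in for whatever stops the master so a supervisor can restart it later. */
    interface Abortable {
        void abort(String why, Throwable cause);
    }

    /**
     * Runs the task. If the task fails and the probe fails as well, the
     * filesystem is treated as down and the process is asked to abort.
     * Returns true only when the task completed.
     */
    static boolean runOrAbort(FsTask task, FsProbe probe, Abortable master) {
        try {
            task.run();
            return true;
        } catch (IOException taskError) {
            try {
                probe.check();
            } catch (IOException probeError) {
                // Both the task and the probe failed: assume the filesystem is unavailable.
                master.abort("File system is unavailable", probeError);
                return false;
            }
            // The filesystem answered, so the failure is specific to this task.
            return false;
        }
    }

    public static void main(String[] args) {
        final boolean[] aborted = {false};
        // Simulated name node outage: both the split and the probe are refused.
        FsTask split = () -> { throw new ConnectException("Connection refused"); };
        FsProbe probe = () -> { throw new ConnectException("Connection refused"); };
        boolean done = runOrAbort(split, probe, (why, cause) -> aborted[0] = true);
        System.out.println("split done=" + done + ", aborted=" + aborted[0]);
    }
}
```

With a supervisor such as the babysitter process described in the comment, an abort of this kind would let the master be restarted once the name node is back, so the log split can run again instead of being skipped.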