hbase-issues mailing list archives

From "gaojinchao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3722) A lot of data is lost when name node crashed
Date Thu, 07 Apr 2011 08:46:05 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016768#comment-13016768 ]

gaojinchao commented on HBASE-3722:
-----------------------------------

Yes, this is important for me. Thanks.
Some explanation about our application:
1. I have a babysitter process that controls starting and stopping all HBase processes.
   When the NameNode crashes, HBase can protect itself.
   When the NameNode recovers, I hope HBase can automatically recover service.
   If the HMaster does not shut itself down, it skips log splitting and waits to assign the META or ROOT table.
   When the NameNode recovers and the region servers start up, a lot of data is lost, especially from the META table.

2. HBase + hadoop-append should guarantee that no data is lost unless Hadoop itself loses data.
   Reliability is important for my application. I have read the HLog code and done some DFX tests.
   The issue is bad, but a NameNode crash has a low probability.
   I also find that the region servers restart when the NameNode crashes.

Please review the modification; I am afraid of making a mistake.
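
To illustrate the intent, here is a rough sketch (this is not the attached HmasterFilesystem_PatchV1.patch; apart from the filesystem calls, the class and helper names are made up for illustration): when splitting a dead server's logs fails with an IOException, first check whether HDFS is still reachable, and if it is not, abort the master instead of continuing as if the split had succeeded.

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Illustrative sketch only; not the attached patch.
    public class SplitLogGuardSketch {

      /** Minimal stand-in for HBase's Abortable so the sketch is self-contained. */
      public interface Abortable {
        void abort(String why, Throwable cause);
      }

      private final FileSystem fs;
      private final Path rootDir;      // e.g. hdfs://C4C1:9000/hbase
      private final Abortable master;  // whoever can shut the master down

      public SplitLogGuardSketch(Configuration conf, Path rootDir, Abortable master)
          throws IOException {
        this.fs = rootDir.getFileSystem(conf);
        this.rootDir = rootDir;
        this.master = master;
      }

      /** Split the WALs left by a dead region server; abort the master if HDFS is gone. */
      public void splitLog(Path serverLogDir) {
        try {
          doSplit(serverLogDir);   // the real work, i.e. the HLogSplitter call
        } catch (IOException e) {
          if (!fileSystemAvailable()) {
            // The NameNode is unreachable. Do NOT pretend the split succeeded: the dead
            // server's edits exist only in these logs until they are replayed.
            master.abort("Filesystem unavailable while splitting " + serverLogDir, e);
            return;
          }
          // HDFS is up but this split still failed; surface the error instead of skipping it.
          throw new RuntimeException("Failed splitting " + serverLogDir, e);
        }
      }

      /** Cheap liveness probe: can we still stat the HBase root directory? */
      private boolean fileSystemAvailable() {
        try {
          return fs.exists(rootDir);
        } catch (IOException e) {
          return false;
        }
      }

      private void doSplit(Path serverLogDir) throws IOException {
        // placeholder for the actual log-splitting call
      }
    }

The point is only that log splitting must either complete or stop the master; silently skipping it is what turns a NameNode outage into lost edits.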

>  A lot of data is lost when name node crashed
> ---------------------------------------------
>
>                 Key: HBASE-3722
>                 URL: https://issues.apache.org/jira/browse/HBASE-3722
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.1
>            Reporter: gaojinchao
>         Attachments: HmasterFilesystem_PatchV1.patch
>
>
> I'm not sure exactly what caused it; there are some "split failed" logs.
> The master should shut itself down when HDFS has crashed.
> The logs are:
>  2011-03-22 13:21:55,056 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs
>  java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>          at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>          at org.apache.hadoop.ipc.Client.call(Client.java:820)
>          at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>          at $Proxy5.getListing(Unknown Source)
>          at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>          at java.lang.reflect.Method.invoke(Method.java:597)
>          at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>          at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>          at $Proxy5.getListing(Unknown Source)
>          at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:614)
>          at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:252)
>          at org.apache.hadoop.hbase.master.LogCleaner.chore(LogCleaner.java:121)
>          at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
>          at org.apache.hadoop.hbase.master.LogCleaner.run(LogCleaner.java:154)
>  Caused by: java.net.ConnectException: Connection refused
>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>          at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>          at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>          at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>          at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
>          at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
>          at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
>          at org.apache.hadoop.ipc.Client.call(Client.java:788)
>          ... 13 more
>  2011-03-22 13:21:56,056 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
>  2011-03-22 13:21:57,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
>  2011-03-22 13:21:58,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
>  2011-03-22 13:21:59,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
>  2011-03-22 13:22:00,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
>  2011-03-22 13:22:01,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
>  2011-03-22 13:22:02,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
>  2011-03-22 13:22:03,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
>  2011-03-22 13:22:04,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
>  2011-03-22 13:22:05,060 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
>  2011-03-22 13:22:05,060 ERROR org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting hdfs://C4C1:9000/hbase/.logs/C4C9.site,60020,1300767633398
>  java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>          at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>          at org.apache.hadoop.ipc.Client.call(Client.java:820)
>          at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>          at $Proxy5.getFileInfo(Unknown Source)
>          at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>          at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>          at java.lang.reflect.Method.invoke(Method.java:597)
>          at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>          at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>          at $Proxy5.getFileInfo(Unknown Source)
>          at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:623)
>          at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:461)
>          at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:690)
>          at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:177)
>          at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196)
>          at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:95)
>          at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
>          at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>          at java.lang.Thread.run(Thread.java:662)
>  Caused by: java.net.ConnectException: Connection refused
>          at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>          at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>          at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>          at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>          at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
>          at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
>          at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
>          at org.apache.hadoop.ipc.Client.call(Client.java:788)
>          ... 18 more
>  2011-03-22 13:22:45,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
>  2011-03-22 13:22:46,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
>  2011-03-22 13:22:47,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
>  2011-03-22 13:22:48,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
>  2011-03-22 13:22:49,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
>  2011-03-22 13:22:50,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
>  2011-03-22 13:22:51,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
>  2011-03-22 13:22:52,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
>  2011-03-22 13:22:53,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
>  2011-03-22 13:22:54,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
>  2011-03-22 13:22:54,603 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs
>  java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>          at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>          at org.apache.hadoop.ipc.Client.call(Client.java:820)
>          at org.apache.hadoop.ipc.RPC$Invok
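
The retry loop in the logs above ("Already tried 0..9 time(s)") is the Hadoop IPC client's default of 10 connect attempts. Whether the NameNode is reachable again can be checked with one probe of the HBase root directory through the public Hadoop FileSystem API. A self-contained sketch of such a probe, as the babysitter process or the master itself might use (the root path is taken from the logs above; the reduced retry setting is only to keep the probe fast and is an assumption, not part of the patch):

    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Standalone NameNode reachability probe; illustrative only.
    public class NameNodeProbe {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Fail fast instead of inheriting the 10-retry default seen in the logs.
        conf.setInt("ipc.client.connect.max.retries", 1);

        Path hbaseRoot = new Path("hdfs://C4C1:9000/hbase");  // root dir from the logs above
        FileSystem fs = FileSystem.get(URI.create(hbaseRoot.toString()), conf);
        try {
          System.out.println("NameNode reachable, root exists = " + fs.exists(hbaseRoot));
        } catch (IOException e) {
          // This is the state in which the master should stop rather than skip log splitting.
          System.out.println("NameNode unreachable: " + e.getMessage());
        } finally {
          fs.close();
        }
      }
    }

Run against the cluster from the logs, this prints either the result of the exists check or the ConnectException; the latter is the signal the babysitter could use to decide when it is safe to bring HBase back up.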

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
