hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: A lot of data is lost when name node crashed
Date Thu, 31 Mar 2011 17:19:34 GMT
(Sending this back to the list: please don't reply directly to the
sender, always reply to the mailing list.)

MasterFileSystem has most of the DFS interactions. It seems that
checkFileSystem is never called (it should be), and splitLog catches
the error when splitting fails but doesn't abort.
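
Roughly, something like this in MasterFileSystem.splitLog() is what's
missing (just a sketch, not the actual 0.90.1 code; the HLogSplitter
call below is simplified and the exact signatures may differ):

  // Sketch only: the point is the catch block, which should verify the
  // filesystem instead of just logging the failure and moving on.
  public void splitLog(final String serverName) {
    Path logDir = new Path(this.rootdir, HLog.getHLogDirectoryName(serverName));
    try {
      HLogSplitter splitter =
        HLogSplitter.createLogSplitter(conf, rootdir, logDir, oldLogDir, this.fs);
      splitter.splitLog();
    } catch (IOException e) {
      LOG.error("Failed splitting " + logDir.toString(), e);
      // checkFileSystem() is the existing availability check mentioned
      // above; calling it here should abort the master if HDFS is gone
      // rather than silently dropping the edits in those logs.
      checkFileSystem();
    }
  }

That way a dead NN during log splitting takes the master down rather
than leaving it running with the regions' edits unrecovered.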

Would you mind opening a JIRA about this issue and perhaps submitting a patch?

Thx,

J-D

On Thu, Mar 31, 2011 at 5:40 AM, Gaojinchao <gaojinchao@huawei.com> wrote:
> Thanks, I will try it again, because the INFO log level was not turned on last time.
> I have a question:
> Where in the code does the Master kill itself when it finds the namenode has crashed?
>
> if (isCarryingRoot()) { // -ROOT-
>   try {
>     this.services.getAssignmentManager().assignRoot();
>   } catch (KeeperException e) {
>     this.server.abort("In server shutdown processing, assigning root", e);
>     throw new IOException("Aborting", e);
>   }
> }
>
> -----Original Message-----
> From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
> Sent: March 30, 2011 1:39
> To: user@hbase.apache.org
> Cc: Gaojinchao; Chenjian
> Subject: Re: A lot of data is lost when name node crashed
>
> I was expecting it would die; strange that it didn't. Could you provide a
> bigger log? This one basically tells us the NN is gone, but that's
> about it. Please put it on a web server or somewhere else that's
> easily reachable for anyone (e.g. don't post the full thing here).
>
> Thx,
>
>
> J-D
>
> On Tue, Mar 29, 2011 at 4:28 AM, Gaojinchao <gaojinchao@huawei.com> wrote:
>> I ran some performance tests on HBase 0.90.1.
>> When the namenode crashed, I found that some data was lost.
>> I'm not sure exactly what caused it. It seems like the log split failed.
>> I think the master should shut itself down when HDFS crashes.
>>
>>
>> The logs are:
>> 2011-03-22 13:21:55,056 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs
>> java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:820)
>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>>         at $Proxy5.getListing(Unknown Source)
>>         at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>         at $Proxy5.getListing(Unknown Source)
>>         at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:614)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:252)
>>         at org.apache.hadoop.hbase.master.LogCleaner.chore(LogCleaner.java:121)
>>         at org.apache.hadoop.hbase.Chore.run(Chore.java:66)
>>         at org.apache.hadoop.hbase.master.LogCleaner.run(LogCleaner.java:154)
>> Caused by: java.net.ConnectException: Connection refused
>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
>>         at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
>>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:788)
>>         ... 13 more
>> 2011-03-22 13:21:56,056 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
>> 2011-03-22 13:21:57,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
>> 2011-03-22 13:21:58,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
>> 2011-03-22 13:21:59,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
>> 2011-03-22 13:22:00,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
>> 2011-03-22 13:22:01,058 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
>> 2011-03-22 13:22:02,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
>> 2011-03-22 13:22:03,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
>> 2011-03-22 13:22:04,059 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
>> 2011-03-22 13:22:05,060 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
>> 2011-03-22 13:22:05,060 ERROR org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting hdfs://C4C1:9000/hbase/.logs/C4C9.site,60020,1300767633398
>> java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:820)
>>         at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)
>>         at $Proxy5.getFileInfo(Unknown Source)
>>         at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
>>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>         at java.lang.reflect.Method.invoke(Method.java:597)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>>         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>>         at $Proxy5.getFileInfo(Unknown Source)
>>         at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:623)
>>         at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:461)
>>         at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:690)
>>         at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:177)
>>         at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196)
>>         at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:95)
>>         at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>>         at java.lang.Thread.run(Thread.java:662)
>> Caused by: java.net.ConnectException: Connection refused
>>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)
>>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)
>>         at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)
>>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:788)
>>         ... 18 more
>> 2011-03-22 13:22:45,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).
>> 2011-03-22 13:22:46,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).
>> 2011-03-22 13:22:47,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 2 time(s).
>> 2011-03-22 13:22:48,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 3 time(s).
>> 2011-03-22 13:22:49,601 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 4 time(s).
>> 2011-03-22 13:22:50,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 5 time(s).
>> 2011-03-22 13:22:51,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 6 time(s).
>> 2011-03-22 13:22:52,602 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 7 time(s).
>> 2011-03-22 13:22:53,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 8 time(s).
>> 2011-03-22 13:22:54,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).
>> 2011-03-22 13:22:54,603 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs
>> java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
>>         at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)
>>         at org.apache.hadoop.ipc.Client.call(Client.java:820)
>>         at org.apache.hadoop.ipc.RPC$Invok
>>
>
