hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: HBase crashes when one server goes down
Date Mon, 14 Feb 2011 19:44:17 GMT
Mmm, first you should use a Hadoop version that supports append, and
make sure it's enabled, else you'll have data loss.
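(For reference, enabling it usually means setting the property named in the error further down this thread in hdfs-site.xml on every node and restarting HDFS and then HBase. This is a sketch, not the full procedure; it only helps if the Hadoop build itself supports append, e.g. the 0.20-append branch.)

```xml
<!-- hdfs-site.xml: let HBase recover WAL file leases during log
     splitting. Only effective on a Hadoop build that actually
     implements append. -->
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
```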

Also, in the future, please use pastebin.com (or similar) to post logs.

So by the looks of it, the master isn't able to split the logs from
the machine that died, but the log is cut so I can't really tell
what's going on after that. Would it be possible to see a log from the
moment the znode expired (look for "expired" in the log), plus
everything from that point on?
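Something along these lines can pull out that slice of the log (the path is the one from the tail command further down in this thread; adjust it to your install):

```shell
#!/bin/sh
# Sketch: print everything in the master log starting from the first
# line that mentions the expired znode.
LOG=/http/hbase-0.89/logs/hbase-hadoop-master-lab1.log

# Line number of the first match for "expired".
START=$(grep -n -m1 "expired" "$LOG" | cut -d: -f1)

# Print from that line to the end of the file.
[ -n "$START" ] && tail -n +"$START" "$LOG"
```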

Thx,

J-D

On Mon, Feb 14, 2011 at 11:38 AM, Rodrigo Barreto <rodbarreto@gmail.com> wrote:
> We googled them but didn't find anything. The exceptions are listed below.
>
>
>
>
>  tail -f /http/hbase-0.89/logs/hbase-hadoop-master-lab1.log
>
>
> 2011-02-14 16:29:48,267 WARN
> org.apache.hadoop.hbase.master.BaseScanner: Scan one META region:
> {server: 192.168.0.8:60020, regionname: .META.,,1.1028785192,
> startKey: <>}
>
> java.net.SocketTimeoutException: 20000 millis timeout while waiting
> for channel to be ready for connect. ch :
> java.nio.channels.SocketChannel[connection-pending
> remote=/192.168.0.8:60020]
>
>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
>
>         at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:309)
>
>         at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:857)
>
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:725)
>
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:252)
>
>         at $Proxy1.openScanner(Unknown Source)
>
>         at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:182)
>
>         at org.apache.hadoop.hbase.master.MetaScanner.scanOneMetaRegion(MetaScanner.java:73)
>
>         at org.apache.hadoop.hbase.master.MetaScanner.maintenanceScan(MetaScanner.java:129)
>
>         at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:156)
>
>         at org.apache.hadoop.hbase.Chore.run(Chore.java:68)
>
> 2011-02-14 16:29:48,268 INFO
> org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s)
> scanned
>
>
>
> 2011-02-14 16:30:22,634 WARN
> org.apache.hadoop.hbase.master.RegionServerOperationQueue: Failed
> processing: ProcessServerShutdown of slave2,60020,1297691605226;
> putting onto delayed todo queue
>
> java.io.IOException: Failed to open
> hdfs://master:54310/hbase/.logs/slave2,60020,1297691605226/192.168.0.8%3A60020.1297691605518
> for append
>
>         at org.apache.hadoop.hbase.util.FSUtils.recoverFileLease(FSUtils.java:640)
>
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1322)
>
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1210)
>
>         at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:299)
>
>         at org.apache.hadoop.hbase.master.RegionServerOperationQueue.process(RegionServerOperationQueue.java:147)
>
>         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:521)
>
> Caused by: java.io.IOException: java.io.IOException: Append to hdfs
> not supported. Please refer to dfs.support.append configuration
> parameter.
>
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1153)
>
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:392)
>
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
>         at java.lang.reflect.Method.invoke(Method.java:616)
>
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>
>         at java.security.AccessController.doPrivileged(Native Method)
>
>         at javax.security.auth.Subject.doAs(Subject.java:416)
>
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
>
>         at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
>
>         at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
>
>         at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
>
>         at org.apache.hadoop.hbase.util.FSUtils.recoverFileLease(FSUtils.java:623)
>
>         ... 5 more
>
> 2011-02-14 16:30:24,557 INFO
> org.apache.hadoop.hbase.master.ServerManager: 2 region servers, 1
> dead, average load 4.0[slave2,60020,1297691605226]
>
> 2011-02-14 16:30:24,663 INFO
> org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner
> scanning m
>
>
> 2011-02-14 16:31:42,665 DEBUG
> org.apache.hadoop.hbase.master.RegionServerOperationQueue: Processing
> todo: ProcessServerShutdown of slave2,60020,1297691605226
>
> 2011-02-14 16:31:42,665 INFO
> org.apache.hadoop.hbase.master.RegionServerOperation: Process shutdown
> of server slave2,60020,1297691605226: logSplit: false, rootRescanned:
> false, numberOfMetaRegions: 1, online
>
> MetaRegions.size(): 0
>
> 2011-02-14 16:31:42,667 INFO
> org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting 2 hlog(s) in
> hdfs://master:54310/hbase/.logs/slave2,60020,1297691605226
>
> 2011-02-14 16:31:42,667 DEBUG
> org.apache.hadoop.hbase.regionserver.wal.HLog: Splitting hlog 1 of 2:
> hdfs://master:54310/hbase/.logs/slave2,60020,1297691605226/192.168.0.8%3A60020.1297691605518,
> length=8085
>
> 2011-02-14 16:31:42,667 INFO org.apache.hadoop.hbase.util.FSUtils:
> Recovering file hdfs://master:54310/hbase/.logs/slave2,60020,1297691605226/192.168.0.8%3A60020.1297691605518
>
> 2011-02-14 16:31:42,669 WARN
> org.apache.hadoop.hbase.master.RegionServerOperationQueue: Failed
> processing: ProcessServerShutdown of slave2,60020,1297691605226;
> putting onto delayed todo queue
>
> java.io.IOException: Failed to open
> hdfs://master:54310/hbase/.logs/slave2,60020,1297691605226/192.168.0.8%3A60020.1297691605518
> for append
>
>         at org.apache.hadoop.hbase.util.FSUtils.recoverFileLease(FSUtils.java:640)
>
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1322)
>
>         at org.apache.hadoop.hbase.regionserver.wal.HLog.splitLog(HLog.java:1210)
>
>         at org.apache.hadoop.hbase.master.ProcessServerShutdown.process(ProcessServerShutdown.java:299)
>
>         at org.apache.hadoop.hbase.master.RegionServerOperationQueue.process(RegionServerOperationQueue.java:147)
>
>         at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:521)
>
> Caused by: java.io.IOException: java.io.IOException: Append to hdfs
> not supported. Please refer to dfs.support.append configuration
> parameter.
>
>         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:1153)
>
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.append(NameNode.java:392)
>
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>
>         at java.lang.reflect.Method.invoke(Method.java:616)
>
>         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:508)
>
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:959)
>
>         at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:955)
>
>         at java.security.AccessController.doPrivileged(Native Method)
>
>         at javax.security.auth.Subject.doAs(Subject.java:416)
>
>         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:953)
>
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:532)
>
>         at org.apache.hadoop.hbase.RemoteExceptionHandler.decodeRemoteException(RemoteExceptionHandler.java:94)
>
>         at org.apache.hadoop.hbase.RemoteExceptionHandler.checkThrowable(RemoteExceptionHandler.java:48)
>
>         at org.apache.hadoop.hbase.RemoteExceptionHandler.checkIOException(RemoteExceptionHandler.java:66)
>
>         at org.apache.hadoop.hbase.util.FSUtils.recoverFileLease(FSUtils.java:623)
>
>         ... 5 more
>
>
>
> ====================================================================================
>
>
> [hadoop@lab1 root]$ /http/hbase-0.89/bin/hbase shell
>
> HBase Shell; enter 'help<RETURN>' for list of supported commands.
>
> Type "exit<RETURN>" to leave the HBase Shell
>
> Version: 0.89.20100924, r1001068, Tue Oct  5 12:12:44 PDT 2010
>
> hbase(main):001:0> list
>
> TABLE
>
> ERROR: org.apache.hadoop.hbase.client.RetriesExhaustedException:
> Trying to contact region server slave2:60020 for region .META.,,1, row
> '', but failed after 7 attempts.
>
> Exceptions:
>
> java.net.NoRouteToHostException: No route to host
>
> java.net.NoRouteToHostException: No route to host
>
> java.net.NoRouteToHostException: No route to host
>
> java.net.NoRouteToHostException: No route to host
>
> java.net.NoRouteToHostException: No route to host
>
> java.net.NoRouteToHostException: No route to host
>
> java.net.NoRouteToHostException: No route to host
>
> Here is some help for this command:
>
>           List all tables in hbase. Optional regular expression parameter could
>
>           be used to filter the output. Examples:
>
>             hbase> list
>
>             hbase> list 'abc.*'
>
>
>
>
> hbase(main):002:0>
>
>
>
>
> --------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>
>
>
> 2011/2/14 Jean-Daniel Cryans <jdcryans@apache.org>
>>
>> Same answer that I gave to your other email:
>>
>> We'll need more information to help you out.
>> Have you checked the logs? If you see exceptions in there, did you
>> google them trying to figure out what's going on?
>>
>> Finally, does your setup meet all the requirements?
>> http://hbase.apache.org/notsoquick.html#requirements
>>
>> J-D
>>
>> On Mon, Feb 14, 2011 at 9:58 AM, Rodrigo Barreto <rodbarreto@gmail.com> wrote:
>> > Hi,
>> >
>> > We are new to Hadoop. We have just configured a cluster with 3 servers and
>> > everything works OK, except that when one server goes down, Hadoop / HDFS
>> > continues working but HBase stops: queries do not return results
>> > until we restart HBase. The HBase configuration is copied below, please
>> > help us.
>> >
>> > ########## HBASE-SITE.XML ###############
>> >
>> > <configuration>
>> >        <property>
>> >                <name>hbase.zookeeper.quorum</name>
>> >                <value>master,slave1,slave2</value>
>> >                <description>The directory shared by region servers.
>> >                </description>
>> >        </property>
>> >        <property>
>> >                <name>hbase.rootdir</name>
>> >                <value>hdfs://master:54310/hbase</value>
>> >        </property>
>> >        <property>
>> >                <name>hbase.cluster.distributed</name>
>> >                <value>true</value>
>> >        </property>
>> >        <property>
>> >                <name>hbase.master</name>
>> >                <value>master:60000</value>
>> >                <description>The host and port that the HBase master
>> > runs at.
>> >                </description>
>> >        </property>
>> >
>> >        <property>
>> >                <name>dfs.replication</name>
>> >                <value>2</value>
>> >                <description>Default block replication.
>> >                The actual number of replications can be specified when
>> > the file is created.
>> >                The default is used if replication is not specified in
>> > create time.
>> >                </description>
>> >        </property>
>> > </configuration>
>> >
>> >
>> > Thanks,
>> >
>> > Rodrigo Barreto.
>> >
>
