hbase-user mailing list archives

From "Jean-Daniel Cryans" <jdcry...@apache.org>
Subject Re: Could not obtain block
Date Wed, 10 Sep 2008 13:02:49 GMT
Krzysztof,

HBase is not known to handle a heavy insert load well on a fresh table: a new
table starts as a single region, so there is no load distribution until it
splits. To be able to help you, we would need to know:

- The hardware of each machine (very important)
- The datanode logs from when it failed
- Whether your load made the machines swap
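
To illustrate the load-distribution point, here is a minimal sketch, not HBase
code. It assumes regions can be modelled as a sorted list of start keys; the
helper names (region_for, writes_per_region) and the split points "i" and "r"
are hypothetical, and the timestamp is just a sample value. It generates row
keys shaped like the ones in the test and counts how many land in each region.

```python
import bisect
from collections import Counter
from itertools import product
from string import ascii_lowercase


def region_for(row_key, start_keys):
    """Return the index of the region whose key range contains row_key.

    start_keys is the sorted list of region start keys; the first region
    starts at the empty key, which sorts before every other key.
    """
    return bisect.bisect_right(start_keys, row_key) - 1


def writes_per_region(row_keys, start_keys):
    """Count how many of the given row keys land in each region."""
    return Counter(region_for(key, start_keys) for key in row_keys)


# Row keys shaped like the test: "aaa" .. "zzz" followed by a timestamp.
timestamp = 1220636795411  # sample value only
row_keys = [
    "".join(prefix) + "_" + str(timestamp)
    for prefix in product(ascii_lowercase, repeat=3)
]

# Fresh table: one region covers the whole key space.
fresh = writes_per_region(row_keys, [""])

# Hypothetical later state: the table has split into three regions.
split = writes_per_region(row_keys, ["", "i", "r"])

print("fresh table:", dict(fresh))
print("after splits:", dict(split))
```

On the fresh table every write maps to region 0, so a single region server
takes the whole load; only once splits have happened and the daughters are
assigned to other servers can the writes spread out.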

You should probably also use the current 0.2 branch, which is more stable (you
will also have to upgrade to Hadoop 0.17.2.1).

Thanks,

J-D

2008/9/10 Krzysztof Szlapinski <krzysztof.szlapinski@starline.hk>

> Hi,
> We've run some tests on a three-node HBase cluster:
> 1 node running Master and Region Server
> 2 nodes running Region Servers
> Hbase 0.2
> Hadoop 0.17.1
>
> The test was to put some data in an endless loop.
> The data row key was:
> aaa + timestamp
> aab + timestamp
> aac + timestamp
> ....
> aaz + timestamp
> aba + timestamp
> ....
> zzz + timestamp
> and so on...
>
> I could insert up to 7000 rows per second,
> and it all went well until the HBase master started throwing:
>
>   08/09/10 07:09:28 INFO ipc.Client: Retrying connect to server:
>   /192.168.1.202:60020. Already tried 1 time(s).
>   08/09/10 07:09:29 INFO ipc.Client: Retrying connect to server:
>   /192.168.1.202:60020. Already tried 2 time(s).
>   08/09/10 07:09:30 INFO ipc.Client: Retrying connect to server:
>   /192.168.1.202:60020. Already tried 3 time(s).
>   08/09/10 07:09:31 INFO ipc.Client: Retrying connect to server:
>   /192.168.1.202:60020. Already tried 4 time(s).
>   ....
>
>
> Then the whole process became very slow.
> We noticed that between these "Retrying connect" messages HBase
> managed to insert some data:
>
>   08/09/10 07:13:37 INFO ipc.Client: Retrying connect to server:
>   /192.168.1.201:60020. Already tried 10 time(s).
>   08/09/10 07:07:43 data with row id mvy_1220636795411 INSERTED
>   08/09/10 07:42:59 INFO ipc.Client: Retrying connect to server:
>   /192.168.1.202:60020. Already tried 1 time(s).
>
>
> After examining the HBase logs at debug level, we found that at some point
> HBase started to run several splits in parallel:
>
>   2008-09-10 07:03:42,251 INFO
>   org.apache.hadoop.hbase.master.ServerManager: Received
>   MSG_REPORT_SPLIT: test_log,nre_1220650334054,1220684196714:
>   test_log,nre_1220650334054,1220684196714 split; daughters:
>   test_log,nre_1220650334054,1221023007756,
>   test_log,nwm_1220659679240,1221023007756 from 192.168.1.201:60020
>   2008-09-10 07:03:42,251 INFO
>   org.apache.hadoop.hbase.master.RegionManager: assigning region
>   test_log,nwm_1220659679240,1221023007756 to server 192.168.1.201:60020
>   2008-09-10 07:03:42,388 INFO
>   org.apache.hadoop.hbase.master.ServerManager: Received
>   MSG_REPORT_SPLIT: test_log,vlz_1220657647003,1220683727773:
>   test_log,vlz_1220657647003,1220683727773 split; daughters:
>   test_log,vlz_1220657647003,1221023049055,
>   test_log,vri_1220892338621,1221023049055 from 192.168.1.202:60020
>   2008-09-10 07:03:42,399 INFO
>   org.apache.hadoop.hbase.master.RegionManager: assigning region
>   test_log,nre_1220650334054,1221023007756 to server 192.168.1.201:60020
>
>
> And then, after one of the splits, it threw an IOException:
>
>    2008-09-10 07:06:18,591 INFO
>   org.apache.hadoop.hbase.master.ServerManager: Received
>   MSG_REPORT_SPLIT: test_log,mvy_1220633069016,1220684186307:
>   test_log,mvy_1220633069016,1220684186307 split; daughters:
>   test_log,mvy_1220633069016,1221023269764,
>   test_log,nbg_1220636795411,1221023269764 from 192.168.1.202:60020
>   2008-09-10 07:06:18,592 INFO
>   org.apache.hadoop.hbase.master.RegionManager: assigning region
>   test_log,nbg_1220636795411,1221023269764 to server 192.168.1.202:60020
>   2008-09-10 07:06:18,593 INFO
>   org.apache.hadoop.hbase.master.RegionManager: assigning region
>   test_log,mvy_1220633069016,1221023269764 to server 192.168.1.202:60020
>   2008-09-10 07:06:21,600 INFO
>   org.apache.hadoop.hbase.master.ServerManager: Received
>   MSG_REPORT_PROCESS_OPEN: test_log,mvy_1220633069016,1221023269764
>   from 192.168.1.202:60020
>   2008-09-10 07:06:21,601 INFO
>   org.apache.hadoop.hbase.master.ServerManager: Received
>   MSG_REPORT_OPEN: test_log,nbg_1220636795411,1221023269764 from
>   192.168.1.202:60020
>   2008-09-10 07:06:21,601 INFO
>   org.apache.hadoop.hbase.master.ProcessRegionOpen$1:
>   test_log,nbg_1220636795411,1221023269764 open on 192.168.1.202:60020
>   2008-09-10 07:06:21,601 INFO
>   org.apache.hadoop.hbase.master.ProcessRegionOpen$1: updating row
>   test_log,nbg_1220636795411,1221023269764 in region .META.,,1 with
>   startcode 1220950604250 and server 192.168.1.202:60020
>   2008-09-10 07:06:30,609 INFO
>   org.apache.hadoop.hbase.master.ServerManager: Received
>   MSG_REPORT_CLOSE: test_log,mvy_1220633069016,1221023269764:
>   java.io.IOException: Could not obtain block: blk_4797637782612180770
>   file=/hbase/test_log/2024359288/message/info/1192041150217141868.1504866925
>       at org.apache.hadoop.dfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1430)
>       at org.apache.hadoop.dfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1281)
>       at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1385)
>       at org.apache.hadoop.dfs.DFSClient$DFSInputStream.read(DFSClient.java:1337)
>       at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:320)
>       at java.io.DataInputStream.readUTF(DataInputStream.java:572)
>       at java.io.DataInputStream.readUTF(DataInputStream.java:547)
>       at org.apache.hadoop.hbase.regionserver.HStoreFile$Reference.readFields(HStoreFile.java:573)
>       at org.apache.hadoop.hbase.regionserver.HStoreFile.readSplitInfo(HStoreFile.java:292)
>       at org.apache.hadoop.hbase.regionserver.HStore.loadHStoreFiles(HStore.java:388)
>       at org.apache.hadoop.hbase.regionserver.HStore.<init>(HStore.java:218)
>       at org.apache.hadoop.hbase.regionserver.HRegion.instantiateHStore(HRegion.java:1653)
>       at org.apache.hadoop.hbase.regionserver.HRegion.<init>(HRegion.java:470)
>       at org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateRegion(HRegionServer.java:902)
>       at org.apache.hadoop.hbase.regionserver.HRegionServer.openRegion(HRegionServer.java:877)
>       at org.apache.hadoop.hbase.regionserver.HRegionServer$Worker.run(HRegionServer.java:817)
>       at java.lang.Thread.run(Thread.java:619)
>    from 192.168.1.202:60020
>
> And then it threw "Connection refused" while scanning the ROOT region:
>
>   2008-09-10 07:06:55,706 INFO org.apache.hadoop.ipc.Client: Retrying
>   connect to server: /192.168.1.202:60020. Already tried 10 time(s).
>   2008-09-10 07:06:56,707 INFO org.apache.hadoop.ipc.Client: Retrying
>   connect to server: yiyi03/192.168.1.203:60020. Already tried 1 time(s).
>   2008-09-10 07:06:56,739 WARN
>   org.apache.hadoop.hbase.master.BaseScanner: Scan ROOT region
>   java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
>   at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:100)
>   at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:174)
>   at org.apache.hadoop.ipc.Client.getConnection(Client.java:623)
>   at org.apache.hadoop.ipc.Client.call(Client.java:546)
>   at org.apache.hadoop.hbase.ipc.HbaseRPC$Invoker.invoke(HbaseRPC.java:230)
>   at $Proxy2.openScanner(Unknown Source)
>   at org.apache.hadoop.hbase.master.BaseScanner.scanRegion(BaseScanner.java:159)
>   at org.apache.hadoop.hbase.master.RootScanner.scanRoot(RootScanner.java:48)
>   at org.apache.hadoop.hbase.master.RootScanner.maintenanceScan(RootScanner.java:74)
>   at org.apache.hadoop.hbase.master.BaseScanner.chore(BaseScanner.java:139)
>   at org.apache.hadoop.hbase.Chore.run(Chore.java:63)
>
> What could be the cause? Is it a well-known issue, or some misconfiguration?
>
>
> Krzysiek
>
>
>
>
