hbase-user mailing list archives

From Li Li <fancye...@gmail.com>
Subject Re: is my hbase cluster overloaded?
Date Tue, 22 Apr 2014 06:20:26 GMT
my cluster setup: all 6 machines are virtual machines. each machine:
4 CPUs (Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz), 16GB memory
192.168.10.48 namenode/jobtracker
192.168.10.47 secondary namenode
192.168.10.45 datanode/tasktracker
192.168.10.46 datanode/tasktracker
192.168.10.49 datanode/tasktracker
192.168.10.50 datanode/tasktracker
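
for reference, a minimal Java sketch that prints the same per-datanode report
as "hadoop dfsadmin -report"; it assumes the Hadoop 1.0 client jars are on the
classpath, the class name is made up, and the NameNode RPC port below is only a
guess (adjust it to whatever your fs.default.name says). it makes a dropped
datanode such as 192.168.10.45 easy to spot:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.hdfs.DistributedFileSystem;
import org.apache.hadoop.hdfs.protocol.DatanodeInfo;

public class DatanodeReport {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // assumed NameNode RPC address; change to match your fs.default.name
    conf.set("fs.default.name", "hdfs://192.168.10.48:9000");

    // fs.default.name must point at HDFS for this cast to succeed
    FileSystem fs = FileSystem.get(conf);
    DistributedFileSystem dfs = (DistributedFileSystem) fs;

    // one entry per datanode known to the NameNode
    for (DatanodeInfo dn : dfs.getDataNodeStats()) {
      // includes capacity, DFS used/remaining and last-contact time
      System.out.println(dn.getDatanodeReport());
      System.out.println();
    }
    fs.close();
  }
}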

hdfs logs around 20:33
192.168.10.48 namenode log  http://pastebin.com/rwgmPEXR
192.168.10.45 datanode log http://pastebin.com/HBgZ8rtV (I found that this
datanode crashed first)
192.168.10.46 datanode log http://pastebin.com/aQ2emnUi
192.168.10.49 datanode log http://pastebin.com/aqsWrrL1
192.168.10.50 datanode log http://pastebin.com/V7C6tjpB

hbase logs around 20:33
192.168.10.48 master log http://pastebin.com/2ZfeYA1p
192.168.10.45 region log http://pastebin.com/idCF2a7Y
192.168.10.46 region log http://pastebin.com/WEh4dA0f
192.168.10.49 region log http://pastebin.com/cGtpbTLz
192.168.10.50 region log http://pastebin.com/bD6h5T6p (strangely, there are
no entries at 20:33, but there are entries at 20:32 and 20:34)
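
(a rough back-of-the-envelope on the load figure quoted below, assuming requests
are spread evenly and the default HDFS replication factor of 3: 10,000
requests/sec across 4 region servers is about 2,500 requests/sec per region
server on 4 virtual cores, and every write is additionally replicated to 3 of
the 4 datanodes, so on average each datanode receives about three quarters of
everything written to the cluster.)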

On Tue, Apr 22, 2014 at 12:25 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> Can you post more of the data node log, around 20:33?
>
> Cheers
>
>
> On Mon, Apr 21, 2014 at 8:57 PM, Li Li <fancyerii@gmail.com> wrote:
>
>> hadoop 1.0
>> hbase 0.94.11
>>
>> datanode log from 192.168.10.45. why did it shut itself down?
>>
>> 2014-04-21 20:33:59,309 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock
>> blk_-7969006819959471805_202154 received exception
>> java.io.InterruptedIOException: Interruped while waiting for IO on
>> channel java.nio.channels.SocketChannel[closed]. 0 millis timeout
>> left.
>> 2014-04-21 20:33:59,310 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(192.168.10.45:50010,
>> storageID=DS-1676697306-192.168.10.45-50010-1392029190949,
>> infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.InterruptedIOException: Interruped while waiting for IO on
>> channel java.nio.channels.SocketChannel[closed]. 0 millis timeout
>> left.
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:157)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
>>         at
>> org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
>>         at java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
>>         at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>>         at java.io.DataInputStream.read(DataInputStream.java:149)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readToBuf(BlockReceiver.java:265)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.readNextPacket(BlockReceiver.java:312)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:376)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:532)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:398)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:107)
>>         at java.lang.Thread.run(Thread.java:722)
>> 2014-04-21 20:33:59,310 ERROR
>> org.apache.hadoop.hdfs.server.datanode.DataNode:
>> DatanodeRegistration(192.168.10.45:50010,
>> storageID=DS-1676697306-192.168.10.45-50010-1392029190949,
>> infoPort=50075, ipcPort=50020):DataXceiver
>> java.io.InterruptedIOException: Interruped while waiting for IO on
>> channel java.nio.channels.SocketChannel[closed]. 466924 millis timeout
>> left.
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:349)
>>         at
>> org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:245)
>>         at
>> org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>>         at
>> org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:350)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:436)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:197)
>>         at
>> org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:99)
>>         at java.lang.Thread.run(Thread.java:722)
>> 2014-04-21 20:34:00,291 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Waiting for
>> threadgroup to exit, active threads is 0
>> 2014-04-21 20:34:00,404 INFO
>> org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService:
>> Shutting down all async disk service threads...
>> 2014-04-21 20:34:00,405 INFO
>> org.apache.hadoop.hdfs.server.datanode.FSDatasetAsyncDiskService: All
>> async disk service threads have been shut down.
>> 2014-04-21 20:34:00,413 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
>> 2014-04-21 20:34:00,424 INFO
>> org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
>> /************************************************************
>> SHUTDOWN_MSG: Shutting down DataNode at app-hbase-1/192.168.10.45
>> ************************************************************/
>>
>> On Tue, Apr 22, 2014 at 11:25 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>> > bq. one datanode failed
>> >
>> > Was the crash due to an out-of-memory error?
>> > Can you post the tail of data node log on pastebin ?
>> >
>> > Giving us versions of hadoop and hbase would be helpful.
>> >
>> >
>> > On Mon, Apr 21, 2014 at 7:39 PM, Li Li <fancyerii@gmail.com> wrote:
>> >
>> >> I have a small hbase cluster with 1 namenode, 1 secondary namenode, and 4
>> >> datanodes.
>> >> the hbase master is on the same machine as the namenode, and the 4 hbase
>> >> region servers run on the datanode machines.
>> >> I found the average requests per second is about 10,000, and then the
>> >> cluster crashed. the reason is that one datanode failed.
>> >>
>> >> each datanode has about 4 cpu cores and 10GB of memory.
>> >> is my cluster overloaded?
>> >>
>>
