hbase-user mailing list archives

From "Jinsong Hu" <jinsong...@hotmail.com>
Subject Re: regionserver crash under heavy load
Date Wed, 14 Jul 2010 21:14:51 GMT
Sure, I will test with a single column family and see how it goes.

Jimmy.

--------------------------------------------------
From: "Jean-Daniel Cryans" <jdcryans@apache.org>
Sent: Wednesday, July 14, 2010 2:11 PM
To: <user@hbase.apache.org>
Subject: Re: regionserver crash under heavy load

> After discussing your issue on the IRC channel, the biggest problem is probably all the compactions taking place on all the small files that are generated. If all your files are only 100KB, it will take a hell of a lot of compactions before any of your families is able to reach 256MB, which basically means rewriting your data internally hundreds of times. We are wondering what could be done to ease that problem, like flushing each family individually, but that wouldn't be an ideal situation either and it would require a lot of work.
>
> In the meantime, I would recommend either using a very small number of families or, if you really need to have 20 families, loading them one by one. You'll be impressed by the speedup.
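A minimal sketch of what such a collapsed schema could look like with the 0.20/0.89-era Java client API; the table name, the single family name "d", and the idea of moving the old family names into column qualifiers are illustrative assumptions, not details from this thread:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class CreateSingleFamilyTable {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();  // reads hbase-site.xml from the classpath
    HBaseAdmin admin = new HBaseAdmin(conf);

    // One wide family instead of 20 narrow ones; the old family names
    // can live on as column qualifiers inside it.
    HTableDescriptor desc = new HTableDescriptor("Spam_MsgEventTable_narrow");  // hypothetical table name
    desc.addFamily(new HColumnDescriptor("d"));           // single, short family name (hypothetical)
    admin.createTable(desc);
  }
}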
>
> J-D
>
> On Wed, Jul 14, 2010 at 1:32 PM, Jean-Daniel Cryans <jdcryans@apache.org> 
> wrote:
>> Extremely high IO is bad; it hints that there's a problem (since it means that most of the processes are waiting for either disk or network). That's what I see in the datanode's log: lots of threads timing out.
>>
>> 20 column families is usually unreasonable (it also has a deep impact on flushing when doing massive imports), so you should review your schema. I've never seen good reasons to use more than 2-3 families. Here is an example of what 20 families does:
>>
>> 2010-07-13 04:49:55,981 DEBUG
>> org.apache.hadoop.hbase.regionserver.HRegion: Started memstore flush
>> for region 
>> Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112..
>> Current region memstore size 65.7m
>> 2010-07-13 04:49:57,314 INFO
>> org.apache.hadoop.hbase.regionserver.Store: Added
>> hdfs://t-namenode1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/b33f1f164dd4d3f04aa65c2223c74112/agent/1177093464238110730,
>> entries=8788, sequenceid=35, memsize=2.0m, filesize=144.1k to
>> Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112.
>> ...
>> 2010-07-13 04:49:57,486 INFO
>> org.apache.hadoop.hbase.regionserver.Store: Added
>> hdfs://t-namenode1.cloud.ppops.net:8020/hbase/Spam_MsgEventTable/b33f1f164dd4d3f04aa65c2223c74112/cluster/4834392580329156904,
>> entries=8788, sequenceid=35, memsize=2.0m, filesize=141.6k to
>> Spam_MsgEventTable,,1278996481595.b33f1f164dd4d3f04aa65c2223c74112.
>> ...
>>
>> As you can see, we flush when we hit a global size of 64MB on a region, but each family is flushed to a different (very small) file. This is by design: HBase is column-oriented, and using all families at the same time usually points to bad usage/schema design. This must also account for a LOT of the IO you are seeing.
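To put rough numbers on it, using the figures from the flush lines above (65.7m region memstore, 144.1k per-family flush file) and the 256MB store size mentioned earlier; this is only back-of-the-envelope arithmetic, not output from the cluster in question:

public class FlushMath {
  public static void main(String[] args) {
    double regionMemstoreMb = 65.7;   // "Current region memstore size 65.7m"
    int families = 20;
    double perFamilyFlushKb = 144.1;  // "filesize=144.1k" written for one family by one flush
    double targetStoreMb = 256;       // size one store must reach (see above)

    // Memstore share per family at flush time, and how many tiny files a single
    // family accumulates before its store ever reaches 256MB.
    System.out.printf("~%.1f MB of memstore per family per flush%n", regionMemstoreMb / families);
    System.out.printf("~%.0f small files per family before one store reaches %.0f MB,%n"
        + "with compactions rewriting already-written data along the way%n",
        targetStoreMb * 1024 / perFamilyFlushKb, targetStoreMb);
  }
}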
>>
>> And about the compression, I do see an error in the region server log:
>>
>> 2010-07-13 04:49:57,137 WARN org.apache.hadoop.util.NativeCodeLoader:
>> Unable to load native-hadoop library for your platform... using
>> builtin-java classes where applicable
>>
>> Something's missing in your configuration. Please review
>> http://wiki.apache.org/hadoop/UsingLzoCompression
>>
>> J-D
>>
>> On Wed, Jul 14, 2010 at 1:16 PM, Jinsong Hu <jinsong_hu@hotmail.com> 
>> wrote:
>>> The test has stopped, so I can't really tell now. What I noticed is that I was running "iostat -x 5" the whole time, and there were lots of times when the %idle was less than 5%, or even close to 0%. I also noticed that the disk %util was higher than 90%.
>>>
>>> This indicates that the disk is pretty busy. On the other hand, I was running 3 mappers to insert records into HBase, and I artificially restricted the insertion rate to 80 records/second. The table has 20 column families, and each family has 4-5 columns. By my estimate, each record is about 10K bytes, so the pumping rate is not really that high.
>>>
>>> The table columns are compressed. I do notice that when I disable the compression, the insertion rate is even slower. This indicates that disk usage probably is indeed the bottleneck, but it is just hard to believe that I can't even insert records at 3x80=240 records/second with 10K-byte records. The load isn't heavy at all. Right now I have reduced the speed even lower, to 20 records/second, and will test and see if it will be in better shape.
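A rough sketch of the kind of loader described here, written against the 0.20/0.89 Java client; the row key, qualifier names, values and the sleep-based rate limit are made-up placeholders, and only the table name and the "agent"/"cluster" families come from the logs in this thread:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class ThrottledLoader {
  public static void main(String[] args) throws Exception {
    HBaseConfiguration conf = new HBaseConfiguration();
    HTable table = new HTable(conf, "Spam_MsgEventTable");
    int recordsPerSecond = 80;        // per-mapper cap described above

    for (long i = 0; i < 1000000; i++) {
      Put put = new Put(Bytes.toBytes("row-" + i));       // hypothetical row key
      // Every record touches many families, so all 20 memstores grow together
      // and each region flush writes 20 small files.
      put.add(Bytes.toBytes("agent"), Bytes.toBytes("q1"), Bytes.toBytes("value"));
      put.add(Bytes.toBytes("cluster"), Bytes.toBytes("q1"), Bytes.toBytes("value"));
      // ... the remaining families in the real schema
      table.put(put);

      if (i % recordsPerSecond == recordsPerSecond - 1) {
        Thread.sleep(1000);           // crude client-side limit of ~80 records/second
      }
    }
    table.flushCommits();
  }
}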
>>>
>>> Did you check the datanode log? There are lots of errors there. Can you tell why they happen?
>>>
>>>
>>> Jimmy.
>>>
>>> --------------------------------------------------
>>> From: "Jean-Daniel Cryans" <jdcryans@apache.org>
>>> Sent: Wednesday, July 14, 2010 12:10 PM
>>> To: <user@hbase.apache.org>
>>> Subject: Re: regionserver crash under heavy load
>>>
>>>> Thanks for the zookeeper log, I was able to pinpoint the problem further. So I see this (it seems I didn't dig enough in the logs; I wasn't expecting a 3-minute difference):
>>>>
>>>> 2010-07-13 21:49:17,864 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new decompressor
>>>> 2010-07-13 22:03:46,313 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=3.3544312MB (3517376), Free=405.42056MB (425114272), Max=408.775MB (428631648), Counts: Blocks=0, Access=6580365, Hit=0, Miss=6580365, Evictions=0, Evicted=0, Ratios: Hit Ratio=0.0%, Miss Ratio=100.0%, Evicted/Run=NaN
>>>> 2010-07-13 22:03:46,483 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
>>>> 2010-07-13 22:03:46,404 INFO org.apache.zookeeper.ClientCnxn: Client session timed out, have not heard from server in 908177ms for sessionid 0x229c87a7f4e0011, closing socket connection and attempting reconnect
>>>> 2010-07-13 22:03:46,400 INFO org.apache.hadoop.io.compress.CodecPool: Got brand-new compressor
>>>> 2010-07-13 22:03:46,313 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: IPC Server handler 2 on 60020 took 868374ms appending an edit to hlog; editcount=4274
>>>>
>>>> At the same time, we can see that the zookeeper log does expire the
>>>> session:
>>>>
>>>> 2010-07-13 21:52:00,001 - INFO  [SessionTracker:ZooKeeperServer@315] - Expiring session 0x229c87a7f4e0011, timeout of 120000ms exceeded
>>>> 2010-07-13 21:52:00,001 - INFO  [SessionTracker:ZooKeeperServer@315] - Expiring session 0x129c87a7f980015, timeout of 120000ms exceeded
>>>> 2010-07-13 21:52:00,002 - INFO  [ProcessThread:-1:PrepRequestProcessor@385] - Processed session termination for sessionid: 0x229c87a7f4e0011
>>>> 2010-07-13 21:52:00,002 - INFO  [ProcessThread:-1:PrepRequestProcessor@385] - Processed session termination for sessionid: 0x129c87a7f980015
>>>>
>>>> So your region server process was paused for more than 10 minutes; the logs show that gap, and the lines that follow talk about that big pause. Are your nodes swapping?
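For context, the 120000ms in the ZooKeeper log is the session timeout the region server's ZooKeeper session runs with; any JVM or OS pause longer than that expires the session and the master treats the server as dead. A tiny sketch that just reads the setting, assuming the usual zookeeper.session.timeout property name; raising it only masks shorter pauses and would not cover a 10-minute stall:

import org.apache.hadoop.hbase.HBaseConfiguration;

public class SessionTimeout {
  public static void main(String[] args) {
    HBaseConfiguration conf = new HBaseConfiguration();
    // Session timeout negotiated with ZooKeeper (120000ms in the log above).
    // Exceed it without heartbeating and the session is expired.
    int timeoutMs = conf.getInt("zookeeper.session.timeout", 120000);
    System.out.println("zookeeper.session.timeout = " + timeoutMs + " ms");
  }
}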
>>>>
>>>> J-D
>>>>
>>>> On Wed, Jul 14, 2010 at 11:48 AM, Jinsong Hu <jinsong_hu@hotmail.com>
>>>> wrote:
>>>>>
>>>>> The zookeepers are on 3 separate physical machines that are colocated with the 3 masters.
>>>>> I have put the logs for them here:
>>>>> http://t-collectors1.proofpoint.com:8079/zookeeper.tar.gz
>>>>> The version is also from the CDH3 June 10 release.
>>>>>
>>>>> I checked the datanode in that period; there are lots of exceptions. I have put them here:
>>>>> http://t-collectors1.proofpoint.com:8079/backup.tar.gz
>>>>>
>>>>> Jinsong
>>>>>
>>>>> --------------------------------------------------
>>>>> From: "Jean-Daniel Cryans" <jdcryans@apache.org>
>>>>> Sent: Wednesday, July 14, 2010 11:16 AM
>>>>> To: <user@hbase.apache.org>
>>>>> Subject: Re: regionserver crash under heavy load
>>>>>
>>>>>> So your region servers had their sessions expired, but I don't see any sign of GC activity. Usually that points to a case where ZooKeeper isn't able to answer requests fast enough because it is IO-starved. Are the zookeeper quorum members on the same nodes as the region servers?
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Wed, Jul 14, 2010 at 10:16 AM, Jinsong Hu <jinsong_hu@hotmail.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> I have uploaded the crashed log files to
>>>>>>>
>>>>>>> http://somewhere
>>>>>>> http://somewhere
>>>>>>>
>>>>>>> It includes the GC log and config files too. Both of these regionservers crashed while I was using only 3 mappers to insert records into HBase, and for each task I limited the data rate to 80 records/second.
>>>>>>>
>>>>>>> The machines I use are relatively old, having only 4G of RAM, a 4-core CPU, and a 250G disk. I run the tasktracker, datanode and regionserver on them. I have 3 machines, and only 1 regionserver is still up after the continuous overnight insertion. Before the test I created the HBase tables, so I started with an empty table. I saw it running fine for several hours, and when I checked again this morning, 2 regionservers had died. I backed up the logs and restarted the regionservers, and then I found the regionserver process is up but not listening on any port.
>>>>>>>
>>>>>>> The log doesn't show any error:
>>>>>>> 2010-07-14 16:23:45,769 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Set watcher on master address ZNode /hbase/master
>>>>>>> 2010-07-14 16:23:45,951 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 10.110.24.48:60000
>>>>>>> 2010-07-14 16:23:45,957 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Telling master at 10.110.24.48:60000 that we are up
>>>>>>> 2010-07-14 16:23:46,283 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Installed shutdown hook thread: Shutdownhook:regionserver60020
>>>>>>>
>>>>>>>
>>>>>>> I am using the June 10 release of the Cloudera distribution for both Hadoop and HBase.
>>>>>>>
>>>>>>> Jimmy
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --------------------------------------------------
>>>>>>> From: "Jean-Daniel Cryans" <jdcryans@apache.org>
>>>>>>> Sent: Tuesday, July 13, 2010 4:24 PM
>>>>>>> To: <user@hbase.apache.org>
>>>>>>> Subject: Re: regionserver crash under heavy load
>>>>>>>
>>>>>>>> Your region server doesn't look very loaded from the metrics POV. But I specifically asked for the lines around that, not just the dump, since they will contain the reason for the shutdown.
>>>>>>>>
>>>>>>>>> I do notice that the disk usage is pretty high. I am just thinking that our problem probably is a hardware limit, but the server should not crash when the hardware limit is reached.
>>>>>>>>
>>>>>>>> We still don't know why it crashed, and it may not even be related to HW limits; we need those bigger log traces. Also use pastebin.com or anything like that.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Do you have any idea when the CDH3 official release will be out?
>>>>>>>>
>>>>>>>> I don't work for Cloudera, but IIRC the next beta for CDH3 is due for September.
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Jimmy
>>>>>>>>>
>>>>>>>>> --------------------------------------------------
>>>>>>>>> From: "Jean-Daniel Cryans" <jdcryans@apache.org>
>>>>>>>>> Sent: Tuesday, July 13, 2010 2:55 PM
>>>>>>>>> To: <user@hbase.apache.org>
>>>>>>>>> Subject: Re: regionserver crash under heavy load
>>>>>>>>>
>>>>>>>>>> Please use a pasting service for the log traces. I personally use pastebin.com
>>>>>>>>>>
>>>>>>>>>> You probably had a GC that lasted too long; this is something out of the control of the application (apart from trying to put as little data in memory as possible, but you are inserting, so...). Your log doesn't contain enough information for us to tell; please look for a "Dump of metrics" line and paste the lines around it.
>>>>>>>>>>
>>>>>>>>>> J-D
>>>>>>>>>>
>>>>>>>>>> On Tue, Jul 13, 2010 at 2:49 PM, Jinsong Hu 
>>>>>>>>>> <jinsong_hu@hotmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi, Todd:
>>>>>>>>>>>  I downloaded hadoop-0.20.2+320 and hbase-0.89.20100621+17 from CDH3 and inserted data at full load; after a while the HBase regionserver crashed. I checked the system with "iostat -x 5" and noticed the disk is pretty busy. Then I modified my client code and reduced the insertion rate by 6 times, and the test runs fine. Is there any way that the regionserver could be modified so that at least it doesn't crash under heavy load? I used the Apache HBase 0.20.5 distribution and the same problem happens. I am thinking that when the regionserver is too busy, it should throttle the incoming data rate to protect the server. Could this be done?
>>>>>>>>>>>  Do you also know when the CDH3 official release will come out? The one I downloaded is a beta version.
>>>>>>>>>>>
>>>>>>>>>>> Jimmy
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed Spam_MsgEventTable,56-2010-05-19 10:09:02\x099a420f4f31748828fd24aeea1d06b294,1278973678315.01dd22f517dabf53ddd135709b68ba6c.
>>>>>>>>>>> 2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: aborting server at: m0002029.ppops.net,60020,1278969481450
>>>>>>>>>>> 2010-07-13 02:24:34,389 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper; /hbase/root-region-server
>>>>>>>>>>> 2010-07-13 02:24:34,389 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: regionserver60020 exiting
>>>>>>>>>>> 2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook starting; hbase.shutdown.hook=true; fsShutdownHook=Thread[Thread-10,5,main]
>>>>>>>>>>> 2010-07-13 02:24:34,608 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Starting fs shutdown hook thread.
>>>>>>>>>>> 2010-07-13 02:24:34,608 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/.logs/m0002029.ppops.net,60020,1278969481450/10.110.24.79%3A60020.1278987220794 : java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed  because recovery from primary datanode 10.110.24.80:50010 failed 6 times.  Pipeline was 10.110.24.80:50010. Aborting...
>>>>>>>>>>> java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: IOException flush:java.io.IOException: Error Recovery for block blk_-1605696159279298313_2395924 failed  because recovery from primary datanode 10.110.24.80:50010 failed 6 times.  Pipeline was 10.110.24.80:50010. Aborting...
>>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.sync(DFSClient.java:3214)
>>>>>>>>>>>   at org.apache.hadoop.fs.FSDataOutputStream.sync(FSDataOutputStream.java:97)
>>>>>>>>>>>   at org.apache.hadoop.io.SequenceFile$Writer.syncFs(SequenceFile.java:944)
>>>>>>>>>>>   at sun.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
>>>>>>>>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.sync(SequenceFileLogWriter.java:124)
>>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.wal.HLog.hflush(HLog.java:826)
>>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.wal.HLog.sync(HLog.java:1004)
>>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.wal.HLog.append(HLog.java:817)
>>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchPut(HRegion.java:1531)
>>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegion.put(HRegion.java:1447)
>>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.put(HRegionServer.java:1703)
>>>>>>>>>>>   at org.apache.hadoop.hbase.regionserver.HRegionServer.multiPut(HRegionServer.java:2361)
>>>>>>>>>>>   at sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>>>>>>>>>>>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>>>>>>>>>>   at java.lang.reflect.Method.invoke(Method.java:597)
>>>>>>>>>>>   at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:576)
>>>>>>>>>>>   at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:919)
>>>>>>>>>>> 2010-07-13 02:24:34,610 ERROR org.apache.hadoop.hdfs.DFSClient: Exception closing file /hbase/Spam_MsgEventTable/079c7de876422e57e5f09fef5d997e06/.tmp/6773658134549268273 : java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
>>>>>>>>>>> java.io.IOException: All datanodes 10.110.24.80:50010 are bad. Aborting...
>>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:2603)
>>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1600(DFSClient.java:2139)
>>>>>>>>>>>   at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2306)
>>>>>>>>>>> 2010-07-13 02:24:34,729 INFO org.apache.hadoop.hbase.regionserver.ShutdownHook: Shutdown hook finished.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
> 
