hbase-dev mailing list archives

From Kannan Muthukkaruppan <Kan...@facebook.com>
Subject RE: scanner lease expired/region server shutdown
Date Wed, 27 Jan 2010 22:30:37 GMT
1. <<< Well, on split, the parent is still in .META. but offlined.  It's followed by two
entries, one for each daughter region.  I was asking whether, after the offlined parent,
there was an entry for a region with the same start key as the parent (the daughter that
is to take over from the parent).  If not, then it would seem that the update of the
parent -- marking it offlined and adding the splitA and splitB columns -- went through,
but the addition of the daughter rows did not... which is odd. >>>

That's what seems to have happened. For all of these problem regions (about 50 out of the 400-odd
regions my table had), a scan of .META. shows five entries: info:regioninfo (with OFFLINE => true),
info:server, info:serverstartcode, info:splitA, and info:splitB.

 TestTable,0076347792,1264447634546  column=info:regioninfo, timestamp=1264470099825,
   value=REGION => {NAME => 'TestTable,0076347792,1264447634546', STARTKEY => '0076347792',
   ENDKEY => '', ENCODED => 654466296, OFFLINE => true, SPLIT => true,
   TABLE => {{NAME => 'TestTable', FAMIL...
 TestTable,0076347792,1264447634546  column=info:server, timestamp=1264447641018, value=10.129.68.214:60020
 TestTable,0076347792,1264447634546  column=info:serverstartcode, timestamp=1264447641018, value=1264109117245
 TestTable,0076347792,1264447634546  column=info:splitA, timestamp=1264470099825,
   value=\000\n0076724048\000\000\000\001&g\003\357\277\275\e\"TestTable,0076347792,1264448682267\000\n0076347792\000\00
 TestTable,0076347792,1264447634546  column=info:splitB, timestamp=1264470099825,
   value=\000\000\000\000\000\001&g\003\357\277\275\e\"TestTable,0076724048,1264448682267\000\n0076724048\000\000\000\00

There is no other entry with a start key of 0076347792.
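
(A bounded scan of .META. starting at that key is enough to check this; roughly the following
in the shell -- exact option names are from memory and worth double-checking. If a daughter
existed, it would show up as a second info:regioninfo row with STARTKEY => '0076347792'.)

    hbase(main):001:0> scan '.META.', {STARTROW => 'TestTable,0076347792', LIMIT => 10,
                                       COLUMNS => ['info:regioninfo']}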

2. <<<(Thinking on it, it would be easy enough to add to the master fixup
for this condition until we do the proper fix in 0.21).>>>

Do share your thoughts on that.
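
To make the condition concrete, here is a rough sketch of the kind of fixup being discussed --
purely illustrative Java with made-up helper types (MetaAccess, MetaRow, RegionInfo), not the
real master or .META. APIs:

    import java.io.IOException;
    import java.util.Arrays;
    import java.util.List;

    // All types below are illustrative stand-ins, not HBase's real internal API.
    interface RegionInfo {
      boolean isOffline();
      boolean isSplit();
      String regionName();
    }

    interface MetaRow {
      RegionInfo regionInfo();   // decoded info:regioninfo
      RegionInfo splitA();       // decoded info:splitA, or null if absent
      RegionInfo splitB();       // decoded info:splitB, or null if absent
    }

    interface MetaAccess {
      List<MetaRow> scanMeta() throws IOException;                  // every row in .META.
      boolean rowExists(String regionName) throws IOException;      // daughter row present?
      void insertRegionRow(RegionInfo daughter) throws IOException; // write its info:regioninfo
    }

    class MetaSplitFixup {
      /** Re-insert daughter rows that were lost when a regionserver died mid-split. */
      static void fixDroppedDaughters(MetaAccess meta) throws IOException {
        for (MetaRow row : meta.scanMeta()) {
          RegionInfo parent = row.regionInfo();
          // Only offlined, already-split parents are of interest here.
          if (parent == null || !parent.isOffline() || !parent.isSplit()) continue;
          for (RegionInfo daughter : Arrays.asList(row.splitA(), row.splitB())) {
            if (daughter != null && !meta.rowExists(daughter.regionName())) {
              // The parent's offlining and splitA/splitB columns went through, but the
              // daughter row never made it into .META. -- recreate it from the split column.
              meta.insertRegionRow(daughter);
            }
          }
        }
      }
    }

This would only be a band-aid for the dropped-edit case; as noted below, the longer-term fix in
0.21 is to move the split state transitions into ZooKeeper so they are atomic.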

3. <<< Is there a public folder some place you can post them so I can pull them? >>>

Stack: perhaps I'll email it to you offline as a start.

regards,
Kannan
-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Tuesday, January 26, 2010 9:20 AM
To: hbase-dev@hadoop.apache.org
Subject: Re: scanner lease expired/region server shutdown

On Tue, Jan 26, 2010 at 7:01 AM, Kannan Muthukkaruppan <Kannan@facebook.com> wrote:
> 1 Name Node / 4 Data Nodes. Of the 4 DN, one is running the HBase Master, and the other three are running region servers. ZK is on all the same 5 nodes. Should ideally have separated this out. The nodes are 16GB, 4-disk machines.
>

CPUs?  4 disks is good.


> 2. I examined the HDFS datanode log on the same machine around the time the problems happened, and saw this:
>
> 2010-01-25 11:33:09,531 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.129.68.212:50010, storageID=DS-1418567969-10.129.68.212-50010-1263610251776, infoPort=50075, ipcPort=50020):Got exception while serving blk_5691809099673541164_10475 to /10.129.68.212:
> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/10.129.68.212:50010 remote=/10.129.68.212:47729]
>        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
>        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
>        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
>        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:400)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
>        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
>        at java.lang.Thread.run(Thread.java:619)
>

Does this exception show in the regionserver logs -- the regionserver being the DFS
client here?  See HADOOP-3831 for a 'fix' that has the DFSClient reestablishing the connection.


> [For future runs, will try to turn on Java GC logging as well as per your suggestion.]

Yes. Getting the GC settings right is critical, as it is for any Java application.
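
For reference, one way to turn GC logging on is via HBASE_OPTS in conf/hbase-env.sh; a minimal
sketch using the standard Sun JVM flags (the log path is just an example):

    export HBASE_OPTS="$HBASE_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps \
        -Xloggc:/tmp/gc-hbase.log"

The resulting gc-hbase.log is what GCHisto (mentioned below) takes as input.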



> <<< Then it's a dropped edit of .META.  The parent region -- the one being split -- got
> updates which added the splitA and splitB and its offlining, but the new region didn't
> get inserted?  The crash happened just at that time? >>>
>
> Didn't fully understand the above.
>

Well, on split, the parent is still in .META. but offlined.  It's followed by two
entries, one for each daughter region.  I was asking whether, after the
offlined parent, there was an entry for a region with the same start
key as the parent (the daughter that is to take over from the parent).
If not, then it would seem that the update of the parent -- marking it
offlined and adding the splitA and splitB columns -- went through, but the
addition of the daughter rows did not... which is odd.
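
For illustration, a healthy post-split .META. would have roughly three rows for this range --
the offlined parent followed by the two daughters named in the splitA/splitB values shown in
the reply above (layout abbreviated, not actual shell output):

    TestTable,0076347792,1264447634546   info:regioninfo  {..., OFFLINE => true, SPLIT => true}              <- parent, offlined
    TestTable,0076347792,1264448682267   info:regioninfo  {STARTKEY => '0076347792', ENDKEY => '0076724048'} <- daughter A
    TestTable,0076724048,1264448682267   info:regioninfo  {STARTKEY => '0076724048', ENDKEY => ''}           <- daughter B

In the problem regions above, the daughter row that shares the parent's start key is the one
that is missing.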

(Thinking on it, it would be easy enough to add to the master fixup
for this condition until we do the proper fix in 0.21).


> <<< It would be interesting to take a look at the regionserver logs from that
> time.  Please post if you have a moment so we can take a looksee. >>>
>
> Will do. Should I just send it as an attachment to the list? Or is there a recommended
> way of doing this?
>
Don't attach.  They just get dropped.  Is there a public folder some
place you can post them so I can pull them?

Thanks Kannan.
St.Ack


> regards,
> Kannan
> ________________________________________
> From: saint.ack@gmail.com [saint.ack@gmail.com] On Behalf Of Stack [stack@duboce.net]
> Sent: Monday, January 25, 2010 8:59 PM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: scanner lease expired/region server shutdown
>
> What J-D said but with ornamentation.  See below.
>
> On Mon, Jan 25, 2010 at 7:14 PM, Kannan Muthukkaruppan
> <Kannan@facebook.com> wrote:
>> I was doing some brute-force testing - running one instance of PerformanceEvaluation (PE) for writes, and another instance for randomReads.
>>
>
> What's the cluster you're hitting like?  That 5-node thingy?  What's the
> hardware profile?
>
>
>> One of the region servers went down after a while. [This is on 0.20.2]. The region server log had things like:
>>
>> 2010-01-25 11:33:39,416 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 34 on 60020 took 65190ms appending an edit to hlog; editcount=27878
>
> This is saying that it took 65 seconds to append to hdfs.  What was
> going on at that time?  A fat GC in the regionserver or over in a
> Datanode?  You can enable GC logging by uncommenting stuff in
> hbase-env.sh.  Feed the GC log to https://gchisto.dev.java.net/
> (suggested by the zookeeper lads).  It's good for finding the long
> pauses.  We should look at the logs from around the long GC pause.  It's
> probably a failed promotion that brought on the stop-the-world GC.  Or
> was your HDFS struggling?
>
> I was going to say that the PE puts the worst kind of loading on the
> hbase cache -- nothing sticks around -- but looking at your numbers
> below, the cache seems to be working pretty well.
>
>> 2010-01-25 11:33:39,416 DEBUG org.apache.hadoop.hbase.io.hfile.LruBlockCache: Cache Stats: Sizes: Total=1815.4441MB (1903631104), Free=581.23096MB (609464832), Max=2396.675MB (2513095936), Counts: Blocks=28216, Access=10904689, Hit=9313612, Miss=1591077, Evictions=110, Evicted=414336, Ratios: Hit Ratio=85.40924191474915%, Miss Ratio=14.590759575366974%, Evicted/Run=3766.69091796875
>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 52 on 60020 took 65195ms appending an edit to hlog; editcount=27879
>> 2010-01-25 11:33:39,417 WARN org.apache.hadoop.hbase.regionserver.HLog: IPC Server handler 75 on 60020 took 65195ms appending an edit to hlog; editcount=27880
>> 2010-01-25 11:33:39,417 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: Scanner -5220035164735014996 lease expired
>>
>> This was followed by:
>> 2010-01-25 11:33:39,636 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12652c620520001 to sun.nio.ch.SelectionKeyImpl@788ab708
>> java.io.IOException: TIMED OUT
>
> This is the zk timeout.
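
If long GC pauses are what is tripping this, the usual stopgap is to give the session more
headroom in hbase-site.xml; a sketch, assuming the 0.20-era property name (worth double-checking
against hbase-default.xml), and keeping in mind the ZooKeeper servers cap sessions at 20 x tickTime:

    <property>
      <name>zookeeper.session.timeout</name>
      <!-- assumed property name and default (60000 ms); value is in milliseconds -->
      <value>120000</value>
    </property>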
>
>>
>> And also errors like:
>> 2010-01-25 11:33:39,692 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer:
>> org.apache.hadoop.hbase.UnknownScannerException: Name: -5220035164735014996
>>
>
> These will happen after the above.  The regionserver is on its way
> down; it has probably already emptied its list of outstanding regions.
>
>
>> Subsequently, I restarted the region server and just started the PE in sequentialWrite mode. It seems to be off to a really slow start. For several regions, it is printing the following:
>>
>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> 10/01/25 18:12:44 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 1 of 10 failed; retrying after sleep of 1000 because: region offline: TestTable,0031841132,1264121653303
>> 10/01/25 18:12:45 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>> [..and so on..]
>> 10/01/25 18:12:59 DEBUG client.HConnectionManager$TableServers: Cached location for .META.,,1 is 10.129.68.212:60020
>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: locateRegionInMeta attempt 8 of 10 failed; retrying after sleep of 16000 because: region offline: TestTable,0031841132,1264121653303
>> 10/01/25 18:13:07 DEBUG client.HConnectionManager$TableServers: Removed .META.,,1 for tableName=.META. from cache because of TestTable,0032000000,99999999999999
>>
>> When I scan the '.META.', I noticed that most of the regions for which I get the above error were in the middle of (?) a split. For example:
>>
>>  TestTable,0031841132,1264121653303  column=info:regioninfo, timestamp=1264470099796,
>>    value=REGION => {NAME => 'TestTable,0031841132,1264121653303', STARTKEY => '0031841132',
>>    ENDKEY => '0032057774', ENCODED => 644500792, OFFLINE => true, SPLIT => true,
>>    TABLE => {{NAME => 'TestTable', FAMILIES => [{NAME => 'info', VERSIONS => '3',
>>    COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
>>    BLOCKCACHE => 'true'}]}}
>>  TestTable,0031841132,1264121653303  column=info:server, timestamp=1264121655148, value=10.129.68.214:60020
>>  TestTable,0031841132,1264121653303  column=info:serverstartcode, timestamp=1264121655148, value=1264109117245
>>  TestTable,0031841132,1264121653303  column=info:splitA, timestamp=1264470099796, value=\000\n0031978284\000\000\000\0
>>                              01&g\016\357\277\275\024\"TestTable,0031841132,1264449401108\000\n0031841132\000\
>>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004true\357\277\27
>>                             5\337\274#
>>  TestTable,0031841132,1264121653303  column=info:splitB, timestamp=1264470099796, value=\000\n0032057774\000\000\000\0
>>                              01&g\016\357\277\275\024\"TestTable,0031978284,1264449401108\000\n0031978284\000\
>>                             000\000\005\tTestTable\000\000\000\000\000\002\000\000\000\aIS_ROOT\000\000\000\0
>>                             05false\000\000\000\aIS_META\000\000\000\005false\000\000\000\001\a\004info\000\0
>>                             00\000\a\000\000\000\vBLOOMFILTER\000\000\000\005false\000\000\000\vCOMPRESSION\0
>>                             00\000\000\004NONE\000\000\000\bVERSIONS\000\000\000\0013\000\000\000\003TTL\000\
>>                             000\000\n2147483647\000\000\000\tBLOCKSIZE\000\000\000\00565536\000\000\000\tIN_M
>>                             EMORY\000\000\000\005false\000\000\000\nBLOCKCACHE\000\000\000\004truej\357\277\2
>>                             75\312\262
>>
>> Any thoughts/ideas on what might be going on? Appreciate any input.
>>
>
> There is no line after the above?  A region with a startkey of
> 0031841132?  Then it's a dropped edit of .META.  The parent region --
> the one being split -- got updates which added the splitA and splitB and
> its offlining, but the new region didn't get inserted?  The crash
> happened just at that time?
>
> It would be interesting to take a look at the regionserver logs from
> that time.  Please post if you have a moment so we can take a looksee.
>
> The above kind of thing is what the master rewrite is about: moving state
> transitions up to zk so they are atomic across the cluster as regions move
> through transitions, rather than, as here, a multi-row update that might
> not all go through the way things currently work.
>
> St.Ack
>
>> Regards,
>> Kannan
>>
>>
>>
>
