hbase-user mailing list archives

From: Vidhyashankar Venkataraman <vidhy...@yahoo-inc.com>
Subject: Holes in our table.
Date: Wed, 21 Sep 2011 17:07:33 GMT
I pored over a few JIRAs, and this looks like an issue many of you may have seen already, though I am not sure.
Please let me know if you have.

We are currently having some problems with our cluster. I mentioned them briefly in an earlier
mail titled "Unassigned holes in tables".

We use a patched version of HBase 0.90.0.
We have started to observe some unassigned regions in our tables (5 out of 30K regions, to be
exact). Here is what we have observed so far:

 1.  All of these regions show up in META. They are all daughter regions created by a split.
The parent region shows up in META as offlined but still has a server info entry. One daughter
region is assigned correctly; the other daughter region is the one in trouble. I have provided
an example below.
 2.  Please note that these are not the only splits that happened during this period. (Splits are
disabled by setting a large max file size, but some of our regions occasionally reach that size anyway.)
 3.  Call the parent region P, the assigned daughter region SPLIT1, and the unassigned daughter
region SPLIT2. The master logs show that SPLIT1 was assigned, but I see no trace of SPLIT2.
Nor do I see the master assigning P.
 4.  These problems do not go away even after restarting HBase (and, once, after restarting
ZooKeeper as well), which bothers me. Doesn't the master assign regions by periodically scanning
the META table? The master logs show no sign of these regions, although I can see the other
daughter region.
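A "hole" here is a range of row keys that no assigned region serves. A minimal sketch of spotting such gaps, assuming region boundaries have already been pulled out of META into plain tuples: `find_holes` and the tuple layout are hypothetical (this is not an HBase API), and Python string comparison stands in for HBase's byte-wise key ordering (the two agree for UTF-8 text keys).

```python
from typing import List, Optional, Tuple

# Hypothetical region model (not an HBase API): (start_key, end_key, server).
# server is None when the region is unassigned. An empty start key marks the
# first region of the table and an empty end key marks the last one.
Region = Tuple[str, str, Optional[str]]


def find_holes(regions: List[Region]) -> List[Tuple[str, str]]:
    """Return (from_key, to_key) ranges that no assigned region covers.

    An empty to_key means the hole runs to the end of the table.
    Python string order stands in for HBase's byte-wise key order.
    """
    assigned = sorted((r for r in regions if r[2] is not None), key=lambda r: r[0])
    holes: List[Tuple[str, str]] = []
    covered_up_to = ""  # the key space starts at the empty key
    for start, end, _server in assigned:
        if start > covered_up_to:
            # Nothing assigned covers [covered_up_to, start).
            holes.append((covered_up_to, start))
        if end == "":
            return holes  # this region runs to the end of the table
        covered_up_to = max(covered_up_to, end)
    # No assigned region reaches the end of the table.
    holes.append((covered_up_to, ""))
    return holes


if __name__ == "__main__":
    # Made-up layout: the [BLAH1, BLAH2) daughter has no server.
    example = [
        ("", "BLAH1", "node0"),
        ("BLAH1", "BLAH2", None),
        ("BLAH2", "BLAH3", "node2"),
        ("BLAH3", "", "node3"),
    ]
    print(find_holes(example))  # [('BLAH1', 'BLAH2')]
```

Sorting by start key and tracking how far the assigned regions reach makes a single pass enough; overlapping regions simply extend the covered range instead of producing false holes.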

META scan of a sample problem region (purely as an illustration):

PARENT info:

column=info:regioninfo, timestamp=1315958212365, value=REGION => {NAME => 'WCC,BLAH1.X.Y',
STARTKEY => 'BLAH1', ENDKEY => 'BLAH3', ENCODED => bedc64dd9c56f8e072e745a3cbedc2d1,
OFFLINE => true, SPLIT => true, TABLE => {BLAH}}

column=info:server, timestamp=1314142909658, value=<node1>:<port>

column=info:serverstartcode, timestamp=1314142909658, value=1314141232997

DAUGHTER 1 info: (Note that there is no region server info here.)
column=info:splitA, timestamp=1314196650274, value=REGION => {NAME => 'WCC,BLAH1.X1.Y1',
STARTKEY => 'BLAH1', ENDKEY => 'BLAH2', ENCODED => 689e3ae01fe8f6ffe0591a0078fbe362,

DAUGHTER 2 info: (There is region server info here.)
column=info:regioninfo, timestamp=1315958212367, value=REGION => {NAME => 'WCC,BLAH2.X2.Y2',
=> 277007da50d04a9d71dad94de32ad876, TABLE => {BLAH}}

column=info:server, timestamp=1316205878425, value=<node2>:<port>

column=info:serverstartcode, timestamp=1316205878425, value=1316205773600
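The check in observation 1 (an offlined split parent with a daughter that has no server) can be scripted against a META dump. This is a minimal sketch that does not use the HBase client: `MetaRow` and `unassigned_daughters` are hypothetical names, each row is reduced to the fields visible in the scan above (the OFFLINE/SPLIT flags from info:regioninfo, info:server), and the parent's info:splitA/info:splitB entries are assumed to have been collected into a daughter list. It only reads META, so it cannot tell whether a region listed with a server is actually open there.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MetaRow:
    """Hypothetical, simplified view of one META row (not an HBase class)."""
    name: str
    offline: bool = False            # OFFLINE flag from info:regioninfo
    split: bool = False              # SPLIT flag from info:regioninfo
    server: Optional[str] = None     # value of info:server, if any
    daughters: List[str] = field(default_factory=list)  # from info:splitA / info:splitB


def unassigned_daughters(rows: List[MetaRow]) -> List[str]:
    """Return daughters of offlined split parents that have no info:server."""
    by_name: Dict[str, MetaRow] = {row.name: row for row in rows}
    stuck: List[str] = []
    for row in rows:
        if not (row.offline and row.split):
            continue  # only offlined split parents are of interest
        for daughter_name in row.daughters:
            daughter = by_name.get(daughter_name)
            # Report daughters that are missing from META or have no server.
            if daughter is None or daughter.server is None:
                stuck.append(daughter_name)
    return stuck


if __name__ == "__main__":
    # Rows modeled on the sample scan (names are illustrative only).
    rows = [
        MetaRow("WCC,BLAH1.X.Y", offline=True, split=True, server="node1:port",
                daughters=["WCC,BLAH1.X1.Y1", "WCC,BLAH2.X2.Y2"]),
        MetaRow("WCC,BLAH1.X1.Y1"),
        MetaRow("WCC,BLAH2.X2.Y2", server="node2:port"),
    ]
    print(unassigned_daughters(rows))  # ['WCC,BLAH1.X1.Y1']
```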

