hbase-user mailing list archives

From bijieshan <bijies...@huawei.com>
Subject Re: In search of a bug around splitting
Date Fri, 19 Aug 2011 06:56:45 GMT
One question about the rollback:
If the journal contains a "PONR" entry, rollback returns immediately.
The regionserver should abort if rollback returns false. Right?
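
To make the question concrete, here is a minimal sketch of the behaviour being asked about. It is a toy model, not HBase's SplitTransaction: the class name, the Step enum values (other than PONR), and the "undo" bookkeeping are all assumptions made for illustration. Whether the caller really should abort when rollback returns false is the open question, not something this sketch settles.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a split journal. All names are hypothetical and do not
// mirror HBase's actual SplitTransaction internals.
class SplitJournalSketch {
    enum Step { CREATED_SPLIT_DIR, CLOSED_PARENT, PONR }

    private final List<Step> journal = new ArrayList<>();
    private final List<Step> undone = new ArrayList<>();

    void record(Step s) { journal.add(s); }

    List<Step> undone() { return undone; }

    // Returns true if every journaled step was undone, false if the
    // point of no return (PONR) had already been journaled.
    boolean rollback() {
        if (journal.contains(Step.PONR)) {
            return false; // nothing is undone; the caller must decide what to do
        }
        for (int i = journal.size() - 1; i >= 0; i--) {
            undone.add(journal.get(i)); // stand-in for the real undo action
        }
        journal.clear();
        return true;
    }

    public static void main(String[] args) {
        SplitJournalSketch early = new SplitJournalSketch();
        early.record(Step.CREATED_SPLIT_DIR);
        early.record(Step.CLOSED_PARENT);
        // prints: true [CLOSED_PARENT, CREATED_SPLIT_DIR]
        System.out.println(early.rollback() + " " + early.undone());

        SplitJournalSketch late = new SplitJournalSketch();
        late.record(Step.CLOSED_PARENT);
        late.record(Step.PONR);
        // Assumed caller policy: treat a failed rollback as fatal.
        System.out.println(late.rollback()
            ? "rolled back"
            : "past PONR: caller would abort the regionserver");
    }
}
```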


On Fri, Aug 19, 2011 at 12:05 AM, Joseph Pallas
<joseph.pallas@oracle.com> wrote:
> The test program has multiple client threads, each of which is performing a stream of
> operations (it's actually a custom workload running in the YCSB framework).  The program
> is keeping track of data that was inserted by write operations, and subsequent read operations
> only retrieve data that was previously written.  The read operation involves first doing
> a HTableInterface.exists call on a row/cf/qual that is expected to exist.  It is this exists
> call that we have seen fail.  When the failure occurs, the client reports an exception and
> stops.  Then we examine the data using the HBase shell, and the item we were looking for
> is there: the exists call should have succeeded.  Furthermore, the item has a timestamp that
> shows it really was inserted several minutes previously; it was not inserted right around the
> time of the failure (which might happen if there were a race condition of some sort in our

OK.  The exists call is rarely used, I'd say, which may be why you are
seeing something we don't.

> So, what is interesting is when we look at the log files for the region server, and at
> the time this happens, the region involved is in the middle of a split. Also, the key we failed
> on is greater than the split key.  After much reading of the code in SplitTransaction and
> HRegionServer, I came up with a theory.
> When a region splits, daughter regions are created and the region is marked as offline/splitting
> in META (by MetaEditor.offlineParentInMeta).  The daughter regions are brought online and
> added to META by SplitTransaction.openDaughterRegion and HRegionServer.postOpenDeployTasks.
> Later, the META entry for the original region is cleaned up.  The two daughter regions
> are each managed in their own DaughterOpener thread.  This is where I am suspicious: if daughter
> A's thread updates META before daughter B's thread does, then there's a window of time on
> the client when HConnectionManager.locateRegionInMeta, when looking for a key in daughter B, will
> see only daughter A.  The client, I believe, does not check end rows in META, so it will
> think that daughter A is the region to handle the request.
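
The suspected window can be illustrated with a toy lookup. This is only a sketch under stated assumptions: META is modelled as a sorted map from region start key to region name, the lookup takes the greatest start key at or below the row and never looks at end keys, and the split key "m" and the region names are made up. It is not the actual HConnectionManager.locateRegionInMeta code.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy META: region start key -> region name (hypothetical model).
class MetaLookupWindow {
    // Picks the region with the greatest start key <= row.
    // End keys are deliberately not checked, as the theory assumes.
    static String locate(TreeMap<String, String> meta, String row) {
        Map.Entry<String, String> e = meta.floorEntry(row);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        TreeMap<String, String> meta = new TreeMap<>();
        // Parent split at "m": daughter A covers keys below "m",
        // daughter B covers "m" and above.
        meta.put("", "daughterA");               // A's thread updates META first
        System.out.println(locate(meta, "t"));   // prints daughterA (wrong region)

        meta.put("m", "daughterB");              // B's thread catches up
        System.out.println(locate(meta, "t"));   // prints daughterB
    }
}
```

Row "t" is greater than the split key, so it belongs to daughter B, yet the first lookup lands on daughter A because only start keys are compared.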


> Now, the question is: are there any circumstances under which sending that request to
> the wrong region (daughter A instead of daughter B) might yield incorrect results, instead
> of an exception?  My gut says maybe, but my experiments have not yet managed to find it.

Well, we can do a transaction that involves multiple rows.  Currently
(as I'm sure you know by now), the steps are:

1. close region (NSRE if anyone asks for the region after close)
2. offline region in edit (still NSRE'ing)
3. Open Daughters in parallel and then in parallel update .META.

We should add daughters, daughter B first, then daughter A, and then
offline parent?  If we do it in this sequence, if you are looking for
a row in daughter A, you'll get the parent still and then a NSRE
because it's closed.... so you'll go back to .META. and then find
daughter A eventually.  If you are looking for a row in B and A is
online first, you'll think it has it when it doesn't... which would be

If we offline parent first and then add daughter B first... and we're
looking for row in daughter A, but it's not online yet, we'll get
WrongRegionException which would be a blast from the past... something
we used to get in the old days but like polio, managed to eradicate

How does this sound, Joe?  We could rig you a SplitTransaction to do
the above.  We could hack one up first and if it did away with your
issue, we'd then spend a bit of time making sure it rolled back
properly on fail (need to make sure rollback works properly).

