hbase-dev mailing list archives

From Jim Kellerman <...@powerset.com>
Subject RE: Hbase corrupts data after reporting MSG_REPORT_CLOSE to master during compaction and split process
Date Tue, 09 Sep 2008 20:19:15 GMT
I think there is a problem, but maybe not the one we surmised. Are
there any lease timeout reports in your master log?

When a lease times out, all the regions being served by that region
server get reassigned almost immediately. That could explain how a
region was assigned to another server while the original server still
thought it had exclusive access to the region.

There are a couple of problems going on here if you see lease
timeouts. One is that the region server is not sending its heartbeat
soon enough (see HBASE-616, "'We slept XXXXXX ms, ten times longer than
scheduled: 3000' happens frequently"). The delay can occur on either
the master or the region server: either the master does not process
the heartbeat until after the lease times out, or the region server
does not send the heartbeat until after the lease times out.
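
To make the timing concrete, here is a minimal sketch of a heartbeat
lease. It is an illustration only, not HBase's lease code: the class
name LeaseSketch, its methods, and the 30000 ms lease period used in
the comments are all assumptions. Only the 3000 ms heartbeat interval
comes from the log message quoted above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of a heartbeat lease; not HBase's implementation. */
final class LeaseSketch {
    // Assumed lease period in ms (for example 30000); not taken from the original message.
    private final long leasePeriodMs;
    // Server name -> time (ms) at which the master last processed a heartbeat.
    private final Map<String, Long> lastHeartbeat = new HashMap<>();

    LeaseSketch(long leasePeriodMs) {
        this.leasePeriodMs = leasePeriodMs;
    }

    /** Records a heartbeat at the moment the master actually processes it. */
    void heartbeat(String server, long nowMs) {
        lastHeartbeat.put(server, nowMs);
    }

    /** Returns the servers whose lease has expired at nowMs and forgets them. */
    List<String> expire(long nowMs) {
        List<String> expired = new ArrayList<>();
        Iterator<Map.Entry<String, Long>> it = lastHeartbeat.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (nowMs - entry.getValue() > leasePeriodMs) {
                // At this point the master would reassign every region of this server.
                expired.add(entry.getKey());
                it.remove();
            }
        }
        return expired;
    }
}
```

It does not matter which side was slow. If no heartbeat is recorded
within the lease period, the server is treated as dead even though it
may still be running and serving its regions.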

This can be caused by thread starvation due to too many runnable
threads on a machine, not enough CPUs to handle the thread load, or
just a bad thread scheduler.
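
One way to spot this kind of starvation is to compare how long a
thread actually slept with how long it asked to sleep. The sketch
below is a hypothetical helper (the class and method names are mine),
modelled only on the wording of the HBASE-616 warning; it is not
copied from the HBase source.

```java
/** Hypothetical oversleep check modelled on the HBASE-616 warning text. */
final class OversleepSketch {
    /** True when the actual sleep was more than ten times the scheduled period. */
    static boolean overslept(long sleptMs, long scheduledMs) {
        return sleptMs > 10 * scheduledMs;
    }

    /** Sleeps for the scheduled period and returns the elapsed wall time in ms. */
    static long timedSleep(long scheduledMs) throws InterruptedException {
        long start = System.nanoTime();
        Thread.sleep(scheduledMs);
        return (System.nanoTime() - start) / 1_000_000L;
    }
}
```

A heartbeat loop could call timedSleep(3000) and log a warning
whenever overslept(...) returns true for the result.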

Hopefully, lease timeout will work better after HBase is integrated
with ZooKeeper.

=====

In the normal case, the master will not reassign a region due to load
balancing until the region server reports that it has closed the
region:

Nothing happens with the mostLoadedRegions until it gets to
RegionManager.unassignSomeRegions, which is called by
RegionManager.assignRegions.

In unassignSomeRegions, any regions selected are put into the
closingRegions Set and a MSG_REGION_CLOSE will get sent to the region
server. Candidates for assignment are only taken from the
unassignedRegions Map.

Not until the master receives a MSG_REPORT_CLOSE does any further
action take place on that region.

First the region is removed from the closingRegions Set.

If the region was being split, the HRegionInfo received by the master
will indicate that the region is offline and split. In this case,
it does not get reassigned.

Otherwise it is added to the unassignedRegions Map and is now a
candidate for reassignment.
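
The sequence above can be sketched as a small state machine. This is a
simplified model, not the RegionManager source: the class
RegionCloseSketch, its methods, and the RegionInfo stand-in for
HRegionInfo are hypothetical. Only the closingRegions and
unassignedRegions names and the two message names come from the
description above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of the close/reassign bookkeeping described above. */
final class RegionCloseSketch {
    /** Stand-in for HRegionInfo, reduced to the fields this sketch needs. */
    static final class RegionInfo {
        final String name;
        final boolean offline;
        final boolean split;

        RegionInfo(String name, boolean offline, boolean split) {
            this.name = name;
            this.offline = offline;
            this.split = split;
        }
    }

    final Set<String> closingRegions = new HashSet<>();
    final Map<String, RegionInfo> unassignedRegions = new HashMap<>();

    /** Load balancing selected this region; the master would now send MSG_REGION_CLOSE. */
    void unassign(String regionName) {
        closingRegions.add(regionName);
    }

    /** The region server reported MSG_REPORT_CLOSE for this region. */
    void reportClose(RegionInfo info) {
        closingRegions.remove(info.name);
        if (info.offline && info.split) {
            return; // parent of a split: never reassigned
        }
        unassignedRegions.put(info.name, info);
    }

    /** Only regions in unassignedRegions are candidates for assignment. */
    boolean isAssignable(String regionName) {
        return unassignedRegions.containsKey(regionName);
    }
}
```

In this model a region that is still in closingRegions can never be
handed to another server, which is why a double assignment points to
some other path, such as the lease timeout.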
