hbase-dev mailing list archives

From Jim Kellerman <...@powerset.com>
Subject RE: Hbase corrupts data after reporting MSG_REPORT_CLOSE to master during compaction and split process
Date Tue, 09 Sep 2008 15:56:25 GMT
Comments inline below:
> -----Original Message-----
> From: Cosmin Lehene [mailto:clehene@adobe.com]
> Sent: Tuesday, September 09, 2008 7:25 AM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: Hbase corrupts data after reporting MSG_REPORT_CLOSE to master
> during compaction and split process
> Hi,
> I managed to reproduce the corruption and also have full debug logs, but
> first I'll explain the whys and hows of the bug and also how I think it can
> be fixed.
> (I'm going to send my takeaways on how we managed to insert 300GB in less
> than 6 hours on a 5-node cluster, along with some advice and issues, in
> another mail.)
> The following assumptions are based on my reading of the actual code (don't
> worry if I didn't get them all right; please read the entire mail).
> - The master _assigns_ a region to a server by sending a MSG_REGION_OPEN
> - On each heartbeat, region servers report their current load and a list of
> MLR (most loaded regions), which is in fact just the first N online regions.
> - Upon opening a newly assigned region, a region server will try to compact
> and split that region.
> - The region is NOT marked offline when compaction starts
> - The region is marked OFFLINE:true, SPLIT:true during a SPLIT
> Our scenario goes like this:
> Master (M) assigns region A to region server R1
> R1 starts compaction and split of A
> On heartbeat, R1 sends its load and a list of MLR that contains A

This list should contain only open regions and should not include any region that is still
being opened. In addition, the region server should attach one MSG_REPORT_PROCESS_OPEN to the
heartbeat for each region being opened. This should prevent the master from reassigning
those regions.
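A minimal sketch of the heartbeat change proposed above, written in Python for brevity rather than HBase's Java. It assumes a simplified model: `Heartbeat`, `build_heartbeat`, `regions_to_close`, and `RegionState` are hypothetical names, regions are plain strings, and the load is simply the number of regions hosted. Only MSG_REPORT_PROCESS_OPEN comes from this discussion; none of this is HBase's actual API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class RegionState(Enum):
    OPENING = "opening"  # open requested; compaction/split may still be running
    OPEN = "open"        # fully open and serving


@dataclass
class Heartbeat:
    # Hypothetical heartbeat payload, not HBase's real message format.
    load: int
    open_regions: List[str]  # only fully open regions (the MLR list)
    process_open: List[str] = field(default_factory=list)  # one MSG_REPORT_PROCESS_OPEN per opening region


def build_heartbeat(regions: Dict[str, RegionState], n: int) -> Heartbeat:
    """Region server side: report up to n open regions and flag every region still opening."""
    open_regions = sorted(r for r, s in regions.items() if s is RegionState.OPEN)
    opening = sorted(r for r, s in regions.items() if s is RegionState.OPENING)
    # Assumption: load counts every hosted region, including ones still opening.
    return Heartbeat(load=len(regions), open_regions=open_regions[:n], process_open=opening)


def regions_to_close(hb: Heartbeat, excess: int) -> List[str]:
    """Master side: pick up to `excess` regions to reassign, never one that is still opening."""
    opening = set(hb.process_open)
    return [r for r in hb.open_regions if r not in opening][:excess]
```

With A still opening, the heartbeat lists A only as a process-open notice, so the master cannot choose it for reassignment. The master-side filter is redundant when the region server behaves, but it is cheap insurance if a region server ever puts a not-yet-open region in its load list.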

> M decides to reassign the extra regions and sends a MSG_CLOSE_REGION A to R1
> R1 finishes the compaction and splits A into A1 and A2 (A1 has the same start
> key as A)

If, in fact, the region server is including regions that are not completely open in the load
list, this is a bug.

> M assigns A to R2
> R2 starts compaction and split of A
> R2 finishes the compaction and splits A into A_clone_1 and A_clone_2
> (A_clone_1 has the same start key as A and, IMPORTANTLY, the same start key
> as A1)

Whenever two region servers start working on the same region, chaos ensues. It is rare that
corruption *will not* happen in this case.
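The end state of this scenario (two live regions in .META. covering the same rows) can be checked mechanically. The sketch below is a hypothetical consistency check, not an existing HBase tool: it assumes each catalog row is flattened to a `(name, start_key, end_key, offline, split)` tuple, with an empty end key meaning "to the end of the table", and it ignores rows marked offline or split because a split parent legitimately overlaps its daughters.

```python
from typing import Iterable, List, Tuple

# Hypothetical flattened .META. row: (region name, start key, end key, offline, split)
MetaRow = Tuple[str, bytes, bytes, bool, bool]


def overlapping_regions(rows: Iterable[MetaRow]) -> List[Tuple[str, str]]:
    """Return adjacent pairs of live regions whose key ranges overlap."""
    live = sorted(
        (start, end, name)
        for name, start, end, offline, split in rows
        if not (offline or split)  # split parents overlap their daughters by design
    )
    overlaps = []
    for (_, end1, name1), (start2, _, name2) in zip(live, live[1:]):
        # An empty end key means the region runs to the end of the table.
        if end1 == b"" or end1 > start2:
            overlaps.append((name1, name2))
    return overlaps
```

Because live regions are sorted by start key, any overlap shows up in at least one adjacent pair, so an empty result means no two live regions overlap (the check does not look for gaps). In the scenario above, A1 and A_clone_1 share a start key and are reported as the first overlapping pair.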

> Now we have A1 and A_clone_1, nearly identical and starting with the same key.
> The cluster is corrupted. What happens next matters less; the point is that
> both are in .META.
> I found several places where this could be avoided, and I'm going to pose a
> few independent questions. In my opinion both the master and the region
> server could be held responsible, but I guess it's a matter of architectural
> philosophy. Please note that any of these questions could be a starting point
> for the fix.
> - Why, on receiving a MSG_CLOSE_REGION for A, doesn't the region server abort
> the current compact/split operation, leave A in its original state, and close
> it immediately?

MSG_CLOSE_REGION is sent for several different purposes. Maybe, if the master has timed out
the region server, it should send something like MSG_ABORT_OPEN instead.
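One way a region server could honor something like the MSG_ABORT_OPEN suggested above is to check a cancellation flag between the steps of opening a region. This is a hypothetical sketch (`RegionOpenTask` and its methods are not HBase classes): an abort only takes effect at a step boundary, so once a split has committed it cannot be rolled back this way.

```python
import threading
from typing import Callable, List, Tuple


class RegionOpenTask:
    """Hypothetical model of a region server opening a region (e.g. compact, then split)."""

    def __init__(self, region: str) -> None:
        self.region = region
        self.abort_requested = threading.Event()  # set from the thread handling master messages
        self.log: List[str] = []

    def abort(self) -> None:
        # Called when the (proposed) MSG_ABORT_OPEN arrives for this region.
        self.abort_requested.set()

    def run(self, steps: List[Tuple[str, Callable[[], None]]]) -> bool:
        """Run steps in order; return False if an abort stopped the open early."""
        for name, step in steps:
            if self.abort_requested.is_set():
                # Stop before the next step; completed steps are not undone.
                self.log.append(f"aborted before {name}")
                return False
            step()
            self.log.append(f"done {name}")
        return True
```

If the abort arrives while compaction is running, the compaction finishes but the split never starts, so the region keeps its original key range and the master can safely reassign it.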

> - Why doesn't a region server DELETE a region after a SPLIT? (I guess it
> could be offline by then and it's not up to the region server to decide that,
> but still...)

Splits are fast because the two children use the parent's files until they do a compaction.
Thus the parent region must remain until neither child is using it any longer. The master
then garbage collects the parent.
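The lifecycle just described can be modeled in a few lines. This is a simplified, hypothetical sketch (`Daughter`, `compact`, and `parent_collectable` are illustrative names, not HBase APIs): each daughter starts out referring to the parent's files, and the parent becomes collectable only after both daughters have compacted.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Daughter:
    name: str
    # True until the daughter's first compaction writes files of its own.
    references_parent: bool = True


def compact(d: Daughter) -> None:
    # Compaction rewrites the daughter's data into its own files,
    # so it no longer needs the parent's files.
    d.references_parent = False


def parent_collectable(daughters: Iterable[Daughter]) -> bool:
    """Master side: the split parent may be garbage collected only when no daughter still uses it."""
    return not any(d.references_parent for d in daughters)
```

This is why deleting the parent immediately after the split would be unsafe: right after a split, both daughters still depend on it.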

> - Why, when assigning a region to a new region server, doesn't the master
> check the region's status? It might be splitting or already split. I guess
> this would need a new state.

The master does check to see if a region is split or offline and will not assign it. This
information is only available after the split is complete.
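A small sketch of why this master-side check does not close the window in the scenario above, under a simplified, hypothetical model of a region's catalog flags (`CatalogFlags` and `assignable` are illustrative names): while R1 is still compacting A, no flags have been written yet, so A still looks assignable.

```python
from dataclasses import dataclass


@dataclass
class CatalogFlags:
    # Hypothetical view of the OFFLINE/SPLIT flags in a region's .META. row.
    offline: bool = False
    split: bool = False


def assignable(flags: CatalogFlags) -> bool:
    """Master side: never assign a region that is marked offline or split."""
    return not (flags.offline or flags.split)


# Timeline from the scenario (hypothetical):
a = CatalogFlags()            # R1 is compacting A; no flags written yet
before_split = assignable(a)  # True: the master may hand A to R2
a.offline = a.split = True    # R1 records the completed split
after_split = assignable(a)   # False: too late if R2 already has A
```

The check is correct for its inputs; the problem is that the in-progress open and split are invisible to it, which is what the proposed MSG_REPORT_PROCESS_OPEN would address.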

> - Why, when opening/compacting/splitting a region, doesn't the region server
> check whether the region is OFFLINE:true or SPLIT:true?

A region server should never receive an open message for a split or offline region. When the
region server is told to open a region, it assumes it has exclusive rights to all the files
of the region.

> I have the logs available. They are pretty large and I might need to clean
> them up a little, but I could make them available if that's really needed.
> However, I think the scenario and questions might be enough for a bug report
> and a fix.
> Thanks,
> Cosmin
