hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From st...@duboce.net
Subject Re: Review Request: hbase-3267 close_region shell command breaks region
Date Thu, 25 Nov 2010 00:32:23 GMT


> On 2010-11-24 15:45:32, Jonathan Gray wrote:
> > This is great.  I like this much better than hacking up the master transition code.
> > 
> > My main concern is around the exact semantics of assign/unassign (and close).  I
think we need to do good javadoc on the HBA methods to describe how you would use these or
at least a bit about their behavior.  assign() just does an assign, but unassign() actually
clears stuff out.  It seems doing a close() behind the masters back, then asking the master
to assign that region, should not work... but it does?

Well, my notion is that user shouldn't be doing these manual messings any more.  Fixup stuff
is now for hbck to do.

Yes, close of a region is done w/o master's involvement.  Rare would you do it.  Yes, an assign
will assign a region EVEN IF ALREADY assigned.  Messing in here can get you in trouble.  I
was able to manufacture some ugly conditions -- a stuck region trying to assign same server
over and over -- but then unassign with a force now clears out RIT and does the right thing....
i.e. we have enough tools to hang ourselves on new master but also the tools to undo.


> On 2010-11-24 15:45:32, Jonathan Gray wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java, line 980
> > <http://review.cloudera.org/r/1250/diff/1/?file=17648#file17648line980>
> >
> >     Is there an open_region?  This assign() goes through the master, so what is
the opposite of close_region which doesn't go through the master?
> >     
> >     Doesn't close_region now put the master in a bad state, so it won't expect an
assignment to be done on a region which it thinks is already assigned?  There is a force on
unassign() but not on assign().
> >     
> >     In the old master, for HBCK, I added a hook in to the master to clear the in-memory
state for a region.  To deal with dupe assignment, I did silent close_regions and then cleared
the in-memory state.  Then I triggered a new assignment.

No open_region.  Someone can add that later if wanted.  Otherwise, use move to place region
on specific server.

On close_region, yes, puts master in bad state but you'd only do close_region when doing fix
up of some whack condition.  I was tempted to just remove these commands but since we don't
know what states new master could put us in, I'll leave them in for now.

I'll add force to assign so same as unassign.


Regards what you did for old master hbck, you could call close_regions then an unassign with
a force would clear memory and get the region assigned elsewhere.

But hbck should be doing this.  Not a user manually, not unless things are really hosed.


> On 2010-11-24 15:45:32, Jonathan Gray wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java, line 138
> > <http://review.cloudera.org/r/1250/diff/1/?file=17649#file17649line138>
> >
> >     this is awesome javadoc.  is there somewhere else we can put this rather than
in just the move() API?  Maybe in the HBA class comment or something?  Somewhere we can reference
in other javadocs about what a regionname is

I moved the interface doc out to HBA as per your suggestion.


> On 2010-11-24 15:45:32, Jonathan Gray wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 709
> > <http://review.cloudera.org/r/1250/diff/1/?file=17651#file17651line709>
> >
> >     So you're supposed to call move instead of open_region?  Or why the change in
move() though this looks good.

Just added it as something you might want to do.  unassign does same thing really.  I could
back it out.


> On 2010-11-24 15:45:32, Jonathan Gray wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 994
> > <http://review.cloudera.org/r/1250/diff/1/?file=17651#file17651line994>
> >
> >     Why META and not in-memory state?  Once you hit assign() you rely on the in-memory
state anyways?

I only have a region server name, not an HRI which is what the inmemory state is keyed by.
  I could iterate the Map I suppose but then I'm thinking it may have been cleared from inmemory
state.


> On 2010-11-24 15:45:32, Jonathan Gray wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 996
> > <http://review.cloudera.org/r/1250/diff/1/?file=17651#file17651line996>
> >
> >     on assign we just do the assignment, but below on unassign() we first clear
existing plans and clear from RIT.  why the difference.

I made it so we only clear state if force is added to the unassign.


> On 2010-11-24 15:45:32, Jonathan Gray wrote:
> > trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1011
> > <http://review.cloudera.org/r/1250/diff/1/?file=17651#file17651line1011>
> >
> >     is this necessary?  should the unassign method taking force deal with anything
needed to "force" it?

Its needed for case you called assign multiple times. First assign works.  Second one will
be stuck in loop where you ask the region open but the regionserver will say its already open
and abort the open.  You are stuck in this loop unless RIT gets cleared.


> On 2010-11-24 15:45:32, Jonathan Gray wrote:
> > trunk/src/main/ruby/hbase/admin.rb, line 390
> > <http://review.cloudera.org/r/1250/diff/1/?file=17652#file17652line390>
> >
> >     zk didn't work?  why is this removed?

Because we have ./bin/hbase zkcli now which is better way of doing this zk interaction.


> On 2010-11-24 15:45:32, Jonathan Gray wrote:
> > trunk/src/main/ruby/shell/commands/assign.rb, line 26
> > <http://review.cloudera.org/r/1250/diff/1/?file=17654#file17654line26>
> >
> >     whitespace.  and what exactly are the semantics of this?  what if region is
already assigned?
> >     
> >     we should document somewhere more specifically what the behavior is of these
methods if we're going to expose them to the client and the shell.  neither place really describes
what this means and i can imagine users will be doing lots of foot shooting with tools like
this.
> >     
> >     more importantly, though, i'm trying to understand the use cases for these.
 if it's to unbreak stuff, it's not clear to me how exactly you would use it given that the
master will reject certain operations in the wrong order.

If already assigned it will reassign regardless.  This is a fix up facility for expert use
only (I updated the help to say this more explicitly).

Well, using these new commands you can break things and then unbreak them too.


> On 2010-11-24 15:45:32, Jonathan Gray wrote:
> > trunk/src/main/ruby/shell/commands/close_region.rb, line 28
> > <http://review.cloudera.org/r/1250/diff/1/?file=17657#file17657line28>
> >
> >     Why would you use close and not unassign/assign/move?  It's because close is
done silently?  Should say that if that's the distinction.
> >     
> >     Is this comment saying you can use unassign or move after you issue close? 
or instead of?

I updated the help.   Added 'caution' and for experts only.

Like I say, I wanted to removed these things altogether but my guess is that one day we'll
need them -- at least while hbck is lacking and while all failure modes of new master are
not yet known.


> On 2010-11-24 15:45:32, Jonathan Gray wrote:
> > trunk/src/main/ruby/shell/commands/unassign.rb, line 30
> > <http://review.cloudera.org/r/1250/diff/1/?file=17661#file17661line30>
> >
> >     this doesn't use encoded region name?
> >     
> >     is move then different from the other methods?

Yes, move is different from the others.  It tries to make this clear in its documentation.
  The Map move uses is keyed by encoded region name.  Assign and unassign go get the HRI from
.META. by using passed regionname.


- stack


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1250/#review1975
-----------------------------------------------------------


On 2010-11-24 15:09:00, stack wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> http://review.cloudera.org/r/1250/
> -----------------------------------------------------------
> 
> (Updated 2010-11-24 15:09:00)
> 
> 
> Review request for hbase and Jonathan Gray.
> 
> 
> Summary
> -------
> 
> So, things are different in the new master.  Close region should close region.  Not close
and then reopen.  To close and reopen elsewhere, thats an unassign or a move (both of which
were missing from shell but which are added in this patch).  I fixed the close so that its
a close that does not touch zk... the region is just closed on the regionserver.  No going
to zk makes it so the close no longer makes for complaint.  Close is dangerous though in that
the region is now permanently offline (I updated the close help to explain this is so).  
To address it being permanently offline, I added a new assign to the shell. 
> 
> While in here, I removed commands that no longer make senses such as enable_region and
disable_region. 
> 
> M src/main/java/org/apache/hadoop/hbase/master/HMaster.java
>   Change move implementation so can pass an empty host.
>   Empty host means move to random location rather than
>   explicit server.
>   Added assign, unassign
> M src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
>   (clearRegionPlan): Added.
> M src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java
>   Improved move javadoc.  Added assign, unassign.
> M src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java
>   Improved javadoc. Added assign and unassign.
> M src/main/ruby/hbase/admin.rb
>   Added balancer, balance_switch, assign, unassign, removed
>   zk, enable_region and disable_region (the latter make no sense
>   anymore now disable/enable is done differently).
> D src/main/ruby/shell/commands/zk.rb
> A src/main/ruby/shell/commands/assign.rb
> A src/main/ruby/shell/commands/balance_switch.rb
> D src/main/ruby/shell/commands/disable_region.rb
> A src/main/ruby/shell/commands/balancer.rb
> A src/main/ruby/shell/commands/unassign.rb
> D src/main/ruby/shell/commands/enable_region.rb
> A src/main/ruby/shell/commands/move.rb
> M src/main/ruby/shell/commands/close_region.rb
>   Fixed up help
> M src/main/ruby/shell.rb
>   Added and removed commands.  
> 
> 
> This addresses bug hbase-3267.
>     http://issues.apache.org/jira/browse/hbase-3267
> 
> 
> Diffs
> -----
> 
>   trunk/src/main/java/org/apache/hadoop/hbase/client/HBaseAdmin.java 1038768 
>   trunk/src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java 1038768 
>   trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1038768 
>   trunk/src/main/java/org/apache/hadoop/hbase/master/HMaster.java 1038768 
>   trunk/src/main/ruby/hbase/admin.rb 1038768 
>   trunk/src/main/ruby/shell.rb 1038768 
>   trunk/src/main/ruby/shell/commands/assign.rb PRE-CREATION 
>   trunk/src/main/ruby/shell/commands/balance_switch.rb PRE-CREATION 
>   trunk/src/main/ruby/shell/commands/balancer.rb PRE-CREATION 
>   trunk/src/main/ruby/shell/commands/close_region.rb 1038768 
>   trunk/src/main/ruby/shell/commands/disable_region.rb 1038768 
>   trunk/src/main/ruby/shell/commands/enable_region.rb 1038768 
>   trunk/src/main/ruby/shell/commands/move.rb PRE-CREATION 
>   trunk/src/main/ruby/shell/commands/unassign.rb PRE-CREATION 
>   trunk/src/main/ruby/shell/commands/zk.rb 1038768 
> 
> Diff: http://review.cloudera.org/r/1250/diff
> 
> 
> Testing
> -------
> 
> I tested shell here on my little cluster.
> 
> 
> Thanks,
> 
> stack
> 
>


Mime
View raw message