hbase-dev mailing list archives

From Karthik Ranganathan <kranganat...@facebook.com>
Subject RE: HBASE-2312 discussion
Date Wed, 17 Mar 2010 17:21:51 GMT
Loved the "Juliet" terminology as well :).

@Todd: I agree we will need something like #2 or especially #3 in other places.

Looks like we have a consensus - I will update the JIRA.


-----Original Message-----
From: Todd Lipcon [mailto:todd@cloudera.com] 
Sent: Tuesday, March 16, 2010 10:09 PM
To: hbase-dev@hadoop.apache.org
Subject: Re: HBASE-2312 discussion

On Tue, Mar 16, 2010 at 8:59 PM, Stack <stack@duboce.net> wrote:

> On Tue, Mar 16, 2010 at 5:08 PM, Todd Lipcon <todd@cloudera.com> wrote:
> >
> > What do you think about the trick of making the RS do a ZK sync before
> > any meta op? This forces it to take at most one action after it's been
> > terminated.
> >
> ... where meta op is open of new WAL log?
> How would this work?  RS would note in ZK the name of the WAL it's
> about to open before it did it?  If the RS then does a "Juliet" --
[haha, love this terminology!]

> i.e. goes into a GC pause death-like coma -- on revival, it'll go to
> open the WAL but master will have already done so, and so it'll fail?
I was actually referring to the explicit sync call in ZK:

The javadoc isn't that clear, but the way I understand this call is that it
makes sure the client's view of the world is up-to-date with respect to the
ZK leader at the beginning of the sync call.

The "note" box at the bottom of this section also explains it pretty well:

If we insert this before every transition, I think we can ensure that the
region server performs at most one operation after losing its lease.
That would make the whole "chasing the log" approach unnecessary.
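To make the "at most one operation" bound concrete, here is a minimal in-process sketch. It is a toy model, not the ZooKeeper client API: `Leader`, `RegionServer`, `meta_op`, and the `juliet` pause hook are hypothetical names. The leader holds the authoritative set of live sessions, and the region server acts on a cached view of its own liveness. With a sync before each meta op, a pause that lands just after the sync can let one stale operation through, and the next sync exposes the expiry. Without the sync, the cached view in this simplified model is never refreshed, so the server keeps acting.

```python
from dataclasses import dataclass, field


class SessionExpired(Exception):
    """Raised when a (toy) sync reveals that the session has expired."""


@dataclass
class Leader:
    """Hypothetical stand-in for the ZK leader: the authoritative session set."""
    live_sessions: set = field(default_factory=set)

    def expire(self, session_id):
        self.live_sessions.discard(session_id)


@dataclass
class RegionServer:
    """Hypothetical RS that acts on a cached belief about its own session."""
    leader: Leader
    session_id: int
    sync_before_op: bool
    believes_live: bool = True
    ops_after_expiry: int = 0

    def sync(self):
        # Toy analogue of a ZK sync: bring the local view up to date with
        # the leader as of this moment.
        self.believes_live = self.session_id in self.leader.live_sessions

    def meta_op(self, name, pause=None):
        if self.sync_before_op:
            self.sync()
        if pause is not None:
            pause()  # e.g. a long GC pause ("Juliet") right after the sync
        if not self.believes_live:
            raise SessionExpired(name)
        # Count operations the RS performs although its lease is already gone.
        if self.session_id not in self.leader.live_sessions:
            self.ops_after_expiry += 1


def run(sync_before_op, n_ops=5, pause_at=1):
    """Perform n_ops meta ops; the master expires the session during op pause_at."""
    leader = Leader(live_sessions={1})
    rs = RegionServer(leader, session_id=1, sync_before_op=sync_before_op)

    def juliet():
        leader.expire(1)  # master declares the RS dead while it is paused

    try:
        for i in range(n_ops):
            rs.meta_op(f"op{i}", pause=juliet if i == pause_at else None)
    except SessionExpired:
        pass  # the RS learns it is dead and stops
    return rs.ops_after_expiry
```

With the default arguments, `run(True)` returns 1 (only the op that straddled the pause slips through), while `run(False)` returns 4 because the stale view is never corrected.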

> @Karthik "I am a little nervous about the master backing off on
> detecting the RS's progress - because the RS has already lost its zk
> lease."
> Yes.  The RS will have had its 'shut-yourself-down' flag set on
> loss of lease, so it is on its way out.  It's not going to revive, so its
> logs need recovering.
> @Kannan "Option #1 seems easy to reason about and simple to implement.
> Can we go ahead with that if there is no major objection?"
> Fine by me.

Fine by me as well. I think we'll need solutions like #2 or #3 in other
places, but for this one #1 seems to work (I'll keep thinking about whether
there are any holes in our logic).


Todd Lipcon
Software Engineer, Cloudera
