From: Bryan Beaudreault
Date: Tue, 5 May 2015 22:48:07 -0400
Subject: Re: Upgrading from 0.94 (CDH4) to 1.0 (CDH5)
To: user@hbase.apache.org

Thanks for the response, guys!

> You've done a review of HTI in 1.0 vs 0.94 to make sure we've not
> mistakenly dropped anything you need? (I see that stuff has moved around
> but HTI should have everything still from 0.94)

Yea, so far so good for HTI features.

> Sounds like you have experience copying tables in background in a manner
> that minimally impinges serving given you have dev'd your own in-house
> cluster cloning tools?
>
> You will use the time while tables are read-only to 'catch-up' the
> difference between the last table copy and data that has come in since?

Correct, we have some tools left over from our 0.92 to 0.94 upgrade, which
we've used for cluster copies. They basically do an incremental distcp by
comparing the file length and md5 of each table in the target and source
clusters, then copying only the diffs. We can get very close to real time
with this, then switch to read-only, do some flushes, and do one final copy
to catch up. We have done this many times for various cluster moves.
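For the curious, here is a rough sketch of the comparison step. This is not
our actual tool (which also drives the flushes and feeds the resulting file
list to distcp), just an illustration of the idea using the stock FileSystem
API; one caveat is that HDFS file checksums are only comparable when both
clusters use the same block size and bytes-per-checksum settings.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Sketch only: recursively walk a table directory on the source cluster and
 * collect the files that are missing on the target or differ by length or
 * checksum. The real tool hands a list like this to distcp.
 */
public class IncrementalTableDiff {

  public static List<Path> filesToCopy(FileSystem srcFs, FileSystem dstFs,
      Path srcDir, Path dstDir) throws IOException {
    List<Path> diffs = new ArrayList<Path>();
    for (FileStatus src : srcFs.listStatus(srcDir)) {
      Path dstPath = new Path(dstDir, src.getPath().getName());
      if (src.isDirectory()) {
        // Recurse into region / column family directories.
        diffs.addAll(filesToCopy(srcFs, dstFs, src.getPath(), dstPath));
        continue;
      }
      if (!dstFs.exists(dstPath)) {
        diffs.add(src.getPath());                     // missing on target
        continue;
      }
      if (src.getLen() != dstFs.getFileStatus(dstPath).getLen()) {
        diffs.add(src.getPath());                     // cheap length check
        continue;
      }
      FileChecksum srcSum = srcFs.getFileChecksum(src.getPath());
      FileChecksum dstSum = dstFs.getFileChecksum(dstPath);
      if (srcSum == null || !srcSum.equals(dstSum)) {
        diffs.add(src.getPath());                     // contents differ
      }
    }
    return diffs;
  }
}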
> CDH4 has pb2.4.1 in it as opposed to pb2.5.0 in cdh5?

Good to know, will keep this in mind! We already shade some of hbase's
dependencies, such as guava, apache commons http, and joda. We will do the
same for protobuf.

> Can you 'talk out loud' as you try stuff Bryan and if we can't
> help highlevel, perhaps we can help on specifics.

Gladly! I feel like I have a leg up since I've already survived the 0.92 to
0.94 migration, so I'm glad to share my experiences with this migration as
well. I'll update this thread as I move along. I also plan to release a blog
post on the ordeal once it's all said and done.

We just created our initial shade of hbase. I'm leaving tomorrow for
HBaseCon, but plan on tackling and testing all of this next week once I'm
back from SF. If anyone is facing similar upgrade challenges I'd be happy to
compare notes.
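To make the shim idea (quoted below) a little more concrete, here is a rough
sketch of the delegation we have in mind. It is not our actual code:
ShimTableFactory and Cdh5TableProvider are illustrative names, and the hard
part, converting Get/Put/Result between the CDH4 classes and the relocated
CDH5 classes, lives behind the provider.

import java.io.IOException;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTableInterface;

/**
 * Sketch only: application code keeps asking for an HTableInterface, and
 * this factory decides per cluster whether to hand back the stock CDH4
 * client or a wrapper around the relocated (shaded) CDH5 client.
 */
public class ShimTableFactory {

  /** Supplies an HTableInterface backed by the shaded CDH5 client. */
  public interface Cdh5TableProvider {
    HTableInterface open(String cluster, String table) throws IOException;
  }

  private final Set<String> cdh5Clusters;          // client-side config of ours
  private final Map<String, Configuration> confs;  // per-cluster CDH4 configs
  private final Cdh5TableProvider cdh5Provider;    // wraps the relocated classes

  public ShimTableFactory(Set<String> cdh5Clusters,
      Map<String, Configuration> confs, Cdh5TableProvider cdh5Provider) {
    this.cdh5Clusters = cdh5Clusters;
    this.confs = confs;
    this.cdh5Provider = cdh5Provider;
  }

  public HTableInterface getTable(String cluster, String table)
      throws IOException {
    if (cdh5Clusters.contains(cluster)) {
      // Already migrated: delegate to an implementation built on the
      // relocated package (e.g. org.apache.hadoop.cdh5.hbase.client.*),
      // which converts Get/Put/Result between the two class spaces.
      return cdh5Provider.open(cluster, table);
    }
    // Not yet migrated: the plain CDH4 client keeps working as before.
    return new HTable(confs.get(cluster), table);
  }
}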
> If your clients are interacting with HDFS then you need to go the route of
> shading around PB and it's hard, but HBase-wise only HBase 0.98 and 1.0 use
> PBs in the RPC protocol and it shouldn't be any problem as long as you
> don't need security

Thankfully we don't interact directly with the HDFS underneath hbase. There
is some interaction with the HDFS of our CDH4 hadoop clusters though. I'll
be experimenting with these incompatibilities soon and will post here.
Hopefully I'll be able to separate them enough to not cause an issue.

Thankfully we have not moved to secure HBase yet. That's actually on the
to-do list, but we're hoping to do it *after* the CDH upgrade.

---

Thanks again, guys. I'm expecting this will be a drawn-out process
considering our scope, but I'll be happy to keep posting updates here as I
proceed.

On Tue, May 5, 2015 at 10:31 PM, Esteban Gutierrez wrote:

> Just to add a little bit to what StAck said:
>
> --
> Cloudera, Inc.
>
> On Tue, May 5, 2015 at 3:53 PM, Stack wrote:
>
> > On Tue, May 5, 2015 at 8:58 AM, Bryan Beaudreault <
> > bbeaudreault@hubspot.com> wrote:
> >
> > > Hello,
> > >
> > > I'm about to start tackling our upgrade path for 0.94 to 1.0+. We have
> > > 6 production hbase clusters, 2 hadoop clusters, and hundreds of
> > > APIs/daemons/crons/etc hitting all of these things. Many of these
> > > clients hit multiple clusters in the same process. Daunting to say the
> > > least.
> >
> > Nod.
> >
> > > We can't take full downtime on any of these, though we can take
> > > read-only. And ideally we could take read-only on each cluster in a
> > > staggered fashion.
> > >
> > > From a client perspective, all of our code currently assumes an
> > > HTableInterface, which gives me some wiggle room I think. With that in
> > > mind, here's my current plan:
> >
> > You've done a review of HTI in 1.0 vs 0.94 to make sure we've not
> > mistakenly dropped anything you need? (I see that stuff has moved around
> > but HTI should have everything still from 0.94)
> >
> > > - Shade CDH5 to something like org.apache.hadoop.cdh5.hbase.
> > > - Create a shim implementation of HTableInterface. This shim would
> > >   delegate to either the old cdh4 APIs or the new shaded CDH5 classes,
> > >   depending on the cluster being talked to.
> > > - Once the shim is in place across all clients, I will put each cluster
> > >   into read-only (a client-side config of ours), migrate data to a new
> > >   CDH5 cluster, then bounce affected services so they look there
> > >   instead. I will do this for each cluster in sequence.
> >
> > Sounds like you have experience copying tables in background in a manner
> > that minimally impinges serving given you have dev'd your own in-house
> > cluster cloning tools?
> >
> > You will use the time while tables are read-only to 'catch-up' the
> > difference between the last table copy and data that has come in since?
> >
> > > This provides a great rollback strategy, and with our existing
> > > in-house cluster cloning tools we can minimize the read-only window to
> > > a few minutes if all goes well.
> > >
> > > There are a couple gotchas I can think of with the shim, which I'm
> > > hoping some of you might have ideas/opinions on:
> > >
> > > 1) Since protobufs are used for communication, we will have to avoid
> > > shading those particular classes as they need to match the
> > > package/classnames on the server side. I think this should be fine, as
> > > these are net-new, not conflicting with CDH4 artifacts. Any
> > > additions/concerns here?
> >
> > CDH4 has pb2.4.1 in it as opposed to pb2.5.0 in cdh5?
>
> If your clients are interacting with HDFS then you need to go the route
> of shading around PB and it's hard, but HBase-wise only HBase 0.98 and
> 1.0 use PBs in the RPC protocol and it shouldn't be any problem as long
> as you don't need security (this is mostly because the client does a UGI
> call in the client and it's easy to patch both 0.94 and 1.0 to avoid
> that call). Another option is to move your application to asynchbase,
> which should be clever enough to handle both HBase versions.
>
> > I myself have little experience going the shading route so have little
> > to contribute. Can you 'talk out loud' as you try stuff Bryan and if we
> > can't help highlevel, perhaps we can help on specifics.
> >
> > St.Ack
>
> cheers,
> esteban.