hbase-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Upgrading from 0.94 (CDH4) to 1.0 (CDH5)
Date Thu, 14 May 2015 18:33:40 GMT
Looks like there is a JIRA already:
HBASE-11843 MapReduce classes shouldn't be in hbase-server

Cheers

On Thu, May 14, 2015 at 10:41 AM, anil gupta <anilgupta84@gmail.com> wrote:

> +1 on moving MR-related code to hbase-client, or we could have a separate
> artifact called hbase-mapreduce.
> I also have to include hbase-server along with hbase-client in my project
> just because of this. And once we pull in hbase-server and build an uber
> jar, hbase-server pulls in a lot of unnecessary stuff.
> Note: my project is not related to the migration from 0.94 to 1.0, but I
> am supporting the argument for moving the MR code into the client or a
> separate artifact.
>
> On Thu, May 14, 2015 at 9:43 AM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
>
> > Just an update here.  I've got something working locally that can run
> > against either a 0.94.17 hbase or a 1.0 hbase transparently.  I
> > implemented it as laid out above, but there were a bunch of gotchas.  It
> > helps that we maintain our own fork of each version, as I needed to make
> > some supplemental changes in each version to make things easier.  I will
> > do a writeup with all of the gotchas later in the process.
> >
> > Next steps:
> >
> > - Convert server-side coprocessors
> > - Apply the same or similar shim logic to our TableInputFormat and other
> > mapreduce interfaces
> >
> > A couple notes for the devs:
> >
> > - I love that 1.0 has a separate hbase-client artifact.  Unfortunately
> > the TableInputFormat and other mapreduce classes live in hbase-server
> > for some reason.  So the end result is I basically need to pull the
> > entire hbase super-artifact into my clients.  I may move these to
> > hbase-client in my local fork if that is possible.
> >
> > - There are a few places where you are statically calling
> > HBaseConfiguration.create().  This makes it hard for people like us who
> > have a lot of libraries built around HBase.  In our clients we inject
> > configuration properties from our own configuration servers to
> > supplement hbase-site/hbase-default.xml.  When HBaseConfiguration.create()
> > is called, it disregards these changes.  In my local fork I hacked in a
> > LazyConfigurationHolder, which just keeps a static reference to a
> > Configuration but has a setter (see the sketch below).  This allows me
> > to inject my customized Configuration object into the hbase stack.
> >
> >  -- (For reference, the places you do this are, at least, ProtobufUtil
> > and ConnectionManager)
> >  -- Hadoop also does something like this in their UserGroupInformation
> > class, but they do provide a setConfiguration method.  Ideally there are
> > no static calls to create a Configuration, but this is an ok compromise
> > where necessary.
> >
> > I can put JIRAs in for these if it makes sense.
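As a point of reference for the approach described in the notes above, a
minimal sketch of such a holder might look like the following. This is an
illustration inferred from the description, not Bryan's actual patch; the
class name LazyConfigurationHolder comes from the email, but every
implementation detail here is an assumption.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    // Sketch only: a static holder with a setter, so call sites that previously
    // invoked HBaseConfiguration.create() can pick up an injected Configuration.
    public final class LazyConfigurationHolder {
      private static volatile Configuration conf;

      private LazyConfigurationHolder() {}

      // Inject a customized Configuration before any HBase client code runs.
      public static void setConfiguration(Configuration c) {
        conf = c;
      }

      // Replacement for static HBaseConfiguration.create() call sites.
      public static Configuration getConfiguration() {
        Configuration c = conf;
        if (c == null) {
          synchronized (LazyConfigurationHolder.class) {
            if (conf == null) {
              conf = HBaseConfiguration.create();  // default when nothing was injected
            }
            c = conf;
          }
        }
        return c;
      }
    }

Classes such as ProtobufUtil and ConnectionManager would then call
LazyConfigurationHolder.getConfiguration() instead of creating a fresh
Configuration statically.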
> >
> >
> >
> > On Tue, May 5, 2015 at 10:48 PM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
> >
> > > Thanks for the response guys!
> > >
> > >> You've done a review of HTI in 1.0 vs 0.94 to make sure we've not
> > >> mistakenly dropped anything you need? (I see that stuff has moved
> > >> around but HTI should have everything still from 0.94)
> > >
> > >
> > > Yea, so far so good for HTI features.
> > >
> > >> Sounds like you have experience copying tables in background in a
> > >> manner that minimally impinges serving given you have dev'd your own
> > >> in-house cluster cloning tools?
> > >> You will use the time while tables are read-only to 'catch-up' the
> > >> difference between the last table copy and data that has come in
> > >> since?
> > >
> > > Correct, we have some tools left over from our 0.92 to 0.94 upgrade,
> > > which we've used for cluster copies.  It basically does an incremental
> > > distcp by comparing the file length and md5 of each table in the target
> > > and source cluster, then only copies the diffs.  We can get very close
> > > to real time with this, then switch to read-only, do some flushes, and
> > > do one final copy to catch up.  We have done this many times for
> > > various cluster moves.
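As an aside, the comparison step described above could be sketched with the
Hadoop FileSystem API roughly as follows. This is not HubSpot's tool, just a
hedged illustration: it assumes both clusters report comparable HDFS file
checksums (Hadoop's composite CRC-based checksum rather than a plain md5),
and the copy itself is left out.

    import java.io.IOException;
    import org.apache.hadoop.fs.FileChecksum;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class IncrementalCopyCheck {

      // Returns true if the file at the corresponding path on the target cluster
      // is missing or differs by length or checksum, i.e. it needs to be copied.
      static boolean needsCopy(FileSystem srcFs, FileStatus src,
                               FileSystem dstFs, Path dst) throws IOException {
        if (!dstFs.exists(dst)) {
          return true;
        }
        FileStatus dstStatus = dstFs.getFileStatus(dst);
        if (src.getLen() != dstStatus.getLen()) {
          return true;
        }
        // Checksum comparison only holds when both clusters use the same
        // block size and checksum settings.
        FileChecksum srcSum = srcFs.getFileChecksum(src.getPath());
        FileChecksum dstSum = dstFs.getFileChecksum(dst);
        return srcSum == null || dstSum == null || !srcSum.equals(dstSum);
      }
    }

Files flagged by a check like this would then be handed to distcp (or an
equivalent copier) for the incremental pass, with a final pass after the
tables are flipped to read-only and flushed.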
> > >
> > >> CDH4 has pb2.4.1 in it as opposed to pb2.5.0 in cdh5?
> > >
> > >
> > > Good to know, will keep this in mind! We already shade some of the
> > > dependencies of hbase such as guava, apache commons http, and joda.  We
> > > will do the same for protobuf.
> > >
> > >> Can you 'talk out loud' as you try stuff Bryan and if we can't
> > >> help highlevel, perhaps we can help on specifics.
> > >
> > >
> > > Gladly! I feel like I have a leg up since I've already survived the
> > > 0.92 to 0.94 migration, so glad to share my experiences with this
> > > migration as well.  I'll update this thread as I move along.  I also
> > > plan to release a blog post on the ordeal once it's all said and done.
> > >
> > > We just created our initial shade of hbase.  I'm leaving tomorrow for
> > > HBaseCon, but plan on tackling and testing all of this next week once
> > > I'm back from SF.  If anyone is facing similar upgrade challenges I'd
> > > be happy to compare notes.
> > >
> > >> If your clients are interacting with HDFS then you need to go the
> > >> route of shading around PB and it's hard, but HBase-wise only HBase
> > >> 0.98 and 1.0 use PBs in the RPC protocol and it shouldn't be any
> > >> problem as long as you don't need security
> > >
> > >
> > > Thankfully we don't interact directly with the HDFS of hbase.  There
> > > is some interaction with the HDFS of our CDH4 hadoop clusters though.
> > > I'll be experimenting with these incompatibilities soon and will post
> > > here.  Hopefully I'll be able to separate them enough to not cause an
> > > issue.  Thankfully we have not moved to secure HBase yet.  That's
> > > actually on the to-do list, but hoping to do it *after* the CDH
> > > upgrade.
> > >
> > > ---
> > >
> > > Thanks again guys.  I'm expecting this will be a drawn out process
> > > considering our scope, but will be happy to keep updates here as I
> > > proceed.
> > >
> > > On Tue, May 5, 2015 at 10:31 PM, Esteban Gutierrez <esteban@cloudera.com> wrote:
> > >
> > >> Just to add a little bit to what St.Ack said:
> > >>
> > >> --
> > >> Cloudera, Inc.
> > >>
> > >>
> > >> On Tue, May 5, 2015 at 3:53 PM, Stack <stack@duboce.net> wrote:
> > >>
> > >> > On Tue, May 5, 2015 at 8:58 AM, Bryan Beaudreault <bbeaudreault@hubspot.com> wrote:
> > >> >
> > >> > > Hello,
> > >> > >
> > >> > > I'm about to start tackling our upgrade path for 0.94 to 1.0+.
> > >> > > We have 6 production hbase clusters, 2 hadoop clusters, and
> > >> > > hundreds of APIs/daemons/crons/etc hitting all of these things.
> > >> > > Many of these clients hit multiple clusters in the same process.
> > >> > > Daunting to say the least.
> > >> > >
> > >> > >
> > >> > Nod.
> > >> >
> > >> >
> > >> >
> > >> > > We can't take full downtime on any of these, though we can take
> > >> > > read-only.  And ideally we could take read-only on each cluster
> > >> > > in a staggered fashion.
> > >> > >
> > >> > > From a client perspective, all of our code currently assumes an
> > >> > > HTableInterface, which gives me some wiggle room I think.  With
> > >> > > that in mind, here's my current plan:
> > >> > >
> > >> >
> > >> > You've done a review of HTI in 1.0 vs 0.94 to make sure we've not
> > >> > mistakenly dropped anything you need? (I see that stuff has moved
> > >> > around but HTI should have everything still from 0.94)
> > >> >
> > >> >
> > >> > >
> > >> > > - Shade CDH5 to something like org.apache.hadoop.cdh5.hbase.
> > >> > > - Create a shim implementation of HTableInterface.  This shim
> > >> > > would delegate to either the old cdh4 APIs or the new shaded CDH5
> > >> > > classes, depending on the cluster being talked to (see the sketch
> > >> > > after this list).
> > >> > > - Once the shim is in place across all clients, I will put each
> > >> > > cluster into read-only (a client side config of ours), migrate
> > >> > > data to a new CDH5 cluster, then bounce affected services so they
> > >> > > look there instead.  I will do this for each cluster in sequence.
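To make the delegation idea in the plan above concrete, here is a
deliberately reduced illustration. Bryan's actual shim implements
HTableInterface itself and has to translate between the 0.94 and shaded 1.0
request/response types; the sketch below instead shows a tiny hypothetical
application-facing interface with a 0.94-backed implementation, and a
parallel implementation compiled against the shaded CDH5 classes would look
identical from the caller's side. All names here are invented.

    import java.io.Closeable;
    import java.io.IOException;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTableInterface;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;

    // Hypothetical narrow interface the application depends on instead of a
    // concrete HBase client class; one implementation exists per HBase version.
    public interface ShimTable extends Closeable {
      byte[] get(byte[] row, byte[] family, byte[] qualifier) throws IOException;
      void put(byte[] row, byte[] family, byte[] qualifier, byte[] value) throws IOException;
    }

    // 0.94-backed implementation; a CDH5-backed twin would wrap the relocated
    // org.apache.hadoop.cdh5.hbase classes and be chosen per cluster at runtime.
    class Hbase094Table implements ShimTable {
      private final HTableInterface table;

      Hbase094Table(HTableInterface table) {
        this.table = table;
      }

      @Override
      public byte[] get(byte[] row, byte[] family, byte[] qualifier) throws IOException {
        Result r = table.get(new Get(row));
        return r.getValue(family, qualifier);
      }

      @Override
      public void put(byte[] row, byte[] family, byte[] qualifier, byte[] value)
          throws IOException {
        Put p = new Put(row);
        p.add(family, qualifier, value);  // 0.94 API; the 1.0 twin builds the shaded Put
        table.put(p);
      }

      @Override
      public void close() throws IOException {
        table.close();
      }
    }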
> > >> > >
> > >> > >
> > >> > Sounds like you have experience copying tables in background in a
> > >> > manner that minimally impinges serving given you have dev'd your
> > >> > own in-house cluster cloning tools?
> > >> >
> > >> > You will use the time while tables are read-only to 'catch-up' the
> > >> > difference between the last table copy and data that has come in
> > >> > since?
> > >> >
> > >> >
> > >> >
> > >> > > This provides a great rollback strategy, and with our existing
> > >> > > in-house cluster cloning tools we can minimize the read-only
> > >> > > window to a few minutes if all goes well.
> > >> > >
> > >> > > There are a couple gotchas I can think of with the shim, which
> > >> > > I'm hoping some of you might have ideas/opinions on:
> > >> > >
> > >> > > 1) Since protobufs are used for communication, we will have to
> > >> > > avoid shading those particular classes as they need to match the
> > >> > > package/classnames on the server side.  I think this should be
> > >> > > fine, as these are net-new, not conflicting with CDH4 artifacts.
> > >> > > Any additions/concerns here?
> > >> > >
> > >> > >
> > >> > CDH4 has pb2.4.1 in it as opposed to pb2.5.0 in cdh5?
> > >> >
> > >>
> > >> If your clients are interacting with HDFS then you need to go the
> > >> route of shading around PB and it's hard, but HBase-wise only HBase
> > >> 0.98 and 1.0 use PBs in the RPC protocol and it shouldn't be any
> > >> problem as long as you don't need security (this is mostly because
> > >> the client does a UGI call in the client and it's easy to patch on
> > >> both 0.94 and 1.0 to avoid calling UGI).  Another option is to move
> > >> your application to asynchbase, and it should be clever enough to
> > >> handle both HBase versions.
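For readers unfamiliar with asynchbase, the alternative Esteban mentions is
a separate client library with its own API rather than a drop-in for
HTableInterface. A trivial, blocking usage sketch follows; the quorum,
table, and row names are placeholders, and real code would normally chain
callbacks on the Deferred instead of calling join().

    import java.util.ArrayList;
    import org.hbase.async.GetRequest;
    import org.hbase.async.HBaseClient;
    import org.hbase.async.KeyValue;

    public class AsyncHBaseSketch {
      public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper quorum spec.
        HBaseClient client = new HBaseClient("zk-host:2181");
        try {
          // Fetch a single row and print its cells.
          ArrayList<KeyValue> row =
              client.get(new GetRequest("my_table", "my_row")).join();
          for (KeyValue kv : row) {
            System.out.println(new String(kv.qualifier()) + " = " + new String(kv.value()));
          }
        } finally {
          client.shutdown().join();
        }
      }
    }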
> > >>
> > >>
> > >>
> > >> > I myself have little experience going a shading route so have
> > >> > little to contribute.  Can you 'talk out loud' as you try stuff
> > >> > Bryan and if we can't help highlevel, perhaps we can help on
> > >> > specifics.
> > >> >
> > >> > St.Ack
> > >> >
> > >>
> > >> cheers,
> > >> esteban.
> > >>
> > >
> > >
> >
>
>
>
> --
> Thanks & Regards,
> Anil Gupta
>
