From: Bryan Beaudreault
Date: Tue, 5 May 2015 22:48:07 -0400
Subject: Re: Upgrading from 0.94 (CDH4) to 1.0 (CDH5)
To: user@hbase.apache.org

Thanks for the response, guys!

> You've done a review of HTI in 1.0 vs 0.94 to make sure we've not
> mistakenly dropped anything you need? (I see that stuff has moved around
> but HTI should have everything still from 0.94)

Yea, so far so good for HTI features.

> Sounds like you have experience copying tables in background in a manner
> that minimally impinges serving given you have dev'd your own in-house
> cluster cloning tools?
>
> You will use the time while tables are read-only to 'catch-up' the
> difference between the last table copy and data that has come in since?

Correct, we have some tools left over from our 0.92 to 0.94 upgrade, which
we've used for cluster copies. They basically do an incremental distcp by
comparing the file length and md5 of each table in the target and source
clusters, then copying only the diffs. We can get very close to real time
with this, then switch to read-only, do some flushes, and do one final copy
to catch up. We have done this many times for various cluster moves.
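For the curious, here is a rough sketch of the comparison step. This is not
our actual tool (which also drives the flushes and feeds the resulting file
list to distcp), just an illustration of the idea using the stock FileSystem
API; one caveat is that HDFS file checksums are only comparable when both
clusters use the same block size and bytes-per-checksum settings.

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

/**
 * Sketch only: recursively walk a table directory on the source cluster and
 * collect the files that are missing on the target or differ by length or
 * checksum. The real tool hands a list like this to distcp.
 */
public class IncrementalTableDiff {

  public static List<Path> filesToCopy(FileSystem srcFs, FileSystem dstFs,
      Path srcDir, Path dstDir) throws IOException {
    List<Path> diffs = new ArrayList<Path>();
    for (FileStatus src : srcFs.listStatus(srcDir)) {
      Path dstPath = new Path(dstDir, src.getPath().getName());
      if (src.isDirectory()) {
        // Recurse into region / column family directories.
        diffs.addAll(filesToCopy(srcFs, dstFs, src.getPath(), dstPath));
        continue;
      }
      if (!dstFs.exists(dstPath)) {
        diffs.add(src.getPath());                     // missing on target
        continue;
      }
      if (src.getLen() != dstFs.getFileStatus(dstPath).getLen()) {
        diffs.add(src.getPath());                     // cheap length check
        continue;
      }
      FileChecksum srcSum = srcFs.getFileChecksum(src.getPath());
      FileChecksum dstSum = dstFs.getFileChecksum(dstPath);
      if (srcSum == null || !srcSum.equals(dstSum)) {
        diffs.add(src.getPath());                     // contents differ
      }
    }
    return diffs;
  }
}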
> CDH4 has pb2.4.1 in it as opposed to pb2.5.0 in cdh5?

Good to know, will keep this in mind! We already shade some of hbase's
dependencies, such as guava, apache commons http, and joda. We will do the
same for protobuf.

> Can you 'talk out loud' as you try stuff Bryan and if we can't
> help highlevel, perhaps we can help on specifics.

Gladly! I feel like I have a leg up since I've already survived the 0.92 to
0.94 migration, so I'm glad to share my experiences with this migration as
well. I'll update this thread as I move along. I also plan to release a blog
post on the ordeal once it's all said and done.

We just created our initial shade of hbase. I'm leaving tomorrow for
HBaseCon, but plan on tackling and testing all of this next week once I'm
back from SF. If anyone is facing similar upgrade challenges I'd be happy to
compare notes.
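To make the shim idea (quoted below) a little more concrete, here is a rough
sketch of the delegation we have in mind. It is not our actual code:
ShimTableFactory and Cdh5TableProvider are illustrative names, and the hard
part, converting Get/Put/Result between the CDH4 classes and the relocated
CDH5 classes, lives behind the provider.

import java.io.IOException;
import java.util.Map;
import java.util.Set;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.HTableInterface;

/**
 * Sketch only: application code keeps asking for an HTableInterface, and
 * this factory decides per cluster whether to hand back the stock CDH4
 * client or a wrapper around the relocated (shaded) CDH5 client.
 */
public class ShimTableFactory {

  /** Supplies an HTableInterface backed by the shaded CDH5 client. */
  public interface Cdh5TableProvider {
    HTableInterface open(String cluster, String table) throws IOException;
  }

  private final Set<String> cdh5Clusters;          // client-side config of ours
  private final Map<String, Configuration> confs;  // per-cluster CDH4 configs
  private final Cdh5TableProvider cdh5Provider;    // wraps the relocated classes

  public ShimTableFactory(Set<String> cdh5Clusters,
      Map<String, Configuration> confs, Cdh5TableProvider cdh5Provider) {
    this.cdh5Clusters = cdh5Clusters;
    this.confs = confs;
    this.cdh5Provider = cdh5Provider;
  }

  public HTableInterface getTable(String cluster, String table)
      throws IOException {
    if (cdh5Clusters.contains(cluster)) {
      // Already migrated: delegate to an implementation built on the
      // relocated package (e.g. org.apache.hadoop.cdh5.hbase.client.*),
      // which converts Get/Put/Result between the two class spaces.
      return cdh5Provider.open(cluster, table);
    }
    // Not yet migrated: the plain CDH4 client keeps working as before.
    return new HTable(confs.get(cluster), table);
  }
}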
> If your clients are interacting with HDFS then you need to go the route of
> shading around PB and it's hard, but HBase-wise only HBase 0.98 and 1.0 use
> PBs in the RPC protocol and it shouldn't be any problem as long as you
> don't need security

Thankfully we don't interact directly with the HDFS underneath hbase. There
is some interaction with the HDFS of our CDH4 hadoop clusters though. I'll
be experimenting with these incompatibilities soon and will post here.
Hopefully I'll be able to separate them enough to not cause an issue.

Thankfully we have not moved to secure HBase yet. That's actually on the
to-do list, but we're hoping to do it *after* the CDH upgrade.

---

Thanks again, guys. I'm expecting this will be a drawn-out process
considering our scope, but I'll be happy to keep posting updates here as I
proceed.

On Tue, May 5, 2015 at 10:31 PM, Esteban Gutierrez wrote:

> Just to add a little bit to what StAck said:
>
> --
> Cloudera, Inc.
>
> On Tue, May 5, 2015 at 3:53 PM, Stack wrote:
>
> > On Tue, May 5, 2015 at 8:58 AM, Bryan Beaudreault <
> > bbeaudreault@hubspot.com> wrote:
> >
> > > Hello,
> > >
> > > I'm about to start tackling our upgrade path for 0.94 to 1.0+. We have
> > > 6 production hbase clusters, 2 hadoop clusters, and hundreds of
> > > APIs/daemons/crons/etc hitting all of these things. Many of these
> > > clients hit multiple clusters in the same process. Daunting to say the
> > > least.
> >
> > Nod.
> >
> > > We can't take full downtime on any of these, though we can take
> > > read-only. And ideally we could take read-only on each cluster in a
> > > staggered fashion.
> > >
> > > From a client perspective, all of our code currently assumes an
> > > HTableInterface, which gives me some wiggle room I think. With that in
> > > mind, here's my current plan:
> >
> > You've done a review of HTI in 1.0 vs 0.94 to make sure we've not
> > mistakenly dropped anything you need? (I see that stuff has moved around
> > but HTI should have everything still from 0.94)
> >
> > > - Shade CDH5 to something like org.apache.hadoop.cdh5.hbase.
> > > - Create a shim implementation of HTableInterface. This shim would
> > >   delegate to either the old cdh4 APIs or the new shaded CDH5 classes,
> > >   depending on the cluster being talked to.
> > > - Once the shim is in place across all clients, I will put each cluster
> > >   into read-only (a client-side config of ours), migrate data to a new
> > >   CDH5 cluster, then bounce affected services so they look there
> > >   instead. I will do this for each cluster in sequence.
> >
> > Sounds like you have experience copying tables in background in a manner
> > that minimally impinges serving given you have dev'd your own in-house
> > cluster cloning tools?
> >
> > You will use the time while tables are read-only to 'catch-up' the
> > difference between the last table copy and data that has come in since?
> >
> > > This provides a great rollback strategy, and with our existing
> > > in-house cluster cloning tools we can minimize the read-only window to
> > > a few minutes if all goes well.
> > >
> > > There are a couple gotchas I can think of with the shim, which I'm
> > > hoping some of you might have ideas/opinions on:
> > >
> > > 1) Since protobufs are used for communication, we will have to avoid
> > > shading those particular classes as they need to match the
> > > package/classnames on the server side. I think this should be fine, as
> > > these are net-new, not conflicting with CDH4 artifacts. Any
> > > additions/concerns here?
> >
> > CDH4 has pb2.4.1 in it as opposed to pb2.5.0 in cdh5?
>
> If your clients are interacting with HDFS then you need to go the route
> of shading around PB and it's hard, but HBase-wise only HBase 0.98 and
> 1.0 use PBs in the RPC protocol and it shouldn't be any problem as long
> as you don't need security (this is mostly because the client does a UGI
> call in the client and it's easy to patch both 0.94 and 1.0 to avoid
> that call). Another option is to move your application to asynchbase,
> which should be clever enough to handle both HBase versions.
>
> > I myself have little experience going the shading route so have little
> > to contribute. Can you 'talk out loud' as you try stuff Bryan and if we
> > can't help highlevel, perhaps we can help on specifics.
> >
> > St.Ack
>
> cheers,
> esteban.