From: Kevin Shin
To: dev@hbase.apache.org, apurtell@apache.org
Reply-To: dev@hbase.apache.org
Date: Tue, 28 Aug 2012 13:59:39 -0700
Subject: Re: Improving Coprocessor postSplit/postOpen synchronization
In-Reply-To: <1346177552.57605.YahooMailNeo@web121702.mail.ne1.yahoo.com>
References: <1346177552.57605.YahooMailNeo@web121702.mail.ne1.yahoo.com>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=f46d044632887b5eec04c859b8a1

--f46d044632887b5eec04c859b8a1
Content-Type: text/plain; charset=ISO-8859-1

Hello again everyone,

Thanks for responding! I really appreciate all of the advice that's been
given so far. :)

Just to clarify, Andrew: do you have a prototype patch up that could be
worked on to either move postSplit() or add new hooks into the framework,
or are you planning on submitting one sometime in the near future?

I'd also love to get any feedback from the community about where to add the
hook(s), but my thought was that we should have different levels of hooks
within a split, as Ramkrishna suggested. Perhaps two preSplits to
accommodate for grabbing, as well as a postSplit and a completeSplit?
Giving a better abstraction would definitely help developers figure out how
to deal with asynchronous calls to split, Put, and Delete.

Thanks as always!

Best,
Kevin

On Tue, Aug 28, 2012 at 11:12 AM, lars hofhansl wrote:

> That approach sounds good to me.
>
> ----- Original Message -----
> From: Andrew Purtell
> To: dev@hbase.apache.org
> Cc:
> Sent: Tuesday, August 28, 2012 3:05 AM
> Subject: Re: Improving Coprocessor postSplit/postOpen synchronization
>
> Never mind, I went to look at the code. Should have done that first.
>
> Looking at 0.94 sources, in SplitTransaction, first we notify the master
> that the split has happened, and wait for the master to process it (which
> opens daughters), and then call up to the CP with the daughter regions as
> arguments.
>
> I seem to remember that in my prototype patch for the CP framework,
> postSplit notification let the CP know the split took place and allowed it
> to take actions before the master opened the daughters. In any event that's
> not the code now, so it seems what you need here is for us to move the
> postSplit upcall up prior to master notification or add another hook at
> that location.
>
> On Tue, Aug 28, 2012 at 12:53 PM, Andrew Purtell wrote:
>
> > (from postSplit)
> >
> >
> > On Tue, Aug 28, 2012 at 12:53 PM, Andrew Purtell wrote:
> >
> >> What about writing a marker (a file) into the region at split (from
> >> preSplit) which is then existence-checked and read at open (postOpen)?
> >> This file would contain whatever indexing metadata is required.
> >>
> >> Also, splits are nearly instant because the daughters are created with
> >> reference files to the parent, until a later compaction brings the data
> >> from the parent over. Can you do the same with your indexes? Reason I
> >> ask is this notion of "ignoring" new data until indexes are available
> >> seems undesirable.
> >>
> >>
> >> On Mon, Aug 27, 2012 at 11:29 PM, Kevin Shin <
> >> kevin.shin@thinkbiganalytics.com> wrote:
> >>
> >>> Hi everyone,
> >>>
> >>> A colleague and I were working with HBase coprocessors for secondary
> >>> indexes and ran into an interesting problem regarding splits
> >>> and synchronizing the corresponding parent/daughter regions.
> >>>
> >>> The goal with splits is to create two new daughter regions with the
> >>> corresponding splits of the secondary indexes and lock these regions
> >>> such that Puts/Deletes that occur while postSplit is in progress will
> >>> be queued up so we don't run into consistency issues.
> >>> I.e., if a delete gets called before a daughter region receives the
> >>> split index, that delete would essentially have been ignored, so we
> >>> would want to wait until postSplit is finished before running any new
> >>> Puts/Deletes on the split regions.
> >>>
> >>> As of right now, the HBase coprocessors do not easily support a way to
> >>> achieve this level of consistency, in that there is no way to
> >>> distinguish a region being opened as the result of a split from a
> >>> regular open. If we could distinguish, we could open up the correct
> >>> index from the start and stall until postSplit is finished in the
> >>> background in the event of a split. I would thus like to propose a way
> >>> to "lock" the daughter regions when postSplit is called. That is, when
> >>> we open a daughter region from a split, we can pass in the parent
> >>> region name alongside it (or null if there is no parent) to
> >>> distinguish a split open from a regular open. I am thinking about
> >>> submitting a patch into JIRA but would greatly appreciate any thoughts
> >>> or suggestions for another solution to the problem or perhaps a better
> >>> patch. I am using HBase 0.92 for development at this moment.
> >>>
> >>> Best,
> >>> Kevin
> >>>
> >>
> >> --
> >> Best regards,
> >>
> >>    - Andy
> >>
> >> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> >> (via Tom White)

--f46d044632887b5eec04c859b8a1--
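
Editorial sketch (not part of the original thread): the "lock the daughter
regions" idea from the first message can be illustrated with a small gate
that queues mutations for a region opened from a split and replays them once
the split index has arrived. This is a language-neutral illustration in
Python, not HBase code. The class and method names (DaughterRegionGate,
submit, mark_index_ready) are hypothetical, it does not use the HBase
coprocessor API, and it is single-threaded; a real region server would need
proper locking.

```python
from collections import deque
from typing import Callable, Deque, Optional


class DaughterRegionGate:
    """Hypothetical gate: holds mutations until the split index is ready."""

    def __init__(self, parent_region: Optional[str]) -> None:
        # parent_region mirrors the proposal in the thread: the parent's
        # name when the region is opened from a split, None for a regular open.
        self.parent_region = parent_region
        # A regular open has nothing to wait for, so it starts out ready.
        self.index_ready = parent_region is None
        self._pending: Deque[Callable[[], None]] = deque()

    def submit(self, mutation: Callable[[], None]) -> None:
        """Apply a Put/Delete now, or queue it if the index is not ready."""
        if self.index_ready:
            mutation()
        else:
            self._pending.append(mutation)

    def mark_index_ready(self) -> None:
        """Called when the split index has arrived (the postSplit work is done)."""
        # Replay queued mutations in arrival order before opening the gate,
        # so nothing submitted later can overtake them.
        while self._pending:
            self._pending.popleft()()
        self.index_ready = True


# Usage: a delete that arrives before the daughter has its index is delayed,
# not ignored.
index = {"row1", "row2"}
daughter = DaughterRegionGate(parent_region="parent-region")
daughter.submit(lambda: index.discard("row1"))  # queued, index untouched
daughter.mark_index_ready()                     # queued delete is applied
```

With this shape, a region opened without a parent never queues anything,
which matches the "null if there is no parent" case in the proposal.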