Date: Fri, 22 May 2015 14:01:53 -0700
Subject: Re: avoiding hot spot for timestamp prefix key
From: Vladimir Rodionov
To: "dev@hbase.apache.org"
Cc: "user@hbase.apache.org"

RegionSplitPolicy only allows you to customize the split point (a row key).
All rows that sort before the split point go to the first daughter region;
rows at or after it go to the second.

The answer to the original question is: no, you cannot have a custom split
policy based on the second part of a key.

-Vlad

On Fri, May 22, 2015 at 2:43 AM, Michael Segel wrote:

> This is why I created HBASE-12853.
>
> So you don’t have to specify a custom split policy.
>
> Of course the simple solutions are often passed over because of NIH. ;-)
>
> To be blunt… You encapsulate the bucketing code so that you have a single
> API in to HBase regardless of the type of storage underneath.
> KISS is maintained and you stop people from attempting to do stupid
> things. (cc’ing dev@hbase) As a product owner (read: PMC / committers),
> you want to keep people from mucking about in the internals. While it’s
> true that it’s open source, and you will have some who want to muck around,
> you also have to consider the corporate users who need something that is
> reliable and less customized so that it’s supportable. This is the vendor’s
> dilemma. (Hint: Cloudera, Horton, IBM, MapR.) You’re selling support to
> HBase and if a customer starts to overload internals with their own code,
> good luck in supporting it. This is why you do things like 12853 because
> it makes your life easier.
>
> This isn’t a sexy solution. It’s core engineering work.
>
> HTH
>
> -Mike
>
> > On May 22, 2015, at 4:22 AM, Shushant Arora wrote:
> >
> > Since the custom split policy is based on the second part, i.e. the guid,
> > how will it be identified which region holds a key whose first part is
> > 2015-05-22 00:01:02?
> >
> > On Fri, May 22, 2015 at 1:12 PM, Ted Yu wrote:
> >
> >> The custom split policy needs to respect the fact that the timestamp is
> >> the leading part of the rowkey.
> >>
> >> This would avoid the overlap you mentioned.
> >>
> >> Cheers
> >>
> >>> On May 21, 2015, at 11:55 PM, Shushant Arora <shushantarora09@gmail.com> wrote:
> >>>
> >>> The guid changes with every key. The pattern is:
> >>> 2015-05-22 00:02:01#AB12EC77778888945
> >>> 2015-05-22 00:02:02#CD9870001234AB457
> >>>
> >>> When we specify a custom split algorithm, could keys in the same
> >>> sort-order range, say (1-7), lie in region R1 as well as in region R2?
> >>> Then how will the .META. table handle lookups at read time? Say I
> >>> search for key 3: will it search both regions R1 and R2?
> >>>
> >>>> On Fri, May 22, 2015 at 10:48 AM, Ted Yu wrote:
> >>>>
> >>>> Does guid change with every key?
> >>>>
> >>>> bq. use second part of key
> >>>>
> >>>> I don't think so. Suppose the first row in the parent region is
> >>>> '1432104178817#321'. After the split, the first row in the first
> >>>> daughter region would still be '1432104178817#321'. Right?
> >>>>
> >>>> Cheers
> >>>>
> >>>> On Thu, May 21, 2015 at 9:57 PM, Shushant Arora <shushantarora09@gmail.com> wrote:
> >>>>
> >>>>> Can I avoid region hotspotting with a custom region split policy in
> >>>>> HBase 0.96?
> >>>>>
> >>>>> Key is of the form timestamp#guid.
> >>>>> So can I have a custom region split policy that uses the second part of
> >>>>> the key (i.e. the guid) as the region split criterion and avoids the hot spot?
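
The thread mentions "bucketing" without showing it. As a rough illustration only, one common form is to prepend a small bucket number, derived from a hash of the guid, to the timestamp#guid key, so that consecutive timestamps no longer sort into the same region. The sketch below is plain Python with no HBase calls. The bucket count, the MD5-based bucket choice, the two-digit prefix format, and the function names are all assumptions made for this example; none of it is taken from HBASE-12853 or from the HBase API.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical bucket count, chosen only for illustration


def bucketed_key(timestamp: str, guid: str, num_buckets: int = NUM_BUCKETS) -> str:
    """Return 'bucket#timestamp#guid', where bucket is derived from the guid.

    Hashing the guid (not the timestamp) spreads rows with adjacent
    timestamps across buckets, while the same guid always maps to the
    same bucket.
    """
    digest = hashlib.md5(guid.encode("utf-8")).digest()
    bucket = digest[0] % num_buckets
    return f"{bucket:02d}#{timestamp}#{guid}"


def scan_ranges(start_ts: str, stop_ts: str, num_buckets: int = NUM_BUCKETS):
    """Return one (start_row, stop_row) pair per bucket for a time-range read."""
    return [
        (f"{b:02d}#{start_ts}", f"{b:02d}#{stop_ts}")
        for b in range(num_buckets)
    ]


if __name__ == "__main__":
    # Sample keys from the thread.
    print(bucketed_key("2015-05-22 00:02:01", "AB12EC77778888945"))
    print(bucketed_key("2015-05-22 00:02:02", "CD9870001234AB457"))
    for start_row, stop_row in scan_ranges("2015-05-22 00:00:00", "2015-05-22 01:00:00"):
        print(start_row, "->", stop_row)
```

The trade-off in this sketch is on the read side: a query over a time range has to issue one scan per bucket and merge the results, which is the kind of logic the thread suggests encapsulating behind a single API.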