Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
MIME-Version: 1.0
Date: Fri, 10 Apr 2015 09:31:09 -0700
Subject: Re: HBase region assignment by range?
From: Demai Ni
To: "user@hbase.apache.org"
Content-Type: multipart/alternative; boundary=001a113ec1f2b1ea160513614a0c

--001a113ec1f2b1ea160513614a0c
Content-Type: text/plain; charset=UTF-8

Nick,

thanks. I will look into Phoenix. I have heard about Phoenix for quite a
while, but haven't seriously played with it yet. Considering Phoenix's
target user base, it is no surprise that the Phoenix team is looking at
things like secondary indexes, complex JOINs, etc.

Anoop,

thanks for your input too. A custom LB will be the way to go...

On Wed, Apr 8, 2015 at 4:11 PM, Nick Dimiduk wrote:

> Your needs (and use case?) look a lot like the local secondary index
> work happening around Phoenix.
>
> On Wed, Apr 8, 2015 at 11:50 AM, Anoop John wrote:
>
> > bq. while the region can surely split when more data is added, can
> > HBase still keep the new regions on the same regionServer according
> > to the predefined boundary?
> >
> > You need a custom LB for that. If there is one, it is possible to
> > restrict this.
> >
> > -Anoop-
> >
> > On Thu, Apr 9, 2015 at 12:09 AM, Demai Ni wrote:
> >
> > > hi, Guys,
> > >
> > > many thanks for your quick response.
> > >
> > > First, let me share what I am looking at, which may help to clarify
> > > the intention and answer a few of the questions. I am working on a
> > > POC to bring MPP-style OLAP to Hadoop, and looking at whether it is
> > > feasible to have HBase as the datastore. With HBase, I'd like to
> > > take advantage of 1) OLTP capability; 2) many filters; 3) in-cluster
> > > replicas and between-cluster replication.
> > > I am currently using the TPC-H schema for this POC, and am also
> > > considering a star schema. Since it is a POC, I can pretty much
> > > define my own rules and set limitations as they fit. :-)
> > >
> > > "Why doesn't this (presplit) work for you?"
> > >
> > > The reason is that presplit won't guarantee that the regions stay
> > > on the pre-assigned regionServer. Let's say I have a very large
> > > table and a very small table with different data distributions.
> > > Even with the same presplit values, HBase won't ensure that the same
> > > range of data is located on the same physical node, unless we have
> > > the custom LB mentioned by @Anoop and @esteban. Is my understanding
> > > correct? BTW, I will look into HBASE-10576 to see whether it fits my
> > > needs.
> > >
> > > "Is your table static?"
> > >
> > > I can make it static for POC purposes, but I will use this
> > > limitation, as I'd like HBase for its OLTP feature. So besides the
> > > 'static' HFiles, I need the HLogs on the same local node too. But
> > > again, I will only worry about the 'static' HFiles for now.
> > >
> > > "However as you add data to the table, those regions will
> > > eventually split."
> > >
> > > While the region can surely split when more data is added, can
> > > HBase still keep the new regions on the same regionServer according
> > > to the predefined boundary? I will worry about the hotspot issue
> > > later. That is the beauty of doing a POC instead of production. :-)
> > >
> > > "What you’re suggesting is that as you do a region scan, you’re
> > > going to the other table and then try to fetch a row if it exists."
> > >
> > > Yes, something like that. I am currently using the client API:
> > > scan() with a start and end key. Since I know my start and end keys,
> > > and with the local-read feature, the scan should be a local read.
> > > With some statistics (such as which one is the larger table) and a
> > > hash join operation (which I need to implement), the join will work
> > > with not-too-bad performance. Again, it is a POC, so I won't worry
> > > about the situation where a regionServer hosts too much data
> > > (hotspot). But surely, an LB should be used before putting this into
> > > production, if that ever happens.
> > >
> > > "either the second table should be part of the first table in the
> > > same CF or as a separate CF"
> > >
> > > I am not sure whether that will work for the case of a large table
> > > vs. a small table. The data of the small table has to be duplicated
> > > in many places, and an update of the small table can be costly.
> > >
> > > Demai
> > >
> > > On Wed, Apr 8, 2015 at 10:24 AM, Esteban Gutierrez
> > > <esteban@cloudera.com> wrote:
> > >
> > > > +1 Anoop.
> > > >
> > > > That's pretty much the only way right now if you need custom
> > > > balancing. This balancer doesn't have to live in the HMaster and
> > > > can be invoked externally (there are caveats to doing that, e.g.
> > > > when an RS dies, but it works OK so far). A long-term solution for
> > > > the problem you are trying to solve is HBASE-10576, with a little
> > > > tweaking.
> > > >
> > > > cheers,
> > > > esteban.
> > > >
> > > > --
> > > > Cloudera, Inc.
> > > >
> > > > On Wed, Apr 8, 2015 at 4:41 AM, Michael Segel
> > > > <michael_segel@hotmail.com> wrote:
> > > >
> > > > > Is your table static?
> > > > >
> > > > > If you know your data and your ranges, you can do it. However,
> > > > > as you add data to the table, those regions will eventually
> > > > > split.
> > > > >
> > > > > The other issue that you brought up is that you want to do
> > > > > ‘local’ joins.
> > > > >
> > > > > Simple single word response… don’t.
> > > > >
> > > > > Longer response..
> > > > >
> > > > > You’re suggesting that the tables in question share the row key
> > > > > in common. Ok… why? Are they part of the same record?
> > > > > How is the data normally being used?
> > > > >
> > > > > Have you looked at column families?
> > > > >
> > > > > The issue is that joins are expensive. What you’re suggesting is
> > > > > that as you do a region scan, you’re going to the other table
> > > > > and then try to fetch a row if it exists.
> > > > > So it's essentially: for each row in the scan, try a get(),
> > > > > which will almost double the cost of your fetch. Then you have
> > > > > to decide how to do it locally. Are you really going to write a
> > > > > coprocessor for this? (Hint: if this is a common thing, then
> > > > > either the second table should be part of the first table in the
> > > > > same CF, or a separate CF. You need to rethink your schema.)
> > > > >
> > > > > Does this make sense?
> > > > >
> > > > > > On Apr 7, 2015, at 7:05 PM, Demai Ni wrote:
> > > > > >
> > > > > > hi, folks,
> > > > > >
> > > > > > I have a question about region assignment and would like to
> > > > > > clarify some thoughts.
> > > > > >
> > > > > > Let's say I have a table with rowkeys "row00000 ~ row30000" on
> > > > > > a 4-node HBase cluster. Is there a way to keep the data
> > > > > > partitioned by range on each node? For example:
> > > > > >
> > > > > > node1: <=row10000
> > > > > > node2: row10001~row20000
> > > > > > node3: row20001~row30000
> > > > > > node4: >row30000
> > > > > >
> > > > > > And even when one of the nodes becomes a hotspot, the boundary
> > > > > > won't be crossed unless load balancing is done manually?
> > > > > >
> > > > > > I looked at presplit:
> > > > > > { SPLITS => ['row100','row200','row300'] },
> > > > > > but I don't think it serves this purpose.
> > > > > >
> > > > > > BTW, a bit of background.
> > > > > > I am thinking of doing a local join between two tables if
> > > > > > both have the same rowkey and are partitioned by range (or by
> > > > > > the same hash algorithm). If I can keep the join key on the
> > > > > > same node (aka regionServer), the join can be handled locally
> > > > > > instead of being broadcast to all other nodes.
> > > > > >
> > > > > > Thanks for your input. A couple of pointers to blogs or
> > > > > > presentations would be appreciated.
> > > > > >
> > > > > > Demai
> > > > >
> > > > > The opinions expressed here are mine, while they may reflect a
> > > > > cognitive thought, that is purely accidental.
> > > > > Use at your own risk.
> > > > > Michael Segel
> > > > > michael_segel (AT) hotmail.com

--001a113ec1f2b1ea160513614a0c--
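
The range layout and local join discussed in this thread can be sketched in plain Python. This is only an illustration under stated assumptions, not HBase code: the node names, `node_for`, `local_hash_join` and `partitioned_join` are hypothetical, tables are modelled as lists of (row_key, value) pairs, and the sketch simply assumes that both tables are pinned to nodes by the same range boundaries, which is exactly what the thread says stock HBase does not guarantee without a custom load balancer.

```python
from bisect import bisect_left
from collections import defaultdict

# Inclusive upper bounds of the first three ranges from the original question;
# any key above the last bound falls into the fourth range.
# (HBase regions are [startKey, endKey), so the matching split keys would be
# the next key after each bound, e.g. "row10001".)
SPLIT_POINTS = ["row10000", "row20000", "row30000"]
NODES = ["node1", "node2", "node3", "node4"]  # hypothetical node names


def node_for(row_key):
    """Return the node whose key range contains row_key."""
    return NODES[bisect_left(SPLIT_POINTS, row_key)]


def local_hash_join(small_rows, large_rows):
    """Hash join of two lists of (row_key, value) pairs held on one node."""
    built = dict(small_rows)  # build side: the smaller table
    return [(k, built[k], v) for k, v in large_rows if k in built]


def partitioned_join(small_table, large_table):
    """Join two tables node by node, assuming identical range placement."""
    small_by_node = defaultdict(list)
    large_by_node = defaultdict(list)
    for k, v in small_table:
        small_by_node[node_for(k)].append((k, v))
    for k, v in large_table:
        large_by_node[node_for(k)].append((k, v))
    joined = []
    for node in NODES:
        # Each node only looks at its own slice; nothing is broadcast.
        joined.extend(local_hash_join(small_by_node[node], large_by_node[node]))
    return joined


if __name__ == "__main__":
    small = [("row00005", "a"), ("row20001", "b")]
    large = [("row00005", "x"), ("row15000", "y"), ("row20001", "z")]
    print(partitioned_join(small, large))
```

The join is only correct because both tables use the same `node_for` mapping; if a split or a balancer run moved one table's range to another node, the per-node slices would no longer line up.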