hbase-user mailing list archives

From Esteban Gutierrez <este...@cloudera.com>
Subject Re: HBase region assignment by range?
Date Wed, 08 Apr 2015 17:24:23 GMT
+1 Anoop.

That's pretty much the only way right now if you need custom balancing.
This balancer doesn't have to live in the HMaster and can be invoked
externally (there are caveats to doing that, e.g. when an RS dies, but it
has worked OK so far). A long-term solution for the problem you are trying
to solve is HBASE-10576, with a little tweaking.

cheers,
esteban.





--
Cloudera, Inc.


On Wed, Apr 8, 2015 at 4:41 AM, Michael Segel <michael_segel@hotmail.com>
wrote:

> Is your table static?
>
> If you know your data and your ranges, you can do it. However as you add
> data to the table, those regions will eventually split.
>
> The other issue that you brought up is that you want to do ‘local’ joins.
>
> Simple single word response… don’t.
>
> Longer response..
>
> You’re suggesting that the tables in question share the row key in
> common.  Ok… why? Are they part of the same record?
> How is the data normally being used?
>
> Have you looked at column families?
>
> The issue is that joins are expensive. What you’re suggesting is that as
> you do a region scan, you go to the other table and try to fetch a row if
> it exists.
> So it’s essentially: for each row in the scan, try a get(), which will
> almost double the cost of your fetch. Then you have to decide how to do it
> locally. Are you really going to write a coprocessor for this?  (Hint: If
> this is a common thing, then the second table should be part of the first
> table, either in the same CF or as a separate CF. You need to rethink your
> schema.)
>
> Does this make sense?
>
> > On Apr 7, 2015, at 7:05 PM, Demai Ni <nidmgg@gmail.com> wrote:
> >
> > hi, folks,
> >
> > I have a question about region assignment and would like to clarify
> > some thoughts.
> >
> > Let's say I have a table with rowkey as "row00000 ~ row30000" on a 4 node
> > hbase cluster, is there a way to keep data partitioned by range on each
> > node? for example:
> >
> > node1:  <=row10000
> > node2:  row10001~row20000
> > node3:  row20001~row30000
> > node4:  >row30000
> >
> > And even when one of the nodes becomes a hotspot, the boundary won't be
> > crossed unless a load balancing is done manually?
> >
> > I looked at presplit: { SPLITS => ['row100','row200','row300'] } , but
> > don't think it serves this purpose.
> >
> > BTW, a bit of background: I am thinking of doing a local join between
> > two tables if both have the same rowkey and are partitioned by range (or
> > by the same hash algorithm). If I can keep the join key on the same node
> > (aka regionServer), the join can be handled locally instead of being
> > broadcast to all other nodes.
> >
> > Thanks for your input. A couple of pointers to blogs/presentations would
> > be appreciated.
> >
> > Demai
>
> The opinions expressed here are mine, while they may reflect a cognitive
> thought, that is purely accidental.
> Use at your own risk.
> Michael Segel
> michael_segel (AT) hotmail.com
>
>
>
>
>
>
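To make the split-key discussion above concrete, here is a minimal sketch in plain Python (no HBase client involved) of how a sorted list of split keys maps a row key to a region, and why identical split points on two tables do not by themselves give a local join. It assumes fixed-width ASCII row keys, so that string order matches HBase's byte order, and relies on each region covering a start-inclusive, end-exclusive key range. The split keys, node names, and function names are hypothetical, chosen only to mirror the ranges in the question.

```python
from bisect import bisect_right

# Hypothetical split keys mirroring the ranges asked about:
# region 0: < row10001, region 1: row10001..row20000,
# region 2: row20001..row30000, region 3: >= row30001
SPLIT_KEYS = ["row10001", "row20001", "row30001"]


def region_index(row_key, split_keys=SPLIT_KEYS):
    """Return the index of the region whose [start, end) range holds row_key."""
    # bisect_right sends a key equal to a split key to the region on the
    # right, which matches a start-inclusive, end-exclusive region range.
    return bisect_right(split_keys, row_key)


def colocated(row_key, servers_a, servers_b, split_keys=SPLIT_KEYS):
    """True if row_key is served by the same node in both tables.

    servers_a / servers_b list the node hosting each region of table A and
    table B (index = region index). Both tables are assumed to share
    split_keys; the assignment of regions to nodes is independent per table.
    """
    idx = region_index(row_key, split_keys)
    return servers_a[idx] == servers_b[idx]


# Same split points, but the balancer placed two regions of table B differently.
table_a = ["node1", "node2", "node3", "node4"]
table_b = ["node1", "node2", "node4", "node3"]

print(region_index("row10000"))                 # region 0
print(region_index("row10001"))                 # region 1
print(colocated("row05000", table_a, table_b))  # both on node1
print(colocated("row25000", table_a, table_b))  # node3 vs node4
```

The last call is the case that matters here: pre-splitting fixes the key boundaries, but which server hosts each region is still up to the balancer, which is why custom balancing comes up in this thread. The sketch also ignores later region splits, which would change the split-key list of one table but not the other.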
