hbase-user mailing list archives

From Jonathan Hsieh <...@cloudera.com>
Subject Re: schema optimisation - go for multiple tables, rows or column families?
Date Mon, 09 Jan 2012 11:51:39 GMT
Hi Tom,

In the case you describe -- two HTables -- there is no guarantee that the
rows will end up on the same region server.  Each table has its own set of
regions, and those regions can (and most likely will) be distributed to
different regionserver machines.  The fact that both tables use the same
rowkeys doesn't matter.
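
To make that concrete, here is a toy model (plain Java, not the HBase
client API) of how a row is located: the lookup is per table, and within a
table the row belongs to the region with the greatest start key that is not
larger than the row key.  The table names, split points and server names
below are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: this is NOT the HBase client API.
// Table names, split points and server names are hypothetical.
class RegionLookupSketch {
    // One sorted map per table: region start key -> hosting region server.
    private final Map<String, TreeMap<String, String>> regionsByTable = new HashMap<>();

    void addRegion(String table, String startKey, String server) {
        regionsByTable.computeIfAbsent(table, t -> new TreeMap<>()).put(startKey, server);
    }

    // A row lives in the region with the greatest start key <= rowKey.
    // The first region of every table has the empty start key "".
    String locate(String table, String rowKey) {
        return regionsByTable.get(table).floorEntry(rowKey).getValue();
    }

    // Hypothetical layout: a small metadata table with a single region and
    // a large measurement table that has already split once.
    static RegionLookupSketch example() {
        RegionLookupSketch cluster = new RegionLookupSketch();
        cluster.addRegion("metadata", "", "rs1");
        cluster.addRegion("measurements", "", "rs2");
        cluster.addRegion("measurements", "id-5000", "rs3");
        return cluster;
    }

    public static void main(String[] args) {
        RegionLookupSketch cluster = example();
        // Same row key, two tables, two independent lookups.
        System.out.println(cluster.locate("metadata", "id-7342"));
        System.out.println(cluster.locate("measurements", "id-7342"));
    }
}
```

In this made-up layout the same key resolves to rs1 for one table and rs3
for the other, so the client ends up talking to two servers.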

If you use (2), the single table with multiple column families, both kinds
of data for a given rowkey would be located in the same region and thus on
the same regionserver.

Given your concerns, and depending on your read patterns (do you do a lot
of scans of only the meta data?), I'd probably take approach (2) or (3).
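
If you go with (3), one possible key layout is sketched below.  This is
only an illustration in plain Java: a TreeMap stands in for the sorted rows
of a table, and the "|m" / "|v|" suffixes and the zero-padded timestamp are
my own assumptions, not anything HBase prescribes.  HBase sorts rowkeys as
bytes; for ASCII keys like these, Java's String ordering gives the same
result.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Sketch of a composite rowkey scheme for approach (3).
// The key format is hypothetical; a TreeMap stands in for a sorted table.
class CompositeKeySketch {
    // Metadata row for an ID: "<id>|m".
    static String metaKey(String id) {
        return id + "|m";
    }

    // Measurement row for an ID: "<id>|v|<zero-padded timestamp>".
    // Zero padding keeps lexicographic order equal to numeric order.
    static String measurementKey(String id, long timestampMillis) {
        return String.format("%s|v|%013d", id, timestampMillis);
    }

    // Emulates a prefix scan: all rows whose key starts with "<id>|".
    // '}' is the character right after '|' in ASCII, so "<id>}" is an
    // exclusive upper bound for that prefix.
    static SortedMap<String, String> rowsFor(TreeMap<String, String> table, String id) {
        return table.subMap(id + "|", true, id + "}", false);
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        table.put(metaKey("id-42"), "sensor description");
        table.put(measurementKey("id-42", 1326109899000L), "21.5");
        table.put(measurementKey("id-42", 1326109900000L), "21.7");
        table.put(metaKey("id-4"), "another sensor");

        // One contiguous range holds the metadata row followed by the
        // measurements for id-42; rows for id-4 are not included.
        rowsFor(table, "id-42").forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Because the rows for one ID sort next to each other, a single scan over
one key range returns both the metadata and the measurements.  Note that
regions split at row boundaries, so adjacent rows will usually, but not
always, sit in the same region.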


On Mon, Jan 9, 2012 at 2:01 AM, Tom <fivemiletom@gmail.com> wrote:

> Hello,
> I got most, but not all, answers about schemas from the HBase Book and the
> "Definitive Guide".
> Let's say there is a single row key and I use this key to add to two
> tables, one row each (case (1)).
> Could someone please confirm that even though the tables are different,
> based on the key, this data will end up in the same or at least adjacent
> regions? (I.e. my hbase client has to deal with two HTable instances but
> only one region server needs to be looked up)?
> Thank you,
> Tom
> Background:
> I have two types of data: meta data (low volume) and measurement data
> (high volume); and I get requests coming in where, based on an ID, I need
> my HBase client to be able to access both metadata and measurement data for
> this ID quickly. I want to reduce communication overhead (lookups, number
> of tcp connections etc).
> With regard to dealing with the two types of data in HBase, I see these
> three design choices. Which one should I go for?
> (1) Multiple tables - single key - single column family
> (2) Single table - single key - multiple column families (the HBase Book
> advises against that in section 6.2).
> (3) Single table - multiple keys (all made in such a way that they will be
> co-located and system wide hot spots are avoided) - single column family

// Jonathan Hsieh (shay)
// Software Engineer, Cloudera
// jon@cloudera.com
