hbase-user mailing list archives

From Eric Czech <eczec...@gmail.com>
Subject Index building process design
Date Thu, 12 Jul 2012 05:26:13 GMT
Hi everyone,

I have a general design question (apologies in advance if this has
been asked before).

I'd like to build indexes off of a raw data store and I'm trying to
think of the best way to control processing so some part of my cluster
can still serve reads and writes without being affected heavily by the
index building process.

I get the sense that the typical process for this involves something
like the following:

1.  Dedicate one cluster for index building (let's call it the INDEX
cluster) and one for serving application reads on the indexes as well
as writes/reads on the raw data set (let's call it the MAIN cluster).
2.  Have the raw data set replicated from the MAIN cluster to the INDEX cluster.
3.  On the INDEX cluster, use the replicated raw data to constantly
rebuild indexes and copy the new versions to the MAIN cluster,
overwriting the old versions if necessary.
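
To make steps 2 and 3 concrete, here is a rough in-memory sketch in
Python. Everything in it is hypothetical and only for illustration: the
Cluster class, replicate_raw, build_index and publish_index are stand-ins
I made up, not HBase APIs, and a real setup would use HBase replication
plus some bulk copy of the index tables.

```python
from collections import defaultdict


class Cluster:
    """Hypothetical stand-in for a cluster: raw rows plus one index."""

    def __init__(self, name):
        self.name = name
        self.raw = {}            # row key -> dict of column values
        self.index = {}          # indexed value -> list of row keys
        self.index_version = 0   # version of the index currently served


def replicate_raw(source, target):
    """Step 2: copy raw rows from MAIN to INDEX (stands in for replication)."""
    target.raw.update(source.raw)


def build_index(raw_rows, field):
    """Step 3a: rebuild an index on `field` from the replicated raw rows."""
    index = defaultdict(list)
    for row_key in sorted(raw_rows):
        index[raw_rows[row_key][field]].append(row_key)
    return dict(index)


def publish_index(target, new_index, version):
    """Step 3b: overwrite the old index on MAIN, but only with a newer version."""
    if version <= target.index_version:
        return False
    target.index = new_index
    target.index_version = version
    return True
```

The version check is just one assumed way of making sure a stale rebuild
never overwrites a newer index on the MAIN cluster.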

While conceptually simple, I can't help but wonder if it doesn't make
more sense to simply switch application reads / writes from one
cluster to another based on which one is NOT currently building
indexes (but still have the raw data set replicate master-master
between them).

To be more clear, I'm proposing doing this:

1.  Have two clusters, call them CLUSTER_1 and CLUSTER_2, and have the
raw data set replicated master-master between them.
2.  If CLUSTER_1 is currently rebuilding indexes, redirect all
application traffic to CLUSTER_2, including reads from the indexes as
well as writes to the raw data set (and vice versa).
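
Roughly, the switching logic I have in mind would look like the sketch
below (Python, purely illustrative). The ClusterRouter class and its
method names are hypothetical; in practice the "who is rebuilding" flag
would have to live somewhere shared that all clients can see, which this
sketch does not address.

```python
import threading


class ClusterRouter:
    """Hypothetical router: sends traffic to the cluster NOT rebuilding indexes."""

    def __init__(self, clusters=("CLUSTER_1", "CLUSTER_2")):
        self._clusters = tuple(clusters)
        self._rebuilding = None          # name of the cluster rebuilding, if any
        self._lock = threading.Lock()

    def begin_rebuild(self, cluster):
        """Mark `cluster` as rebuilding; only one cluster may rebuild at a time."""
        with self._lock:
            if cluster not in self._clusters:
                raise ValueError("unknown cluster: %s" % cluster)
            if self._rebuilding not in (None, cluster):
                raise RuntimeError("another cluster is already rebuilding")
            self._rebuilding = cluster

    def end_rebuild(self, cluster):
        """Clear the rebuilding flag once `cluster` has finished."""
        with self._lock:
            if self._rebuilding == cluster:
                self._rebuilding = None

    def active_cluster(self):
        """Return the first cluster that is not currently rebuilding."""
        with self._lock:
            for name in self._clusters:
                if name != self._rebuilding:
                    return name
```

Clients would call active_cluster() before each request (or cache the
answer briefly) and send index reads and raw-data writes there; the
master-master replication is assumed to carry those writes over to the
other cluster.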

I know I'm not addressing a lot of details here but I'm just curious
if anyone has ever implemented something along these lines.

The main advantage of what I'm proposing is that potentially massive
indexes never have to be copied across the network. The cost is that
clients won't always read from the same cluster (seems doable though).

Any advice would be much appreciated!

