hbase-dev mailing list archives

From Tim Robertson <timrobertson...@gmail.com>
Subject Tools for balancing a poorly distributed table
Date Sat, 21 Apr 2018 16:48:21 GMT
Hi folks

Recently I've seen a few clusters with badly unbalanced tables, including
some with many regions only a few KB in size. This seems easy to overlook
in operations.

Understandably, SimpleNormalizer does a fairly poor job of addressing this:
it takes a long time, doesn't aggressively merge small regions, and eagerly
splits well-sized regions when many small ones exist. It does work well when
enabled on a well-set-up table, though.
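To see why many tiny regions can cause well-sized regions to be split, here is a simplified Python model of the heuristic the normalizer documentation describes: split any region larger than twice the table's average region size, and merge adjacent regions whose combined size is below that average. This is only an illustration, not SimpleRegionNormalizer's actual code. The function name `plan_normalization` is hypothetical, and the real behaviour (for example, how many plans are executed per run) varies between HBase versions.

```python
from typing import List, Tuple


def plan_normalization(sizes_mb: List[float]) -> Tuple[List[int], List[Tuple[int, int]]]:
    """Hypothetical, simplified model of a size-based normalizer pass.

    sizes_mb: region sizes in MB, in row-key order.
    Returns (indices of regions to split, adjacent index pairs to merge).
    """
    if not sizes_mb:
        return [], []
    avg = sum(sizes_mb) / len(sizes_mb)

    # Any region bigger than twice the table average is planned for a split,
    # even if it is comfortably below hbase.hregion.max.filesize.
    splits = [i for i, size in enumerate(sizes_mb) if size > 2 * avg]

    # Adjacent regions whose combined size is below the average are merged
    # pairwise; each region joins at most one merge per pass.
    merges: List[Tuple[int, int]] = []
    i = 0
    while i < len(sizes_mb) - 1:
        if sizes_mb[i] + sizes_mb[i + 1] < avg:
            merges.append((i, i + 1))
            i += 2
        else:
            i += 1
    return splits, merges


# 96 regions of ~10 KB followed by four 8 GB regions.
splits, merges = plan_normalization([0.01] * 96 + [8000.0] * 4)
print(splits)       # [96, 97, 98, 99]
print(len(merges))  # 48
```

With these sizes the average is roughly 320 MB, so the four 8 GB regions exceed twice the average and are planned for splitting, while the 96 tiny regions are only merged pairwise. In this model, each pass can at best halve the number of tiny regions.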

I have been exploring approaches for:
  1) determining region splits for a one-time bulk load into a pre-split
table[1], and
  2) fixing really badly skewed tables.

I was thinking of creating a Jira, assigned to myself, to add a utility
tool that would:

  a) read the HFiles for a table (optionally running a major compaction (MC)
first to discard old edits)
  b) analyze the block headers and determine split points that would bring
regions back to, e.g., 80% of hbase.hregion.max.filesize
  c) create a new pre-split table
  d) run a table copy (or a bulk load?)
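Step b) could work roughly like the following sketch, which greedily walks blocks in row-key order and starts a new region whenever adding the next block would exceed a target of 80% of hbase.hregion.max.filesize. It assumes the input is a list of `(first_row_key, block_size_bytes)` pairs already extracted from the table's HFiles; how to extract these is not shown. The function name `compute_split_keys` and the 10 GB default are assumptions for illustration, and the result is only approximate because blocks from different HFiles overlap in key range.

```python
from typing import List, Optional, Sequence, Tuple

# Assumed default of hbase.hregion.max.filesize (10 GB); check your cluster config.
DEFAULT_MAX_FILESIZE = 10 * 1024 ** 3


def compute_split_keys(blocks: Sequence[Tuple[bytes, int]],
                       max_filesize: int = DEFAULT_MAX_FILESIZE,
                       fill_ratio: float = 0.8) -> List[bytes]:
    """Hypothetical helper: choose split keys so each region holds at most
    roughly fill_ratio * max_filesize bytes of blocks.

    blocks: (first_row_key, block_size_bytes) pairs, from any number of HFiles.
    """
    target = int(max_filesize * fill_ratio)
    ordered = sorted(blocks, key=lambda b: b[0])  # row-key order across all HFiles

    split_keys: List[bytes] = []
    acc = 0                          # bytes assigned to the current region so far
    prev_key: Optional[bytes] = None
    for first_key, size in ordered:
        # Start a new region at this block if it would overshoot the target.
        # A split key must be strictly greater than every key already in the
        # current region, so skip blocks that share the previous block's key.
        if acc > 0 and acc + size > target and first_key != prev_key:
            split_keys.append(first_key)
            acc = 0
        acc += size
        prev_key = first_key
    return split_keys


blocks = [(b"a", 30), (b"b", 30), (b"c", 30), (b"d", 30), (b"e", 30)]
print(compute_split_keys(blocks, max_filesize=100))  # [b'c', b'e']
```

The resulting keys give len(split_keys) + 1 regions and could be passed to step c), for example through the HBase shell's `SPLITS` option on `create` or an `Admin.createTable` overload that accepts split keys. A greedy pass is used because it needs only one scan over sorted block metadata.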

Does such a thing already exist and I'm just missing it? Or does anyone
know of a better approach?

Thoughts, criticism, requests very welcome.


