hbase-user mailing list archives

From Alex Baranau <alex.barano...@gmail.com>
Subject Re: Row distribution
Date Wed, 25 Jul 2012 13:53:12 GMT
Hi Mohit,

1. When talking about a particular table:

To see how rows are distributed, check how the table's regions are
distributed. Each region is defined by its start and end keys, so
depending on your key format you can tell which records go into each
region. You can see the region distribution in the web UI, as Adrien
mentioned. It may also be handy to query the .META. table [1], which
holds the region info.
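Since regions split the sorted key space into contiguous ranges, a row lands in the region with the greatest start key that is less than or equal to the row key (the first region's start key is empty). Below is a minimal sketch of that lookup in Java. It assumes you have already collected a table's region start keys as strings, for example from .META. or the web UI. The class name, method names and split points are hypothetical. Keys are compared as unsigned bytes, which is how HBase orders row keys.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class RegionLookup {
    // Unsigned lexicographic byte comparison, the ordering HBase uses for row keys.
    static int compareKeys(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff, y = b[i] & 0xff;
            if (x != y) return x - y;
        }
        return a.length - b.length;
    }

    /**
     * Returns the index of the region that would hold rowKey.
     * startKeys must be sorted ascending and begin with "" (the first region's empty start key).
     */
    static int regionFor(List<String> startKeys, String rowKey) {
        byte[] row = rowKey.getBytes(StandardCharsets.UTF_8);
        int lo = 0, hi = startKeys.size() - 1;
        // Binary search for the last start key that is <= row.
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            byte[] start = startKeys.get(mid).getBytes(StandardCharsets.UTF_8);
            if (compareKeys(start, row) <= 0) lo = mid; else hi = mid - 1;
        }
        return lo;
    }

    public static void main(String[] args) {
        // Hypothetical split points, borrowing the key names from the .META. sample in this message.
        List<String> startKeys = List.of("", "chicago", "new_york");
        for (String row : List.of("atlanta", "chicago", "denver", "seattle")) {
            System.out.println(row + " -> region " + regionFor(startKeys, row));
        }
    }
}
```

With these split points, atlanta maps to region 0, chicago and denver to region 1, and seattle to region 2.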

When you use random keys, or when you are just not sure how the data is
distributed across key buckets (i.e. regions), you may also want to look
at the HBase data on HDFS [2]. Since each region's data is stored
separately, you can see how much space on HDFS each one occupies.
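A rough way to judge skew from that listing is to compare the largest region directory with the median. The sketch below assumes the `<bytes> <path>` lines that `hadoop fs -du` prints, as in the sample output in this message, and skips non-numeric header lines. The listing can also contain small non-region entries, so it ignores anything below a threshold you pick. The class name, method names and the 1 MB threshold are hypothetical. The sample lines are only the five shown in this message, not the full 130-item listing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RegionSizeSkew {
    /** Parses "<bytes> <path>" lines as printed by `hadoop fs -du`, keeping sizes >= minBytes. */
    static List<Long> parseSizes(List<String> lines, long minBytes) {
        List<Long> sizes = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length < 2) continue;
            try {
                long size = Long.parseLong(parts[0]);
                if (size >= minBytes) sizes.add(size); // drop tiny, likely non-region entries
            } catch (NumberFormatException e) {
                // Header line such as "Found 130 items": not a size, skip it.
            }
        }
        return sizes;
    }

    /** Ratio of the largest size to the median; 1.0 means perfectly even. */
    static double maxToMedian(List<Long> sizes) {
        if (sizes.isEmpty()) throw new IllegalArgumentException("no sizes");
        List<Long> sorted = new ArrayList<>(sizes);
        Collections.sort(sorted);
        int n = sorted.size();
        double median = (n % 2 == 1)
                ? sorted.get(n / 2)
                : (sorted.get(n / 2 - 1) + sorted.get(n / 2)) / 2.0;
        return sorted.get(n - 1) / median;
    }

    public static void main(String[] args) {
        // The five sample lines from the `hadoop fs -du` output in this message.
        List<String> lines = List.of(
                "Found 130 items",
                "3397        hdfs://hbase.master/hbase/mytable",
                "2682163424  hdfs://hbase.master/hbase/mytable",
                "1038862956  hdfs://hbase.master/hbase/mytable",
                "1039181555  hdfs://hbase.master/hbase/mytable",
                "1076888812  hdfs://hbase.master/hbase/mytable");
        List<Long> sizes = parseSizes(lines, 1_000_000L); // hypothetical 1 MB cutoff
        System.out.printf("regions=%d max/median=%.2f%n", sizes.size(), maxToMedian(sizes));
    }
}
```

A ratio near 1.0 suggests evenly sized regions. For the sample lines it comes out at roughly 2.5, driven by the single region of about 2.7 GB.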

2. When talking about the whole cluster, it makes sense to use a cluster
monitoring tool [3] to find out more about overall load distribution, how
the regions of multiple tables are distributed, request counts, and many
other such things.

And of course, you can also use the HBase Java API to fetch data about the
cluster state. I'd suggest starting with the HBaseAdmin class.
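HBaseAdmin is the place to start, but the exact methods for listing servers and their regions have changed between HBase versions. This sketch therefore leaves those calls out and shows only the version-independent part of a simple balance check. Given region-to-server assignments, for instance the info:server values stored in .META. (which look like myserver:60020), it counts how many regions each region server hosts. The class name, method name, region names and server names are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

public class RegionsPerServer {
    /**
     * Counts regions per server from region-name -> "host:port" assignments.
     * Servers hosting zero regions do not appear; add them from a full server list if you have one.
     */
    static Map<String, Integer> countByServer(Map<String, String> regionToServer) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String server : regionToServer.values()) {
            counts.merge(server, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical assignments in the same host:port form as the info:server column.
        Map<String, String> assignments = new TreeMap<>();
        assignments.put("mytable,,1", "rs1:60020");
        assignments.put("mytable,chicago,2", "rs1:60020");
        assignments.put("mytable,new_york,3", "rs2:60020");

        Map<String, Integer> counts = countByServer(assignments);
        counts.forEach((server, n) -> System.out.println(server + " hosts " + n + " region(s)"));
        System.out.println("max/min regions per server: "
                + Collections.max(counts.values()) + "/" + Collections.min(counts.values()));
    }
}
```

A large gap between the max and min counts hints at an unbalanced cluster, although region count alone ignores region size and request load.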

Alex Baranau
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -


[1]

hbase(main):001:0> scan '.META.', {LIMIT=>1, STARTROW=>"mytable,,"}

 column=info:regioninfo, timestamp=1341279432625, value=REGION => {NAME =>
'mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.', STARTKEY =>
'chicago', ENDKEY => 'new_york', ENCODED =>
8fd61cd7ef426d2f233a4cd7e8b73845, TABLE => {{NAME => 'mytable', FAMILIES =>
COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE =>
'65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}

 column=info:server, timestamp=1341279432673, value=myserver:60020

 column=info:serverstartcode, timestamp=1341279432673, value=1341267474257

1 row(s) in 0.1980 seconds


[2]

ubuntu@ip-10-80-47-73:~$ sudo -u hdfs hadoop fs -du /hbase/mytable
Found 130 items
3397        hdfs://hbase.master/hbase/mytable
2682163424  hdfs://hbase.master/hbase/mytable
1038862956  hdfs://hbase.master/hbase/mytable
1039181555  hdfs://hbase.master/hbase/mytable
1076888812  hdfs://hbase.master/hbase/mytable

[3]

You can get the data from JMX directly using any tool you like, or use:
* Ganglia
* SPM monitoring (
* others
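Reading JMX from Java uses the standard javax.management API. To keep the sketch runnable anywhere, it queries the local JVM's own platform MBean server. Against a region server you would instead connect remotely with JMXConnectorFactory, assuming remote JMX is enabled there, and query the HBase beans, whose names depend on your HBase version and are not shown here. The class and method names are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class JmxPeek {
    /** Returns the MBean names matching a pattern such as "java.lang:type=Runtime". */
    static Set<ObjectName> listBeans(MBeanServerConnection conn, String pattern) {
        try {
            return conn.queryNames(new ObjectName(pattern), null);
        } catch (Exception e) {
            throw new IllegalStateException("JMX query failed for " + pattern, e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Local JVM for illustration only. A remote region server would need something like
        // JMXConnectorFactory.connect(new JMXServiceURL(...)).getMBeanServerConnection().
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        for (ObjectName name : listBeans(conn, "java.lang:type=Runtime")) {
            // Uptime is a standard attribute of the Runtime MXBean, in milliseconds.
            System.out.println(name + " Uptime(ms)=" + conn.getAttribute(name, "Uptime"));
        }
    }
}
```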

On Wed, Jul 25, 2012 at 1:59 AM, Adrien Mogenet <adrien.mogenet@gmail.com> wrote:

> From the web-interface, you can have such statistics when viewing the
> details of a table.
> You can also develop your own "balance viewer" through the HBase API (list
> of RS, regions, storeFiles, their size, etc.)
> On Wed, Jul 25, 2012 at 7:32 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> > Is there an easy way to tell how my nodes are balanced and how the rows
> > are distributed in the cluster?
> >
> --
> Adrien Mogenet
> http://www.mogenet.me

