hbase-user mailing list archives

From Mohit Anchlia <mohitanch...@gmail.com>
Subject Re: Row distribution
Date Wed, 25 Jul 2012 23:54:56 GMT
On Wed, Jul 25, 2012 at 6:53 AM, Alex Baranau <alex.baranov.v@gmail.com> wrote:

> Hi Mohit,
>
> 1. For a particular table:
>
> To view the row distribution, you can check how the regions are
> distributed. Each region is defined by its start/stop keys, so depending on
> your key format you can see which records go into each region. You
> can see the region distribution in the web UI, as Adrien mentioned. It may
> also be handy to query the .META. table [1], which holds region info.
>
> When you use random keys, or when you are just not sure how data is
> distributed across key buckets (i.e. regions), you may also want to look
> at the HBase data on HDFS [2]. Since data is stored separately for each
> region, you can see how much space each one occupies on HDFS.
>

I did a scan and the data looks as pasted below. It appears all my
writes are going to just one server. My keys are of this type:
[0-9]:[current timestamp]. The number between 0 and 9 is generated randomly.
I thought that by having this random number I'd be able to place my keys on
multiple nodes. How should I approach this so that I can use the other
nodes as well?



 SESSION_TIMELINE1,,1343074465420.5831bbac53e591c609918c0e2d7da7bf.
  column=info:regioninfo, timestamp=1343170773523, value=REGION => {NAME =>
 'SESSION_TIMELINE1,,1343074465420.5831bbac53e591c609918c0e2d7da7bf.', STARTKEY => '',
 ENDKEY => '', ENCODED => 5831bbac53e591c609918c0e2d7da7bf, TABLE => {{NAME
 => 'SESSION_TIMELINE1', FAMILIES => [{NAME => 'S_T_MTX', BLOOMFILTER
 => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'GZ', VERSIONS => '1',
 TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false',
 BLOCKCACHE => 'true'}]}}
 SESSION_TIMELINE1,,1343074465420.5831bbac53e591c609918c0e2d7da7bf.
  column=info:server, timestamp=1343178912655, value=dsdb3.:60020
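
In the scan above, the table's only region has STARTKEY => '' and ENDKEY => '', so a single region covers the whole key space and every row lands on the one server hosting it, whatever the random 0-9 prefix is. The prefix can only spread writes once there are region boundaries between the prefixes, for example by pre-splitting the table when it is created (HBaseAdmin.createTable accepts an array of split keys). The plain-Java sketch below has no HBase dependency; it only mimics how a row is routed to the region whose [startKey, endKey) range contains it. The split points "1:" through "9:" and the class and method names are hypothetical, chosen to match keys of the form <digit>:<timestamp>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class SaltedSplitSketch {

    // Hypothetical split points "1:" .. "9:". Together with the implicit empty
    // start key of the first region they define ten regions, one per salt digit.
    static List<String> splitPoints() {
        List<String> splits = new ArrayList<>();
        for (int d = 1; d <= 9; d++) {
            splits.add(d + ":");
        }
        return splits;
    }

    // Region 0 spans ["", first split); region i spans [split i-1, split i).
    // The row's region index is therefore the number of split points <= rowKey.
    // String order matches HBase's byte order here because the keys are ASCII.
    static int regionIndexFor(String rowKey, List<String> splits) {
        return new TreeSet<>(splits).headSet(rowKey, true).size();
    }

    public static void main(String[] args) {
        long ts = 1343178912655L; // fixed example timestamp in milliseconds
        List<String> splits = splitPoints();

        // Count how many of the keys 0:<ts> .. 9:<ts> land in each region.
        int[] perRegion = new int[splits.size() + 1];
        for (int salt = 0; salt <= 9; salt++) {
            perRegion[regionIndexFor(salt + ":" + ts, splits)]++;
        }
        System.out.println("pre-split: " + Arrays.toString(perRegion));

        // With no split points (one region from '' to ''), every key maps to region 0.
        System.out.println("unsplit:   " + regionIndexFor("7:" + ts, new ArrayList<>()));
    }
}
```

Running it prints `pre-split: [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]` (each salt digit gets its own region) and `unsplit:   0` (the single-region situation in the scan).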

> 2. For the whole cluster, it makes sense to use a cluster monitoring
> tool [3] to find out more about overall load distribution, how regions of
> multiple tables are distributed, request counts, and much more.
>
> And of course, you can use the HBase Java API to fetch cluster state data
> as well. The HBaseAdmin class is a good place to start.
>
> Alex Baranau
> ------
> Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
>
> [1]
>
> hbase(main):001:0> scan '.META.', {LIMIT=>1, STARTROW=>"mytable,,"}
> ROW                                                    COLUMN+CELL
>  mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.
>  column=info:regioninfo, timestamp=1341279432625, value=REGION => {NAME =>
> 'mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.', STARTKEY =>
> 'chicago', ENDKEY => 'new_york', ENCODED =>
> fd61cd7ef426d2f233a4cd7e8b73845, TABLE => {{NAME => 'mytable', FAMILIES =>
> [{NAME => 'job', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
> COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE =>
> '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
>  mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.
>  column=info:server, timestamp=1341279432673, value=myserver:60020
>  mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.
>  column=info:serverstartcode, timestamp=1341279432673, value=1341267474257
>
> 1 row(s) in 0.1980 seconds
>
> [2]
>
> ubuntu@ip-10-80-47-73:~$ sudo -u hdfs hadoop fs -du /hbase/mytable
> Found 130 items
> 3397        hdfs://hbase.master/hbase/mytable/02925d3c335bff7e273f392324f16dca
> 2682163424  hdfs://hbase.master/hbase/mytable/03231b8ae2b73317c4858b1a85c09ad2
> 1038862956  hdfs://hbase.master/hbase/mytable/04f911571593e931a9a3d9e2a6616236
> 1039181555  hdfs://hbase.master/hbase/mytable/0a177633196cae7b158836181d69dc0f
> 1076888812  hdfs://hbase.master/hbase/mytable/0d52fc477c41a9a236803234d44c7c06
>
> [3]
> You can get data from JMX directly using any tool you like or use:
> * Ganglia
> * SPM monitoring (http://sematext.com/spm/hbase-performance-monitoring/index.html)
> * others
>
>
> On Wed, Jul 25, 2012 at 1:59 AM, Adrien Mogenet <adrien.mogenet@gmail.com> wrote:
>
> > From the web interface, you can see such statistics when viewing the
> > details of a table.
> > You can also develop your own "balance viewer" using the HBase API (list
> > of RSs, regions, storeFiles, their sizes, etc.).
> >
> > On Wed, Jul 25, 2012 at 7:32 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
> >
> > > Is there an easy way to tell how my nodes are balanced and how the rows
> > > are distributed in the cluster?
> > >
> >
> >
> >
> > --
> > Adrien Mogenet
> > 06.59.16.64.22
> > http://www.mogenet.me
> >
>
>
>
>
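
Alex's point [2] about reading per-region sizes from HDFS can be turned into a quick skew check. Below is a minimal plain-Java sketch with no Hadoop dependency. It assumes the `hadoop fs -du` output format shown in [2] (size, whitespace, path, with the encoded region name as the last path component) and uses only the five entries listed there, out of the 130 reported. The class and method names are hypothetical, and skipping names that start with '.' is an assumption about non-region files in the table directory.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RegionSizeSkew {

    // Parses "<bytes> <path>" lines as printed by `hadoop fs -du /hbase/<table>`;
    // the last path component is taken to be the encoded region name.
    static Map<String, Long> parseDu(List<String> lines) {
        Map<String, Long> sizes = new LinkedHashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            // Skip the "Found N items" header and anything that is not "<number> <path>".
            if (parts.length != 2 || !parts[0].matches("\\d+")) {
                continue;
            }
            String name = parts[1].substring(parts[1].lastIndexOf('/') + 1);
            // Assumption: entries starting with '.' (e.g. table descriptor files) are not regions.
            if (name.startsWith(".")) {
                continue;
            }
            sizes.put(name, Long.parseLong(parts[0]));
        }
        return sizes;
    }

    // Largest region size divided by the mean region size; 1.0 means perfectly even.
    static double maxToMeanRatio(Map<String, Long> sizes) {
        long total = 0;
        long max = 0;
        for (long size : sizes.values()) {
            total += size;
            max = Math.max(max, size);
        }
        double mean = (double) total / sizes.size();
        return max / mean;
    }

    public static void main(String[] args) {
        // The five entries shown in the `hadoop fs -du` example in [2].
        List<String> du = Arrays.asList(
            "Found 130 items",
            "3397        hdfs://hbase.master/hbase/mytable/02925d3c335bff7e273f392324f16dca",
            "2682163424  hdfs://hbase.master/hbase/mytable/03231b8ae2b73317c4858b1a85c09ad2",
            "1038862956  hdfs://hbase.master/hbase/mytable/04f911571593e931a9a3d9e2a6616236",
            "1039181555  hdfs://hbase.master/hbase/mytable/0a177633196cae7b158836181d69dc0f",
            "1076888812  hdfs://hbase.master/hbase/mytable/0d52fc477c41a9a236803234d44c7c06");

        Map<String, Long> sizes = parseDu(du);
        long total = 0;
        for (long size : sizes.values()) {
            total += size;
        }
        // Each region's share of the bytes, then an overall skew figure.
        for (Map.Entry<String, Long> e : sizes.entrySet()) {
            System.out.printf("%s %6.2f%%%n", e.getKey(), 100.0 * e.getValue() / total);
        }
        System.out.printf("max/mean = %.2f%n", maxToMeanRatio(sizes));
    }
}
```

With those five entries, region 03231b8ae2b73317c4858b1a85c09ad2 holds roughly 46% of the bytes and the max/mean ratio comes out around 2.3, which is the kind of imbalance this check is meant to surface.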
