hbase-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alex Baranau <alex.barano...@gmail.com>
Subject Re: Row distribution
Date Thu, 26 Jul 2012 14:16:18 GMT
Looks like you have only one region in your table. Right?

If you want your writes to be distributed from the start (without waiting
for HBase to fill table enough to split it in many regions), you should
pre-split your table. In your case you can pre-split table with 10 regions
(just an example, you can define more), with start keys: "", "1", "2", ...,
"9" [1].

Btw, since you are salting your keys to achieve distribution, you might
also find this small lib helpful which implements most of the stuff for you
[2].

Hope this helps.

Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr

[1]

    byte[][] splitKeys = new byte[9][];
    // the first region starting with empty key will be created
automatically
    for (int i = 1; i < splitKeys.length; i++) {
      splitKeys[i] = Bytes.toBytes(String.valueOf(i));
    }

    HBaseAdmin admin = new HBaseAdmin(conf);
    admin.createTable(tableDescriptor, splitKeys);

[2]
https://github.com/sematext/HBaseWD
http://blog.sematext.com/2012/04/09/hbasewd-avoid-regionserver-hotspotting-despite-writing-records-with-sequential-keys/

On Wed, Jul 25, 2012 at 7:54 PM, Mohit Anchlia <mohitanchlia@gmail.com>wrote:

> On Wed, Jul 25, 2012 at 6:53 AM, Alex Baranau <alex.baranov.v@gmail.com
> >wrote:
>
> > Hi Mohit,
> >
> > 1. When talking about particular table:
> >
> > For viewing rows distribution you can check out how regions are
> > distributed. And each region defined by the start/stop key, so depending
> on
> > your key format, etc. you can see which records go into each region. You
> > can see the regions distribution in web ui as Adrien mentioned. It may
> also
> > be handy for you to query .META. table [1] which holds regions info.
> >
> > In cases when you use random keys or when you just not sure how data is
> > distributed in key buckets (which are regions), you may also want to look
> > at HBase data on HDFS [2]. Since data is stored for each region
> separately,
> > you can see the size on the HDFS each one occupies.
> >
> > I did a scan and the data looks like as pasted below. It appears all my
> writes are going to just one server. My keys are of this type
> [0-9]:[current timestamp]. Number between 0-9 is generated randomly. I
> thought by having this random number I'll be able to place my keys on
> multiple nodes. How should I approach this such that I am able to use other
> nodes as well?
>
>
>
>  SESSION_TIMELINE1,,1343074465420.5831bbac53e59 column=info:regioninfo,
> timestamp=1343170773523, value=REGION => {NAME =>
> 'SESSION_TIMELINE1,,1343074465420.5831bbac53e591c609918c0e2d7da7
>  1c609918c0e2d7da7bf.                           bf.', STARTKEY => '',
> ENDKEY => '', ENCODED => 5831bbac53e591c609918c0e2d7da7bf, TABLE => {{NAME
> => 'SESSION_TIMELINE1', FAMILIES => [{NAM
>                                                 E => 'S_T_MTX', BLOOMFILTER
> => 'NONE', REPLICATION_SCOPE => '0', COMPRESSION => 'GZ', VERSIONS => '1',
> TTL => '2147483647', BLOCKSIZE => '
>                                                 65536', IN_MEMORY =>
> 'false', BLOCKCACHE => 'true'}]}}
>  SESSION_TIMELINE1,,1343074465420.5831bbac53e59 column=info:server,
> timestamp=1343178912655, value=dsdb3.:60020
>  1c609918c0e2d7da7bf.
>
> > 2. When talking about whole cluster, it makes sense to use cluster
> > monitoring tool [3], to find out more about overall load distribution,
> > regions of multiple tables distribution, requests amount, and many more
> > such things.
> >
> > And of course, you can use HBase Java API to fetch some data of the
> cluster
> > state as well. I guess you should start looking at it from HBaseAdmin
> > class.
> >
> > Alex Baranau
> > ------
> > Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch
> -
> > Solr
> >
> > [1]
> >
> > hbase(main):001:0> scan '.META.', {LIMIT=>1, STARTROW=>"mytable,,"}
> > ROW
> > COLUMN+CELL
> >
> >
> >  mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.
> >  column=info:regioninfo, timestamp=1341279432625, value=REGION => {NAME
> =>
> > 'mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.', STARTKEY =>
> > 'chicago', ENDKEY => 'new_york', ENCODED =>
> > fd61cd7ef426d2f233a4cd7e8b73845, TABLE => {{NAME => 'mytable', FAMILIES
> =>
> > [{NAME => 'job', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0',
> > COMPRESSION => 'NONE', VERSIONS => '1', TTL => '2147483647', BLOCKSIZE
=>
> > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}}
> >
> >
> >
> >  mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.
> >  column=info:server, timestamp=1341279432673, value=myserver:60020
> >
> >
> >  mytable,,1341279432683.8fd61cd7ef426d2f233a4cd7e8b73845.
> >  column=info:serverstartcode, timestamp=1341279432673,
> value=1341267474257
> >
> >
> > 1 row(s) in 0.1980 seconds
> >
> > [2]
> >
> > ubuntu@ip-10-80-47-73:~$ sudo -u hdfs hadoop fs -du /hbase/mytable
> > Found 130 items
> > 3397        hdfs://hbase.master/hbase/mytable
> > /02925d3c335bff7e273f392324f16dca
> > 2682163424  hdfs://hbase.master/hbase/mytable
> > /03231b8ae2b73317c4858b1a85c09ad2
> > 1038862956  hdfs://hbase.master/hbase/mytable
> > /04f911571593e931a9a3d9e2a6616236
> > 1039181555  hdfs://hbase.master/hbase/mytable
> > /0a177633196cae7b158836181d69dc0f
> > 1076888812  hdfs://hbase.master/hbase/mytable
> > /0d52fc477c41a9a236803234d44c7c06
> >
> > [3]
> > You can get data from JMX directly using any tool you like or use:
> > * Ganglia
> > * SPM monitoring (
> > http://sematext.com/spm/hbase-performance-monitoring/index.html)
> > * others
> >
> >
> > On Wed, Jul 25, 2012 at 1:59 AM, Adrien Mogenet <
> adrien.mogenet@gmail.com
> > >wrote:
> >
> > > From the web-interface, you can have such statistics when viewing the
> > > details of a table.
> > > You can also develop your own "balance viewer" through the HBase API
> > (list
> > > of RS, regions, storeFiles, their size, etc.)
> > >
> > > On Wed, Jul 25, 2012 at 7:32 AM, Mohit Anchlia <mohitanchlia@gmail.com
> > > >wrote:
> > >
> > > > Is there an easy way to tell how my nodes are balanced and how the
> rows
> > > are
> > > > distributed in the cluster?
> > > >
> > >
> > >
> > >
> > > --
> > > Adrien Mogenet
> > > 06.59.16.64.22
> > > http://www.mogenet.me
> > >
> >
> >
> >
> > --
> > Alex Baranau
> > ------
> > Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch
> -
> > Solr
> >
>



-- 
Alex Baranau
------
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message