hbase-user mailing list archives

From Weiwei Xiong <xion...@gmail.com>
Subject Re: Data is always written to one node
Date Tue, 15 Mar 2011 04:12:25 GMT
On Mon, Mar 14, 2011 at 8:50 PM, Stack <stack@duboce.net> wrote:

> Data balancing on hdfs is different to region balancing across your
> nodes.  Maybe there is a bug in our balancer if there are only two
> nodes involved?
>
> If there is nothing to balance, because it's already balanced, it'll
> output this:
>
> 2011-03-09 00:40:35,537 INFO
> org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.
> servers=5 regions=1007 average=201.4 mostloaded=202 leastloaded=202
>
> ....else you will see:
>
>
> 2011-03-09 00:45:35,538 INFO
> org.apache.hadoop.hbase.master.LoadBalancer: Calculated a load balance
> in 1ms. Moving 1 regions off of 1 overloaded servers onto 0 less
> loaded servers
> 2011-03-09 00:45:35,538 INFO org.apache.hadoop.hbase.master.HMaster:
> balance
> hri=usertable,user362822713,1299624789204.1720a98e1a0709e9a401a8eb9d8436bc.,
> src=sv4borg230,61020,1299616745209,
> dest=sv4borg234,61020,1299616745224
> 2011-03-09 00:45:35,538 DEBUG
> org.apache.hadoop.hbase.master.AssignmentManager: Starting
> unassignment of region
> usertable,user362822713,1299624789204.1720a98e1a0709e9a401a8eb9d8436bc.
> (offlining)
> ...
>
>
> or there will be a message that it is skipping balancing because there
> are regions in movement already.
>
> Do you see none of the above?
>
Yes, I did see the latter messages in the master log, so I guess the regions
are balanced across the cluster.
Actually, I was expecting the real region data to be balanced as well, so that
I could get better I/O balancing.
Now it seems to me that data rebalancing happens only during major
compaction.
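
For checking the balancer's summary lines without reading the master log by
eye, something like the Python below might help. It is only a sketch: the line
format is taken from the log excerpt quoted above, and
parse_balancer_summary is a made-up helper name, not part of HBase.

```python
import re

# Matches the key=value summary that follows "Skipping load balancing."
SUMMARY_RE = re.compile(
    r"servers=(?P<servers>\d+)\s+regions=(?P<regions>\d+)\s+"
    r"average=(?P<average>[\d.]+)\s+mostloaded=(?P<mostloaded>\d+)\s+"
    r"leastloaded=(?P<leastloaded>\d+)"
)


def parse_balancer_summary(line):
    """Return the counts from a 'Skipping load balancing' line, or None."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "servers": int(fields["servers"]),
        "regions": int(fields["regions"]),
        "average": float(fields["average"]),
        "mostloaded": int(fields["mostloaded"]),
        "leastloaded": int(fields["leastloaded"]),
    }


# The wrapped log entry from the quoted message, joined into one line.
line = (
    "2011-03-09 00:40:35,537 INFO "
    "org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing. "
    "servers=5 regions=1007 average=201.4 mostloaded=202 leastloaded=202"
)
summary = parse_balancer_summary(line)
```

These numbers only describe region counts per server. They say nothing about
where the underlying HDFS blocks live.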


> In the shell you can run the balancer explicitly.
>
> hbase> balance
>
> Watch the master logs while this is happening.  What does it say?
>
Typing 'balance' gives me an "invalid command" error. I am using 0.90.1. Is
this available in a newer release?

>
> St.Ack
>
> On Mon, Mar 14, 2011 at 6:27 PM, Weiwei Xiong <xiongww@gmail.com> wrote:
> > On Mon, Mar 14, 2011 at 4:09 PM, Bill Graham <billgraham@gmail.com> wrote:
> >
> >> I hope I'm not hijacking the thread but I'm seeing what I think is a
> >> similar issue. About a week ago I loaded a bunch of data into a newly
> >> created table. It took about an hour and resulted in 12 regions being
> >> created on a single node. (Afterwards I remembered a conversation with
> >> JD where he described this behavior and how you could pre-create at
> >> least N regions where N is your number of nodes to get better
> >> distribution off the bat).
> >>
> > Some follow-up questions. Do we have to pre-create N regions on different
> > nodes to get better distribution? I ask because I also noticed that
> > HBase prefers to always store new key-values on one node. Now I know
> > that we can do major compactions to rebalance the data, but it would be
> > better if the data could be stored on less-loaded nodes at the time it is
> > inserted.
> > I guess this would make I/O more balanced.
> >
> >
> >
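
On pre-creating regions: the split points have to come from somewhere, and a
rough way to compute them is sketched below in Python. It assumes row keys of
the form "user" plus a fixed-width, zero-padded number, which is an assumption
made for illustration and may not match the real key layout. split_keys is a
hypothetical helper, and passing the keys to HBase at table-creation time is
not shown.

```python
def split_keys(prefix, max_id, num_regions, width):
    """Return num_regions - 1 row keys that cut [0, max_id) into equal ranges.

    Keys are assumed to look like prefix + zero-padded integer, so that
    byte-wise ordering agrees with numeric ordering.
    """
    if num_regions < 2:
        return []
    step = max_id // num_regions
    return [
        f"{prefix}{i * step:0{width}d}".encode("ascii")
        for i in range(1, num_regions)
    ]


# Four regions over ids 0..999 need three split points.
keys = split_keys("user", 1000, 4, 4)
```

If the ids are not zero-padded, byte-wise ordering no longer matches numeric
ordering and the boundaries would have to be chosen differently.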
> >> Anyway, it's been about a week and all regions for the table are still
> >> on 1 node. I see messages like this in the logs every 5 minutes:
> >>
> >> 2011-03-14 15:59:03,148 INFO
> >> org.apache.hadoop.hbase.master.LoadBalancer: Skipping load balancing.
> >> servers=4 regions=62 average=15.5 mostloaded=16 leastloaded=16
> >>
> >> It seems the total regions are evenly balanced, but individual tables
> >> are not. Where should I look to troubleshoot why this table's regions
> >> (as well as others) aren't evenly distributed? I'd guess that I can
> >> major compact all tables to fix it, but I'd like to figure out why it
> >> hasn't happened automatically.
> >>
> >> HBase 0.90.0
> >> CDH3b2
> >>
> >> thanks,
> >> Bill
> >>
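
The situation Bill describes, where total region counts are even but one
table sits on a single node, can be shown with a small Python sketch. The
layout below is invented to match his numbers (4 servers, 62 regions, a
12-region table); the table and node names are hypothetical.

```python
from collections import Counter


def regions_per_server(assignments):
    """Count regions on each server, ignoring which table they belong to."""
    return Counter(server for _table, server in assignments)


def regions_per_server_by_table(assignments):
    """Count regions on each server separately for every table."""
    per_table = {}
    for table, server in assignments:
        per_table.setdefault(table, Counter())[server] += 1
    return per_table


# Invented layout: (table, server) pairs, 62 regions in total.
assignments = (
    [("newtable", "node1")] * 12
    + [("other", "node1")] * 4
    + [("other", "node2")] * 16
    + [("other", "node3")] * 15
    + [("other", "node4")] * 15
)

total = regions_per_server(assignments)
by_table = regions_per_server_by_table(assignments)
```

Here every server holds 15 or 16 regions, so a balancer that only looks at
totals has nothing to move, yet all 12 regions of newtable are on node1.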
> >> On Mon, Mar 14, 2011 at 3:31 PM, Weiwei Xiong <xiongww@gmail.com> wrote:
> >> > I see.  Thanks Ryan.
> >> >
> >> > -- Weiwei
> >> >
> >> > On Mon, Mar 14, 2011 at 3:28 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> >> >
> >> >> by default runs 1x/day. you can do it manually in the hbase shell by
> >> >> typing:
> >> >>
> >> >> hbase(main):001:0> major_compact "table_name"
> >> >>
> >> >> -ryan
> >> >>
> >> >>
> >> >> On Mon, Mar 14, 2011 at 3:25 PM, Weiwei Xiong <xiongww@gmail.com> wrote:
> >> >> > Thanks for your info Ryan.
> >> >> > Does HBase do major compaction regularly or do I need to manually do
> >> >> > this?
> >> >> > If it's automatic, how frequently is it performed?
> >> >> > I am running with a replication factor of 1.
> >> >> > Thanks,
> >> >> > -- Weiwei
> >> >> >
> >> >> > On Mon, Mar 14, 2011 at 3:18 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> >> >> >>
> >> >> >> HDFS does the data rebalancing. Over time, as major compactions run
> >> >> >> and new data comes in, files are written first to the local node and
> >> >> >> then to remote nodes.
> >> >> >>
> >> >> >> What's the replication factor you are running?  HDFS on 2 nodes is
> >> >> >> tricky, since you can either choose r=1 (no data protection) or r=2
> >> >> >> (all writes go to both nodes).
> >> >> >>
> >> >> >> The sweet spot is above 6 nodes alas.
> >> >> >>
> >> >> >> -ryan
> >> >> >>
> >> >> >> On Mon, Mar 14, 2011 at 3:12 PM, Weiwei Xiong <xiongww@gmail.com> wrote:
> >> >> >> > Sorry I forgot to mention. I am using HBase 0.90.1 over HDFS 0.20.append
> >> >> >> > Thanks,
> >> >> >> > -- Weiwei
> >> >> >> >
> >> >> >> > On Mon, Mar 14, 2011 at 3:10 PM, Weiwei Xiong <xiongww@gmail.com> wrote:
> >> >> >> >>
> >> >> >> >> Thanks very much for your replies.
> >> >> >> >> Something was unclear in my previous emails. I had one node started
> >> >> >> >> first, and another was added later. Some regions had already been
> >> >> >> >> created on the first node. Then I started to import more data into
> >> >> >> >> the same table and found that it's always the first node that keeps
> >> >> >> >> serving the data writes.
> >> >> >> >> Actually I was expecting that the region data would be re-balanced to
> >> >> >> >> another data node. And I did see in the master log that the HBase
> >> >> >> >> master is trying to unassign some regions from the overloaded node and
> >> >> >> >> re-assign them to the less-loaded node. But the real data was never
> >> >> >> >> migrated.
> >> >> >> >> I think I observed the region index and cache rebalancing in the
> >> >> >> >> master log (correct me if I am wrong).  Does anyone know how
> >> >> >> >> frequently this happens?
> >> >> >> >> Another question is, does HBase support data and I/O rebalancing? Or
> >> >> >> >> should I rely on HDFS to do data rebalancing? I guess HBase should
> >> >> >> >> also support data rebalancing, otherwise every time I restart HBase
> >> >> >> >> the regions will have to be rebalanced again. Will someone tell me
> >> >> >> >> how to configure or program HBase to do data rebalancing?
> >> >> >> >> Thanks,
> >> >> >> >> -- Weiwei
> >> >> >> >> On Mon, Mar 14, 2011 at 2:43 PM, Ryan Rawson <ryanobjc@gmail.com> wrote:
> >> >> >> >>>
> >> >> >> >>> What version of HBase are you testing?
> >> >> >> >>>
> >> >> >> >>> Is it literally 0 vs N assignments?
> >> >> >> >>>
> >> >> >> >>> On Mon, Mar 14, 2011 at 1:18 PM, Weiwei Xiong <xiongww@gmail.com> wrote:
> >> >> >> >>> > Thanks!
> >> >> >> >>> >
> >> >> >> >>> > I checked the master log and found some info like this:
> >> >> >> >>> > " timestamp ***, INFO org.apache.hadoop.hbase.master.HMaster:
> >> >> >> >>> > balance hri=***, src=***, dst=*** "
> >> >> >> >>> >
> >> >> >> >>> > So I assume the balancer is running. There's no failure info there,
> >> >> >> >>> > but I didn't see the regions actually being balanced as the log
> >> >> >> >>> > states.
> >> >> >> >>> >
> >> >> >> >>> > Is it possible that because I have kept dumping data into the table,
> >> >> >> >>> > the balancing won't work?
> >> >> >> >>> >
> >> >> >> >>> > Thanks,
> >> >> >> >>> > -- Weiwei
> >> >> >> >>> >
> >> >> >> >>> > On Mon, Mar 14, 2011 at 12:15 PM, Stack <stack@duboce.net> wrote:
> >> >> >> >>> >
> >> >> >> >>> >> Check the master log.  See if the load balancer is running or not.
> >> >> >> >>> >> It usually runs every 5 minutes by default.  It may not run if
> >> >> >> >>> >> regions are transitioning.  It'll log regardless.
> >> >> >> >>> >>
> >> >> >> >>> >> St.Ack
> >> >> >> >>> >>
> >> >> >> >>> >> On Mon, Mar 14, 2011 at 10:50 AM, Weiwei Xiong <xiongww@gmail.com> wrote:
> >> >> >> >>> >> > Hi,
> >> >> >> >>> >> >
> >> >> >> >>> >> > I recently set up a 2-node Hadoop and HBase cluster and am trying
> >> >> >> >>> >> > to load data into my HBase table using the HBase client.
> >> >> >> >>> >> >
> >> >> >> >>> >> > The issue that bothers me is that the data is always written to one
> >> >> >> >>> >> > node of the cluster, i.e., all the regions of the HBase table are on
> >> >> >> >>> >> > one node.
> >> >> >> >>> >> >
> >> >> >> >>> >> > Is there any configuration I need to change to make the load
> >> >> >> >>> >> > balanced?
> >> >> >> >>> >> >
> >> >> >> >>> >> > Thanks,
> >> >> >> >>> >> > -- w
> >> >> >> >>> >> >
> >> >> >> >>> >>
> >> >> >> >>> >
> >> >> >> >>
> >> >> >> >
> >> >> >> >
> >> >> >
> >> >> >
> >> >>
> >> >
> >>
> >
>
