From: Andrew Purtell
Date: Wed, 27 May 2009 14:12:21 -0700 (PDT)
Subject: Re: HBase looses regions.
To: hbase-user@hadoop.apache.org

Thanks for the system info. CPU and RAM resources look good.

> > Can you consider adding additional nodes to spread the load on DFS?
>
> Yes. If that will help. Right now I'm not seeing any splits happening, so
> I don't know how much adding more boxes will help. It seems to not be
> balanced. All writes go to a single slave; when that box dies, it moves
> to the next.

You are not able to insert enough data into HBase to trigger splits (or
the splits are delayed by the intense write activity), because the HDFS
layer underneath cannot keep up. If you have already tried the HDFS-related
suggestions on the HBase troubleshooting page
(http://wiki.apache.org/hadoop/Hbase/Troubleshooting) and are still having
problems, adding more HDFS datanodes to spread the load may help, but that
depends on where the bottleneck is.

What network connects these boxes? Gigabit Ethernet? Fast (100 megabit)
Ethernet?

Something else to try here is forcing splits early in the upload process.
You can use the hbase shell or the HBaseAdmin client API to signal the
master to ask the regionserver(s) hosting a table to split it, regardless
of whether the stores have reached the split threshold or not. This type of
split request is advisory, and will not happen if there is no data in a
region or if it has just split and a compaction is still pending.

The strategy here is to actually use all of the nodes you have deployed,
or, if you add more, to use those as well. Force enough splits so there is
at least one region of the table hosted by a region server on every node.
It could be something like:

    HTable table = ...;
    HBaseAdmin admin = ...;
    int nodes = getAvailableNodes();
    while (table.getStartKeys().length < nodes) {
        // upload the next 100 rows
        // ...
        // ask the master to split the table early
        admin.split(table.getTableName());
        // give things some time to settle down
        Thread.sleep(30 * 1000);  // or 60 * 1000
    }
    // upload the rest
    // ...

There is no HBase API to stand in for getAvailableNodes() -- yet... I will
make one now, that seems useful -- but if you have co-deployed mapreduce
with your region servers, you could use JobClient.getClusterStatus() to
programmatically determine the size of the cluster. See
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobClient.html#getClusterStatus()

Best regards,

  - Andy

________________________________
From: llpind
To: hbase-user@hadoop.apache.org
Sent: Wednesday, May 27, 2009 10:49:45 AM
Subject: Re: HBase looses regions.

Andrew Purtell-2 wrote:
>
> Also the program that is pounding the cluster with inserts? What is the
> hardware spec of those nodes? How many CPUs? How many cores? How much
> RAM?
>

I'm currently running the client loader program from my local box: Core 2
Duo CPU P8400 @ 2.26GHz, 3.48 GB of RAM. I've tried a Map/Reduce job as
well, but it does the same thing. I need help running a map/reduce job in
a distributed manner. The way I run it now is iterating over the ResultSet
and doing batch updates while the row key is the same.
Master box:
=====================================================================
Tasks: 162 total, 1 running
Load average: 0.00 0.00 0.00
Uptime: 3 days, 19:31:54
Mem[|||||||||||||||||||| 459/3584MB]
Swp[ 0/2047MB]
Quad core: Intel(R) Xeon(TM) CPU 3.00GHz

Slave box1, box2, and box3 are all the same as above, but with more disk
(~200GB).

Andrew Purtell-2 wrote:
>
> The regionservers are running on the same nodes as the DFS datanodes I
> presume
>

Yes, that is correct. The slaves have:

3809 DataNode
3938 HRegionServer
3601 Jps

The master has:

1293 NameNode
7363 Jps
1464 SecondaryNameNode
1568 HMaster

Andrew Purtell-2 wrote:
>
> Can you consider adding additional nodes to spread the load on DFS?
>

Yes. If that will help. Right now I'm not seeing any splits happening, so
I don't know how much adding more boxes will help. It seems to not be
balanced. All writes go to a single slave; when that box dies, it moves to
the next.

--
View this message in context:
http://www.nabble.com/HBase-looses-regions.-tp23657983p23747484.html
Sent from the HBase User mailing list archive at Nabble.com.
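As a footnote on the getAvailableNodes() gap mentioned above, the stopping
condition of the forced-split loop can be isolated into a small pure
helper. This is only a sketch: SplitPlanner and splitsNeeded are made-up
names, not HBase API. In a real client, regionCount would come from
table.getStartKeys().length, and nodeCount could come from
new JobClient(conf).getClusterStatus().getTaskTrackers() when mapreduce is
co-deployed with the region servers.

```java
// Sketch of the stopping logic in the forced-split loop above.
// SplitPlanner and splitsNeeded are hypothetical names, not HBase API.
public class SplitPlanner {

    // Number of additional forced splits needed before every node can
    // host at least one region of the table. Each successful split adds
    // exactly one region, so the answer is the shortfall, floored at zero.
    public static int splitsNeeded(int regionCount, int nodeCount) {
        if (regionCount < 1 || nodeCount < 1) {
            throw new IllegalArgumentException("counts must be positive");
        }
        return Math.max(0, nodeCount - regionCount);
    }
}
```

With this in hand, the upload loop keeps inserting rows and calling
admin.split(...) while splitsNeeded(table.getStartKeys().length, nodes)
is greater than zero.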