From: Andrew Purtell
Date: Wed, 27 May 2009 14:12:21 -0700 (PDT)
Subject: Re: HBase looses regions.
To: hbase-user@hadoop.apache.org

Thanks for the system info. CPU and RAM resources look good.

> > Can you consider adding additional nodes to spread the load on DFS?
>
> Yes. If that will help. Right now I'm not seeing any splits happening, so
> I don't know how much adding more boxes will help. It seems to not be
> balanced. All writes go to a single slave; when that box dies, it moves
> to the next.

You are not able to insert enough data into HBase to trigger splits (or
the splits are delayed by the intense write activity), because the HDFS
layer underneath cannot keep up. If you have already tried the HDFS-related
suggestions on the HBase troubleshooting page
(http://wiki.apache.org/hadoop/Hbase/Troubleshooting) and are still having
problems, adding more HDFS datanodes to spread the load may help, but that
depends on where the bottleneck is.

What network connects these boxes? Gigabit Ethernet? Fast (100 megabit)
Ethernet?

Something else to try here is forcing splits early in the upload process.
You can use the hbase shell or the HBaseAdmin client API to signal the
master to ask the regionserver(s) hosting a table to split it, regardless
of whether the stores have reached the split threshold or not. This type of
split request is advisory, and will not happen if there is no data in a
region or if it has just split and a compaction is still pending.

The strategy here is to actually use all of the nodes you have deployed,
or, if you add more, to use those as well. Force enough splits so there is
at least one region of the table hosted by a region server on every node.
It could be something like:

    HTable table = ...;
    HBaseAdmin admin = ...;
    int nodes = getAvailableNodes();
    while (table.getStartKeys().length < nodes) {
        // upload the next 100 rows
        // ...
        // ask the master to split the table early
        admin.split(table.getTableName());
        // give things some time to settle down
        Thread.sleep(30 * 1000);  // or 60 * 1000
    }
    // upload the rest
    // ...

There is no HBase API to stand in for getAvailableNodes() -- yet... I will
make one now, that seems useful -- but if you have co-deployed mapreduce
with your region servers, you could use JobClient.getClusterStatus() to
programmatically determine the size of the cluster. See
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/JobClient.html#getClusterStatus()

Best regards,

  - Andy

________________________________
From: llpind
To: hbase-user@hadoop.apache.org
Sent: Wednesday, May 27, 2009 10:49:45 AM
Subject: Re: HBase looses regions.

Andrew Purtell-2 wrote:
>
> Also the program that is pounding the cluster with inserts? What is the
> hardware spec of those nodes? How many CPUs? How many cores? How much
> RAM?
>

I'm currently running the client loader program from my local box: Core 2
Duo CPU P8400 @ 2.26GHz, 3.48 GB of RAM. I've tried a Map/Reduce job as
well, but it does the same thing. I need help running a map/reduce job in
a distributed manner. The way I run it now is iterating over the ResultSet
and doing batch updates while the row key is the same.
Master box:
=====================================================================
Tasks: 162 total, 1 running
Load average: 0.00 0.00 0.00
Uptime: 3 days, 19:31:54
Mem[|||||||||||||||||||| 459/3584MB]
Swp[ 0/2047MB]
Quad core: Intel(R) Xeon(TM) CPU 3.00GHz

Slave box1, box2, and box3 are all the same as above, but with more disk
(~200GB).

Andrew Purtell-2 wrote:
>
> The regionservers are running on the same nodes as the DFS datanodes I
> presume
>

Yes, that is correct. The slaves have:

3809 DataNode
3938 HRegionServer
3601 Jps

The master has:

1293 NameNode
7363 Jps
1464 SecondaryNameNode
1568 HMaster

Andrew Purtell-2 wrote:
>
> Can you consider adding additional nodes to spread the load on DFS?
>

Yes. If that will help. Right now I'm not seeing any splits happening, so
I don't know how much adding more boxes will help. It seems to not be
balanced. All writes go to a single slave; when that box dies, it moves to
the next.

--
View this message in context:
http://www.nabble.com/HBase-looses-regions.-tp23657983p23747484.html
Sent from the HBase User mailing list archive at Nabble.com.
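As a footnote on the getAvailableNodes() gap mentioned above, the stopping
condition of the forced-split loop can be isolated into a small pure
helper. This is only a sketch: SplitPlanner and splitsNeeded are made-up
names, not HBase API. In a real client, regionCount would come from
table.getStartKeys().length, and nodeCount could come from
new JobClient(conf).getClusterStatus().getTaskTrackers() when mapreduce is
co-deployed with the region servers.

```java
// Sketch of the stopping logic in the forced-split loop above.
// SplitPlanner and splitsNeeded are hypothetical names, not HBase API.
public class SplitPlanner {

    // Number of additional forced splits needed before every node can
    // host at least one region of the table. Each successful split adds
    // exactly one region, so the answer is the shortfall, floored at zero.
    public static int splitsNeeded(int regionCount, int nodeCount) {
        if (regionCount < 1 || nodeCount < 1) {
            throw new IllegalArgumentException("counts must be positive");
        }
        return Math.max(0, nodeCount - regionCount);
    }
}
```

With this in hand, the upload loop keeps inserting rows and calling
admin.split(...) while splitsNeeded(table.getStartKeys().length, nodes)
is greater than zero.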