From: Daniel Washusen
Subject: Re: Optimizations for random read performance
To: hbase-user@hadoop.apache.org
Date: Wed, 17 Feb 2010 19:18:26 +1100

Glad you sorted it out!  Please do tell...

On 17/02/2010, at 4:59 PM, James Baldassari wrote:

> Hi,
>
> I think we managed to solve our performance and load issues.  Everything
> has been stable for about an hour now, but I'm not going to raise the
> victory flag until the morning because we've had short periods of
> stability in the past.
>
> I've been working on this problem non-stop for almost a week now, so I
> really need to get some sleep, but if everything looks good tomorrow
> I'll write up a summary of all the changes we made and share it with the
> group.  Hopefully this exercise in tuning for a high-throughput
> real-time environment will be useful to others.
>
> Thanks,
> James
>
>
> On Tue, 2010-02-16 at 23:18 -0600, Stack wrote:
>> When you look at top on the loaded server, is it the regionserver or
>> the datanode that is using up the CPU?
>>
>> I looked at your HDFS listing.  Some of the regions have 3 and 4 files,
>> but most look fine.  A good few are on the compaction verge, so I'd
>> imagine a lot of compaction going on, but that is background work;
>> though it does suck CPU and I/O, it shouldn't be too bad.
>>
>> I took a look at the regionserver log.  During which time period is
>> the server struggling?  There is one log run at the start, and there
>> it seems like nothing untoward.  Please enable DEBUG going forward;
>> it'll shed more light on what's going on.  See
>> http://wiki.apache.org/hadoop/Hbase/FAQ#A5 for how.  Otherwise, the
>> log doesn't have anything running long enough for it to have been
>> under serious load.
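[For reference, the DEBUG switch mentioned above is normally a one-line
edit in HBase's conf/log4j.properties, picked up when the daemons are
restarted.  A minimal sketch, assuming the stock logger names of that
era; check your release's log4j.properties before copying:

    # Emit DEBUG-level logging from all HBase classes
    log4j.logger.org.apache.hadoop.hbase=DEBUG

Roll the change out to each regionserver and restart so the verbose
logging covers the next loaded period.]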
>>
>> This is a four-node cluster now?  You don't seem to have too many
>> regions per server, yet you have a pretty high read/write rate going
>> by earlier requests postings.  Maybe you need to add more servers.
>> Are you going to add in those 16G machines?
>>
>> When you look at the master UI, can you see that the request rate over
>> time is about the same for all regionservers?  (Refresh the master UI
>> every so often to take a new sampling.)
>>
>> St.Ack
>>
>> On Tue, Feb 16, 2010 at 3:59 PM, James Baldassari wrote:
>>> Nope.  We don't do any map reduce.  We're only using Hadoop for HBase
>>> at the moment.
>>>
>>> That one node, hdfs02, still has a load of 16 with around 40% I/O and
>>> 120% CPU.  The other nodes are all around 66% CPU with 0-1% I/O and a
>>> load of 1 to 3.
>>>
>>> I don't think all the requests are going to hdfs02 based on the
>>> status 'detailed' output.  It seems like that node is just having a
>>> much harder time getting the data or something.  Maybe we have some
>>> incorrect HDFS setting.  All the configs are identical, though.
>>>
>>> -James
>>>
>>> On Tue, 2010-02-16 at 17:45 -0600, Dan Washusen wrote:
>>>> You mentioned in a previous email that you have a Task Tracker
>>>> process running on each of the nodes.  Is there any chance there is
>>>> a map reduce job running?
>>>>
>>>> On 17 February 2010 10:31, James Baldassari wrote:
>>>>
>>>>> On Tue, 2010-02-16 at 16:45 -0600, Stack wrote:
>>>>>> On Tue, Feb 16, 2010 at 2:25 PM, James Baldassari wrote:
>>>>>>> On Tue, 2010-02-16 at 14:05 -0600, Stack wrote:
>>>>>>>> On Tue, Feb 16, 2010 at 10:50 AM, James Baldassari wrote:
>>>>>>>
>>>>>>> Whether the keys themselves are evenly distributed is another
>>>>>>> matter.  Our keys are user IDs, and they should be fairly random.
>>>>>>> If we do a status 'detailed' in the hbase shell, we see the
>>>>>>> following distribution for the value of "requests" (not entirely
>>>>>>> sure what this value means):
>>>>>>> hdfs01: 7078
>>>>>>> hdfs02: 5898
>>>>>>> hdfs03: 5870
>>>>>>> hdfs04: 3807
>>>>>>>
>>>>>> That looks like they are evenly distributed.  Requests are how
>>>>>> many hits per second.  See the UI on master port 60010; the
>>>>>> numbers should match.
>>>>>
>>>>> So the total across all 4 region servers would be 22,653/second?
>>>>> Hmm, that doesn't seem too bad.  I guess we just need a little more
>>>>> throughput...
>>>>>
>>>>>>> There are no order-of-magnitude differences here, and the request
>>>>>>> count doesn't seem to map to the load on the server.  Right now
>>>>>>> hdfs02 has a load of 16 while the 3 others have loads between 1
>>>>>>> and 2.
>>>>>>
>>>>>> This is interesting.  I went back over your dumps of cache stats
>>>>>> above, and the 'loaded' server didn't have any attribute there
>>>>>> that differentiated it from the others.  For example, the number
>>>>>> of storefiles seemed about the same.
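[For reference, the per-server "requests" figures being compared here
come from the shell's cluster status command; a quick way to re-sample
them, with <master-host> standing in for your master's hostname:

    $ bin/hbase shell
    hbase> status              # summary: live servers, regions, average load
    hbase> status 'detailed'   # per-regionserver request counts and per-region details

The same numbers refresh on the master web UI at http://<master-host>:60010/.]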
>>>>>>
>>>>>> I wonder what is making for the high load?  Can you figure it out?
>>>>>> Is it high CPU use (unlikely)?  Is it then high I/O?  Can you try
>>>>>> to figure out what's different about the layout under the loaded
>>>>>> server versus that of an unloaded server?  Maybe do a
>>>>>> ./bin/hadoop fs -lsr /hbase and see if anything jumps out at you.
>>>>>
>>>>> It's I/O wait that is killing the highly loaded server.  The CPU
>>>>> usage reported by top is just about the same across all servers
>>>>> (around 100% on an 8-core node), but one server at any given time
>>>>> has a much higher load due to I/O.
>>>>>
>>>>>> If you want to post the above or a loaded server's log to
>>>>>> pastebin, we'll take a looksee.
>>>>>
>>>>> I'm not really sure what to look for, but maybe someone else will
>>>>> notice something, so here's the output of hadoop fs -lsr /hbase:
>>>>> http://pastebin.com/m98096de
>>>>>
>>>>> And here is today's region server log from hdfs02, which seems to
>>>>> get hit particularly hard: http://pastebin.com/m1d8a1e5f
>>>>>
>>>>> Please note that we restarted it several times today, so some of
>>>>> those errors are probably just due to restarting the region server.
>>>>>
>>>>>>> Applying HBASE-2180 did not make any measurable difference.
>>>>>>> There are no errors in the region server logs.  However, looking
>>>>>>> at the Hadoop datanode logs, I'm seeing lots of these:
>>>>>>>
>>>>>>> 2010-02-16 17:07:54,064 ERROR
>>>>>>> org.apache.hadoop.hdfs.server.datanode.DataNode:
>>>>>>> DatanodeRegistration(10.24.183.165:50010,
>>>>>>> storageID=DS-1519453437-10.24.183.165-50010-1265907617548,
>>>>>>> infoPort=50075, ipcPort=50020):DataXceiver
>>>>>>> java.io.EOFException
>>>>>>>     at java.io.DataInputStream.readShort(DataInputStream.java:298)
>>>>>>>     at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:79)
>>>>>>>     at java.lang.Thread.run(Thread.java:619)
>>>>>>
>>>>>> You upped xceivers on your HDFS cluster?  If you look at the other
>>>>>> end of the above EOFException, can you see why it died?
>>>>>
>>>>> Max xceivers = 3072; datanode handler count = 20; region server
>>>>> handler count = 100.
>>>>>
>>>>> I can't find the other end of the EOFException.  I looked in the
>>>>> Hadoop and HBase logs on the server that is the name node and HBase
>>>>> master, as well as on the HBase client.
>>>>>
>>>>> Thanks for all the help!
>>>>>
>>>>> -James
>>>>>
>>>>>>> However, I do think it's strange that the load is so unbalanced
>>>>>>> on the region servers.
>>>>>>
>>>>>> I agree.
>>>>>>
>>>>>>> We're also going to try throwing some more hardware at the
>>>>>>> problem.  We'll set up a new cluster with 16-core, 16G nodes to
>>>>>>> see if they are better able to handle the large number of client
>>>>>>> requests.  We might also decrease the block size to 32k or lower.
>>>>>>
>>>>>> Ok.
>>>>>>
>>>>>>>> Should only be a matter if you intend distributing the above.
>>>>>>>
>>>>>>> This is probably a topic for a separate thread, but I've never
>>>>>>> seen a legal definition for the word "distribution."  How does
>>>>>>> this apply to the SaaS model?
>>>>>>
>>>>>> Fair enough.
>>>>>>
>>>>>> Something is up.  Especially if HBASE-2180 made no difference.
>>>>>>
>>>>>> St.Ack
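[For reference, the knobs James cites near the end map onto a handful of
settings.  A minimal sketch using the values from this thread, assuming
0.20-era property names (note that the HDFS xceiver property really is
spelled "xcievers"); check your release's defaults before copying:

    <!-- hdfs-site.xml, on every datanode -->
    <property>
      <name>dfs.datanode.max.xcievers</name>
      <value>3072</value>
    </property>
    <property>
      <name>dfs.datanode.handler.count</name>
      <value>20</value>
    </property>

    <!-- hbase-site.xml, on every regionserver -->
    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>100</value>
    </property>

The 32k block size mentioned above is a per-column-family attribute
rather than a site-wide setting; it can be changed from the shell, with
the table and family names below standing in as placeholders:

    hbase> disable 'mytable'
    hbase> alter 'mytable', {NAME => 'myfamily', BLOCKSIZE => '32768'}
    hbase> enable 'mytable'

Existing storefiles only pick up the new block size as they are
rewritten by compactions.]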