Date: Sat, 05 Dec 2009 23:07:35 -0800
From: Adam Silberstein
To: hbase-user@hadoop.apache.org
Subject: Re: Problems with read ops when table size is large

Thanks for the suggestions. Let me run down what I tried:

1. My ulimit was already much higher than 1024, so no change there.
2. I was not using hdfs-127. I switched to that. I didn't use M/R to do my
   initial load, by the way.
3. I was a little unclear on which handler counts to increase and to what. I
   changed hbase.regionserver.handler.count, dfs.namenode.handler.count, and
   dfs.datanode.handler.count all from 10 to 100 (rough config sketch below).
4. I did see the error saying I was exceeding the dfs.datanode.max.xcievers
   value of 256. What's odd is that I have it set to ~3000, but it's
   apparently not getting picked up by HDFS when it starts. Any ideas there
   (like, is it really "xceivers")?
5. I'm not sure how many regions per regionserver. What's a good way to
   check that?
6. Didn't get to checking for the missing block yet -- my plan for that is in
   the P.S. at the bottom.
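For reference, here's roughly what I have in the configs now -- just a sketch
of my own settings, assuming a Hadoop 0.20-style layout where the HDFS
properties live in hdfs-site.xml (hadoop-site.xml on older setups) and the
HBase one in hbase-site.xml:

  <!-- hdfs-site.xml -->
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>100</value>
  </property>
  <property>
    <!-- spelled this way in my config; not sure it matches what HDFS expects -->
    <name>dfs.datanode.max.xcievers</name>
    <value>3072</value>
  </property>

  <!-- hbase-site.xml -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
  </property>

If the xcievers value still isn't being picked up, my only guesses are that
it's sitting in the wrong file or the datanodes weren't restarted after the
change.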
Ultimately, either #2 or #3 or both helped. I was able to push throughput way
up without seeing the error recur. So thanks a lot for the help! I'm still
interested in getting the best performance possible, so if you think fixing
the xciever problem will help, I'd like to spend some more time there.

Thanks,
Adam


On 12/5/09 9:38 PM, "stack" wrote:

> See http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6. Different hdfs
> complaint, but make sure your ulimit is > 1024 (check the first or second
> line in the master log -- it prints out what hbase is seeing for ulimit),
> check that hdfs-127 is applied to the first hadoop that hbase sees on the
> CLASSPATH (this is particularly important if your loading script is a
> mapreduce task; clients might not be seeing the patched hadoop that hbase
> ships with). Also up the handler count for hdfs (the referred-to timeout is
> no longer pertinent, I believe) and, while you are at it, those for hbase if
> you haven't changed them from the defaults. While you are at it, make sure
> you don't suffer from http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5.
>
> How many regions per regionserver?
>
> Can you put a regionserver log somewhere I can pull it to take a look?
>
> For a "Could not obtain block" message, what happens if you take the
> filename -- 2540865741541403627 in the below -- and grep the NameNode log?
> Does it tell you anything?
>
> St.Ack
>
> On Sat, Dec 5, 2009 at 3:32 PM, Adam Silberstein wrote:
>
>> Hi,
>> I'm having problems doing client operations when my table is large. I did
>> an initial test like this:
>> 6 servers
>> 6 GB heap size per server
>> 20 million 1K recs (so ~3 GB per server)
>>
>> I was able to do at least 5,000 random read/write operations per second.
>>
>> I then increased my table size to:
>> 120 million 1K recs (so ~20 GB per server)
>>
>> I then put a very light load of random reads on the table: 20 reads per
>> second. I'm able to do a few, but within 10-20 seconds, they all fail. I
>> found many errors of the following type in the hbase master log:
>>
>> java.io.IOException: java.io.IOException: Could not obtain block:
>> blk_-7409743019137510182_39869
>> file=/hbase/.META./1028785192/info/2540865741541403627
>>
>> If I wait about 5 minutes, I can repeat this sequence (do a few operations,
>> then get errors).
>>
>> If anyone has any suggestions or needs me to list particular settings, let
>> me know. The odd thing is that I observe no problems and great performance
>> with a smaller table.
>>
>> Thanks,
>> Adam
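P.S. For the NameNode grep you suggested, this is what I'm planning to run --
the paths assume the stock log layout under $HADOOP_HOME/logs, so adjust if
yours differs:

  # look for the store file and the block named in the error message
  grep 2540865741541403627 $HADOOP_HOME/logs/hadoop-*-namenode-*.log
  grep blk_-7409743019137510182 $HADOOP_HOME/logs/hadoop-*-namenode-*.log

And for the regions-per-regionserver question, I'll read the counts off the
master web UI (port 60010, I believe), unless there's a better way.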