From: Dan Crosta <dan@magnetic.com>
To: user@hbase.apache.org
Subject: Re: HBase Thrift inserts bottlenecked somewhere -- but where?
Date: Sat, 2 Mar 2013 22:29:13 +0000

Hm. This could be part of the problem in our case. Unfortunately we don't
have very good control over which rowkeys will come from which workers
(we're not using map-reduce or anything like it where we have that sort of
control, at least not without some changes). But this is valuable
information for future developments -- thanks for mentioning it. (A rough
sketch of that kind of per-rowkey batching is at the bottom of this
message.)

On Mar 2, 2013, at 2:56 PM, Asaf Mesika wrote:

> Make sure you are not sending a lot of Puts for the same rowkey. This can
> cause contention on the region server side. We fixed that in our project
> by aggregating all the columns for the same rowkey into the same Put
> object, so that when sending a List of Puts we made sure each Put has a
> unique rowkey.
>
> On Saturday, March 2, 2013, Dan Crosta wrote:
>
>> On Mar 2, 2013, at 12:38 PM, lars hofhansl wrote:
>>> "That's only true from the HDFS perspective, right? Any given region is
>>> "owned" by 1 of the 6 regionservers at any given time, and writes are
>>> buffered to memory before being persisted to HDFS, right?"
>>>
>>> Only if you disabled the WAL; otherwise each change is written to the
>>> WAL first, and then committed to the memstore.
>>> So in a sense it's even worse: each edit is written twice to the FS,
>>> replicated 3 times, and all that on only 6 data nodes.
>>
>> Are these writes synchronized somehow? Could there be a locking problem
>> somewhere that wouldn't show up as utilization of disk or CPU?
>>
>> What is the upshot of disabling the WAL -- I assume it means that if a
>> RegionServer crashes, you lose any writes that it has in memory but not
>> yet committed to HFiles?
>>
>>> 20k writes does seem a bit low.
>>
>> I adjusted dfs.datanode.handler.count from 3 to 10 and now we're up to
>> about 22-23k writes per second, but still no apparent contention for any
>> of the basic system resources.
>>
>> Any other suggestions on things to try?
>>
>> Thanks,
>> - Dan
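
For the archives, here is a minimal sketch of the per-rowkey aggregation
Asaf describes above, assuming the 0.94-era HBase Java client API. The
PendingEdit type, its fields, and the "cf" column family are made up for
illustration; the same grouping idea applies if you batch Thrift mutations
per row instead.

import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RowkeyBatcher {

    /** Hypothetical incoming edit: one column value destined for one row. */
    public static class PendingEdit {
        final String row;
        final String qualifier;
        final String value;
        public PendingEdit(String row, String qualifier, String value) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    private static final byte[] FAMILY = Bytes.toBytes("cf"); // assumed family

    /** Fold all edits for the same rowkey into one Put, then send one batch. */
    public static void flush(HTableInterface table, List<PendingEdit> edits)
            throws IOException {
        Map<String, Put> putsByRow = new LinkedHashMap<String, Put>();
        for (PendingEdit e : edits) {
            Put p = putsByRow.get(e.row);
            if (p == null) {
                p = new Put(Bytes.toBytes(e.row));
                putsByRow.put(e.row, p);
            }
            // All columns for this rowkey accumulate in the same Put.
            p.add(FAMILY, Bytes.toBytes(e.qualifier), Bytes.toBytes(e.value));
        }
        // Every Put in the batch now has a unique rowkey.
        table.put(new ArrayList<Put>(putsByRow.values()));
    }
}

The LinkedHashMap just preserves the arrival order of the rows; any Map
works. The point is only that each rowkey maps to exactly one Put in the
batch.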
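
On the WAL question quoted above: with the WAL skipped, an edit lives only
in the memstore until a flush, so a region server crash loses anything it
had not yet flushed to HFiles. In the Java client this is a per-mutation
setting; a minimal sketch, again assuming the 0.94-era API (the
row/family/qualifier/value arguments are placeholders):

import org.apache.hadoop.hbase.client.Put;

public class NoWalPuts {
    /** Build a Put that skips the write-ahead log. */
    public static Put noWalPut(byte[] row, byte[] family,
                               byte[] qualifier, byte[] value) {
        Put p = new Put(row);
        p.add(family, qualifier, value);
        // Skip the WAL for this edit: faster, but if the region server dies
        // before the memstore is flushed, this edit is gone.
        p.setWriteToWAL(false);
        return p;
    }
}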
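
Also for reference, the dfs.datanode.handler.count change mentioned above
is just a property in hdfs-site.xml on each datanode, and typically needs a
datanode restart to take effect. The value 10 is simply the one tried in
this thread, not a general recommendation:

<!-- hdfs-site.xml, inside the <configuration> element, on each datanode -->
<property>
  <name>dfs.datanode.handler.count</name>
  <!-- Hadoop's default is 3; 10 is the value tried here -->
  <value>10</value>
</property>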