From: Dan Crosta <dan@magnetic.com>
To: user@hbase.apache.org
Subject: Re: HBase Thrift inserts bottlenecked somewhere -- but where?
Date: Sat, 2 Mar 2013 22:29:13 +0000

Hm. This could be part of the problem in our case. Unfortunately we don't
have very good control over which rowkeys will come from which workers
(we're not using map-reduce or anything like it where we have that sort of
control, at least not without some changes). But this is valuable
information for future developments -- thanks for mentioning it. (A rough
sketch of that kind of per-rowkey batching is at the bottom of this
message.)

On Mar 2, 2013, at 2:56 PM, Asaf Mesika wrote:

> Make sure you are not sending a lot of Puts for the same rowkey. This can
> cause contention on the region server side. We fixed that in our project
> by aggregating all the columns for the same rowkey into the same Put
> object, so that when sending a List of Puts we made sure each Put has a
> unique rowkey.
>
> On Saturday, March 2, 2013, Dan Crosta wrote:
>
>> On Mar 2, 2013, at 12:38 PM, lars hofhansl wrote:
>>> "That's only true from the HDFS perspective, right? Any given region is
>>> "owned" by 1 of the 6 regionservers at any given time, and writes are
>>> buffered to memory before being persisted to HDFS, right?"
>>>
>>> Only if you disabled the WAL; otherwise each change is written to the
>>> WAL first, and then committed to the memstore.
>>> So in a sense it's even worse: each edit is written twice to the FS,
>>> replicated 3 times, and all that on only 6 data nodes.
>>
>> Are these writes synchronized somehow? Could there be a locking problem
>> somewhere that wouldn't show up as utilization of disk or CPU?
>>
>> What is the upshot of disabling the WAL -- I assume it means that if a
>> RegionServer crashes, you lose any writes that it has in memory but not
>> yet committed to HFiles?
>>
>>> 20k writes does seem a bit low.
>>
>> I adjusted dfs.datanode.handler.count from 3 to 10 and now we're up to
>> about 22-23k writes per second, but still no apparent contention for any
>> of the basic system resources.
>>
>> Any other suggestions on things to try?
>>
>> Thanks,
>> - Dan
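
For the archives, here is a minimal sketch of the per-rowkey aggregation
Asaf describes above, assuming the 0.94-era HBase Java client API. The
PendingEdit type, its fields, and the "cf" column family are made up for
illustration; the same grouping idea applies if you batch Thrift mutations
per row instead.

import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hbase.client.HTableInterface;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RowkeyBatcher {

    /** Hypothetical incoming edit: one column value destined for one row. */
    public static class PendingEdit {
        final String row;
        final String qualifier;
        final String value;
        public PendingEdit(String row, String qualifier, String value) {
            this.row = row;
            this.qualifier = qualifier;
            this.value = value;
        }
    }

    private static final byte[] FAMILY = Bytes.toBytes("cf"); // assumed family

    /** Fold all edits for the same rowkey into one Put, then send one batch. */
    public static void flush(HTableInterface table, List<PendingEdit> edits)
            throws IOException {
        Map<String, Put> putsByRow = new LinkedHashMap<String, Put>();
        for (PendingEdit e : edits) {
            Put p = putsByRow.get(e.row);
            if (p == null) {
                p = new Put(Bytes.toBytes(e.row));
                putsByRow.put(e.row, p);
            }
            // All columns for this rowkey accumulate in the same Put.
            p.add(FAMILY, Bytes.toBytes(e.qualifier), Bytes.toBytes(e.value));
        }
        // Every Put in the batch now has a unique rowkey.
        table.put(new ArrayList<Put>(putsByRow.values()));
    }
}

The LinkedHashMap just preserves the arrival order of the rows; any Map
works. The point is only that each rowkey maps to exactly one Put in the
batch.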
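
On the WAL question quoted above: with the WAL skipped, an edit lives only
in the memstore until a flush, so a region server crash loses anything it
had not yet flushed to HFiles. In the Java client this is a per-mutation
setting; a minimal sketch, again assuming the 0.94-era API (the
row/family/qualifier/value arguments are placeholders):

import org.apache.hadoop.hbase.client.Put;

public class NoWalPuts {
    /** Build a Put that skips the write-ahead log. */
    public static Put noWalPut(byte[] row, byte[] family,
                               byte[] qualifier, byte[] value) {
        Put p = new Put(row);
        p.add(family, qualifier, value);
        // Skip the WAL for this edit: faster, but if the region server dies
        // before the memstore is flushed, this edit is gone.
        p.setWriteToWAL(false);
        return p;
    }
}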
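
Also for reference, the dfs.datanode.handler.count change mentioned above
is just a property in hdfs-site.xml on each datanode, and typically needs a
datanode restart to take effect. The value 10 is simply the one tried in
this thread, not a general recommendation:

<!-- hdfs-site.xml, inside the <configuration> element, on each datanode -->
<property>
  <name>dfs.datanode.handler.count</name>
  <!-- Hadoop's default is 3; 10 is the value tried here -->
  <value>10</value>
</property>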