Subject: Re: Uneven write request to regions
From: Asaf Mesika
To: user@hbase.apache.org
Date: Thu, 14 Nov 2013 14:47:08 +0200

It's from the same table. The thing is that some regions simply have less
data saved in HBase, while others have up to 50x as much. I'm trying to
find out how people designed their rowkey around this, or found another
out-of-the-box solution for it.

On Thu, Nov 14, 2013 at 12:06 PM, Jia Wang wrote:

> Hi
>
> Are the regions from the same table? If so, check your row key design:
> you can find the start and end row keys for each region, from which you
> can see why requests with a specific row key do or don't hit a given
> region.
>
> If the regions belong to different tables, you may consider combining
> some cold regions for some of the tables.
>
> Thanks
> Ramon
>
>
> On Thu, Nov 14, 2013 at 4:59 PM, Asaf Mesika wrote:
>
> > Hi,
> >
> > Has anyone run into a case where a Region Server is hosting regions in
> > which some regions get lots of write requests, while the rest get maybe
> > 1/1000 of that write rate?
> >
> > This leads to a situation where the HLog queue reaches its maxlogs
> > limit, since the HLogs containing the puts for the slow-write regions
> > are "stuck" until those regions flush. Since those regions barely reach
> > their 256MB flush limit (our configuration), they won't flush. The HLog
> > queue keeps growing because of the fast-write regions, until it reaches
> > the "We have too many logs" stress mode.
> > This in turn flushes out lots of regions, many of them (about 100)
> > ultra small (10 KB - 3 MB). After 3 rounds like this the compaction
> > queue gets very big... in the end the region server drops dead, and
> > the load somehow moves to another RS, ...
> >
> > We are running 0.94.7 with 30 RS.
> >
> > I was wondering how people have handled a mix of slow-write-rate and
> > high-write-rate regions on one RS? I was thinking of writing a custom
> > load balancer which keeps tabs on the write request count and memstore
> > size, and moves all the slow-write regions to 20% of the cluster's RSs
> > dedicated to slow regions, thus freeing the fast-write regions to work
> > without interference.
> >
> > Since this issue is hammering our production, we're about to try
> > shutting down the WAL and risk losing some data in those slow-write
> > regions until we can come up with a better solution.
> >
> > Any advice would be highly appreciated.
> >
> > Oh - our rowkey is quite normal:
> >
> >
> > Thanks!
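
For reference, one common rowkey approach to this kind of write skew is to
salt the key so writes spread over a fixed number of pre-split buckets. A
minimal sketch in plain Java (the one-byte salt layout and bucket count are
assumptions for illustration, not your actual schema):

    import java.util.Arrays;

    public class SaltedKeys {
        // Prefix the natural key with a one-byte salt derived from its hash,
        // so consecutive keys land in different pre-split regions.
        // Assumes buckets <= 127 so the salt fits in a single byte.
        public static byte[] salt(byte[] naturalKey, int buckets) {
            byte bucket = (byte) ((Arrays.hashCode(naturalKey) & 0x7fffffff) % buckets);
            byte[] salted = new byte[naturalKey.length + 1];
            salted[0] = bucket;
            System.arraycopy(naturalKey, 0, salted, 1, naturalKey.length);
            return salted;
        }
    }

Reads then have to fan out over all buckets (one scan per salt prefix), so
this trades read-side complexity for an even write distribution.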
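To check the region boundaries Ramon mentions, the client API can list each
region's start and end keys directly. A small sketch assuming the 0.94
client (the table name is a placeholder):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.hadoop.hbase.util.Pair;

    public class RegionBoundaries {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "mytable");  // placeholder table name
            // Parallel arrays: start key and end key for every region of the table.
            Pair<byte[][], byte[][]> keys = table.getStartEndKeys();
            for (int i = 0; i < keys.getFirst().length; i++) {
                System.out.println(Bytes.toStringBinary(keys.getFirst()[i]) + " -> "
                        + Bytes.toStringBinary(keys.getSecond()[i]));
            }
            table.close();
        }
    }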
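Before committing to a full custom balancer, region placement can also be
steered from a client-side script: HBaseAdmin.move() relocates one region to
a chosen server. A sketch under the assumption that you already know the
encoded region name and the target server name (both values below are
placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MoveColdRegion {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);
            // Placeholder values: the encoded region name (hash suffix of the
            // full region name) and the destination "host,port,startcode".
            String encodedRegionName = "ENCODED_REGION_NAME";
            String destServer = "cold-rs-01.example.com,60020,1384400000000";
            admin.move(Bytes.toBytes(encodedRegionName), Bytes.toBytes(destServer));
            admin.close();
        }
    }

A periodic job could read the per-region write request counts you mention
and call move() for the cold ones, which gives roughly the 80/20 split you
describe without touching the balancer interface.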
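And for the WAL experiment: in 0.94 durability is controlled per mutation on
the client, so it can be limited to the slow-write tables only rather than
disabled cluster-wide. A minimal sketch (table, row, family and qualifier
names are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class UnloggedPut {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            HTable table = new HTable(conf, "slow_table");      // placeholder
            Put put = new Put(Bytes.toBytes("some-row-key"));   // placeholder
            put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("v"));
            put.setWriteToWAL(false);   // skip the HLog for this put only
            table.put(put);
            table.close();
        }
    }

The caveat from your mail still holds: anything only in the memstore is lost
if the RS dies before those regions flush.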