From: David Koch
To: user@hbase.apache.org
Date: Sun, 27 Jan 2013 23:25:27 +0100
Subject: Re: Rule of thumb: Size of data to send per RPC in a scan

Hello Ted,

Thank you for the link.

/David

On Sat, Jan 26, 2013 at 1:14 AM, Ted Yu wrote:

> Looks like HBASE-2214 'Do HBASE-1996 -- setting size to return in scan
> rather than count of rows -- properly' may help you.
> But that is only in 0.96
>
> Lars H presented some performance numbers in:
> HBASE-7008 Set scanner caching to a better default, disable Nagles
> where default for "hbase.client.scanner.caching" changed to 100
>
> Cheers
>
> On Fri, Jan 25, 2013 at 3:59 PM, David Koch wrote:
>
> > Hello,
> >
> > Is there a rule to determine the best batch/caching combination for
> > maximizing scan performance as a function of KV size and (average)
> > number of columns per row key?
> >
> > I have 0.5 KB per value (constant) and an average of 10 values per row
> > key. The distribution is heavy-tailed, so some outliers have 100k KVs.
> > There are around 100 million rows in the table. The cluster consists of
> > 30 region servers with 24 GB of RAM each; nodes are connected with a
> > 1 Gbit connection. I am running Map/Reduce jobs on the table, also with
> > 30 task trackers.
> >
> > I tried:
> > cache 1, no batching -> 14 min
> > cache 1000, batch 50 -> 11 min
> > cache 5000, batch 25 -> crash (timeouts)
> > cache 2000, batch 25 -> 15 min
> >
> > Job time can vary quite significantly according to whatever activity
> > (compactions?) is going on in the background. Also, I cannot probe for
> > the best combination indefinitely since there are actual production
> > jobs queued. I did expect a larger speed-up with respect to no
> > caching/batching at all - is this unjustified?
> >
> > In short, I am looking for some tips for making scans in a Map/Reduce
> > context faster :-)
> >
> > Thank you,
> >
> > /David
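
For readers comparing the cache/batch settings quoted above, the following is a rough back-of-the-envelope sketch of how much value data a single scanner RPC could carry. It is not from the thread. It assumes that caching is the number of Results fetched per RPC, that batch caps the number of KVs per Result, and that each value is 0.5 KB as stated in the question. The function names are hypothetical, and the estimate ignores key overhead (row key, family, qualifier, timestamp), so real payloads would be somewhat larger.

```python
# Rough estimate of value bytes per scanner RPC.
# Assumptions (not from the thread): caching = Results per RPC,
# batch = max KVs per Result, key overhead is ignored.

KV_BYTES = 512  # 0.5 KB per value, as stated in the question


def max_rpc_payload_bytes(caching, batch, kv_bytes=KV_BYTES):
    """Upper bound: every Result in the RPC is filled to the batch limit."""
    return caching * batch * kv_bytes


def typical_rpc_payload_bytes(caching, batch, avg_kvs_per_row, kv_bytes=KV_BYTES):
    """Typical case: an average row fits in one Result of min(batch, avg) KVs."""
    return caching * min(batch, avg_kvs_per_row) * kv_bytes


if __name__ == "__main__":
    # The three cache/batch combinations tried in the thread.
    for caching, batch in [(1000, 50), (5000, 25), (2000, 25)]:
        worst = max_rpc_payload_bytes(caching, batch) / 1e6
        usual = typical_rpc_payload_bytes(caching, batch, avg_kvs_per_row=10) / 1e6
        print(f"cache {caching}, batch {batch}: "
              f"typical ~{usual:.1f} MB, worst case ~{worst:.1f} MB per RPC")
```

Under these assumptions the worst case is about 25.6 MB per RPC for cache 1000/batch 50 and for cache 2000/batch 25, and about 64 MB for cache 5000/batch 25. This only gives a sense of scale; it does not by itself explain the timeouts or the differences in job time reported in the thread.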