From: Friso van Vollenhoven
Reply-To: user@hbase.apache.org
Subject: Re: scan performance improvement
Date: Thu, 11 Nov 2010 13:08:56 +0000

Not that block size (that's the HDFS one), but the HBase block size. You set
it at table creation or it uses the default of 64K.

The description of hbase.client.scanner.caching says:

  Number of rows that will be fetched when calling next on a scanner if it
  is not served from memory. Higher caching values will enable faster
  scanners but will eat up more memory and some calls of next may take
  longer and longer times when the cache is empty.

That means that it will pre-fetch that number of rows, if the next row does
not come from memory. So if your rows are small enough to fit 100 of them in
one block, it doesn't matter whether you pre-fetch 1, 50 or 99, because it
will only go to disk when it exhausts the whole block, which sticks in the
block cache. So, it will still fetch the same amount of data from disk every
time. If you increase the number to a value that is certain to load multiple
blocks at a time from disk, it will increase performance.


On 11 nov 2010, at 12:55, Oleg Ruchovets wrote:

> Yes, I thought about a large number, so you said it depends on block size.
> Good point.
>
> I have one record ~ 4k,
> block size is:
>
> <property>
>   <name>dfs.block.size</name>
>   <value>268435456</value>
>   <description>HDFS blocksize of 256MB for large file-systems.</description>
> </property>
>
> What number should I choose?
> I am afraid that using a number equal to one block will lead to a
> SocketTimeoutException. Am I right?
>
> Thanks, Oleg.
>
>
> On Thu, Nov 11, 2010 at 1:30 PM, Friso van Vollenhoven <
> fvanvollenhoven@xebia.com> wrote:
>
>> How small is small? If it is bytes, then setting the value to 50 is not so
>> much different from 1, I suppose. If 50 rows fit in one block, it will just
>> fetch one block whether the setting is 1 or 50. You might want to try a
>> larger value.
>> It should be fine if the records are small and you need them
>> all on the client side anyway.
>>
>> It also depends on the block size, of course. When you only ever do full
>> scans on a table and little random access, you might want to increase that.
>>
>> Friso
>>
>>
>> On 11 nov 2010, at 12:15, Oleg Ruchovets wrote:
>>
>>> Hi,
>>> To improve client performance I changed
>>> hbase.client.scanner.caching from 1 to 50.
>>> After running the client with the new value
>>> (hbase.client.scanner.caching = 50) execution time did not improve at all.
>>>
>>> I have ~ 9 million small records.
>>> I have to do a full scan, so it brings all 9 million records to the client.
>>> My assumption was that this change would bring a significant improvement,
>>> but it does not.
>>>
>>> Additional information.
>>> I scan a table which has 100 regions
>>> 5 servers
>>> 20 maps
>>> 4 concurrent maps
>>> The scan process takes 5.5 - 6 hours. To me that seems like too much time.
>>> Am I right? And how can I improve it?
>>>
>>>
>>> I changed the value in all hbase-site.xml files and restarted HBase.
>>>
>>> Any suggestions?
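
To make the block-size argument from the reply concrete, here is a rough back-of-the-envelope sketch in Python. It is not HBase code: the function names are made up for illustration, and it assumes rows are packed into blocks with no key/value overhead, which real store files do not guarantee. It only estimates how many HBase blocks one batch of `hbase.client.scanner.caching` rows would span.

```python
import math

def rows_per_block(block_size_bytes: int, row_size_bytes: int) -> int:
    """Whole rows that fit in one block (assumption: no per-row overhead)."""
    return max(1, block_size_bytes // row_size_bytes)

def blocks_touched(caching: int, block_size_bytes: int, row_size_bytes: int) -> int:
    """Approximate number of blocks spanned by one batch of `caching` rows."""
    return math.ceil(caching / rows_per_block(block_size_bytes, row_size_bytes))

HBASE_BLOCK = 64 * 1024  # the 64K default mentioned in the reply

if __name__ == "__main__":
    # Hypothetical 650-byte rows, so that about 100 rows fit in one block.
    for caching in (1, 50, 99, 500):
        print(caching, blocks_touched(caching, HBASE_BLOCK, 650))
```

Under these assumptions, caching values of 1, 50 and 99 all stay within a single block when about 100 rows fit in it, which is the situation described in the reply; only a value well above the rows-per-block figure spans several blocks per fetch.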