From: Ted Yu
To: user@hbase.apache.org
Date: Mon, 15 Apr 2013 10:03:22 -0700
Subject: Re: HBase random read performance

This is a related JIRA which should provide a noticeable speed-up: HBASE-1935
(Scan in parallel).

Cheers
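For illustration, here is a minimal client-side sketch of the idea that comes up
further down the thread: group the Gets of a multi-get by the region server
hosting each row (the same per-server grouping that processBatchCallback() does,
quoted below) and issue one batch per server on its own thread, as Doug suggests.
This is only a sketch against the 0.94 client API; the table name and row-key
list are placeholders, not anything from this thread.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

public class PerServerMultiGet {

  public static List<Result> get(final Configuration conf, final String tableName,
                                 List<byte[]> rowKeys) throws Exception {
    // Group row keys by the region server currently hosting them,
    // mirroring the per-server grouping done in processBatchCallback().
    HTable locator = new HTable(conf, tableName);
    Map<String, List<byte[]>> keysByServer = new HashMap<String, List<byte[]>>();
    for (byte[] row : rowKeys) {
      HRegionLocation loc = locator.getRegionLocation(row);
      String server = loc.getHostname() + ":" + loc.getPort();
      List<byte[]> keys = keysByServer.get(server);
      if (keys == null) {
        keys = new ArrayList<byte[]>();
        keysByServer.put(server, keys);
      }
      keys.add(row);
    }
    locator.close();

    // One worker thread per target region server, each issuing its own multi-get.
    ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, keysByServer.size()));
    List<Future<Result[]>> futures = new ArrayList<Future<Result[]>>();
    for (final List<byte[]> keys : keysByServer.values()) {
      futures.add(pool.submit(new Callable<Result[]>() {
        public Result[] call() throws Exception {
          // HTable is not thread-safe, so each worker gets its own instance.
          HTable table = new HTable(conf, tableName);
          try {
            List<Get> gets = new ArrayList<Get>(keys.size());
            for (byte[] row : keys) {
              gets.add(new Get(row));
            }
            return table.get(gets);
          } finally {
            table.close();
          }
        }
      }));
    }

    List<Result> results = new ArrayList<Result>();
    for (Future<Result[]> f : futures) {
      for (Result r : f.get()) {
        results.add(r);
      }
    }
    pool.shutdown();
    return results;
  }
}

Creating a fresh HTable inside each worker is deliberate: HTable instances are
not safe to share across threads.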
On Mon, Apr 15, 2013 at 7:13 AM, Ted Yu wrote:

> I looked at src/main/java/org/apache/hadoop/hbase/client/HConnectionManager.java
> in 0.94.
>
> In processBatchCallback(), starting at line 1538:
>
>     // step 1: break up into regionserver-sized chunks and build the
>     // data structs
>     Map<HRegionLocation, MultiAction<R>> actionsByServer =
>         new HashMap<HRegionLocation, MultiAction<R>>();
>     for (int i = 0; i < workingList.size(); i++) {
>
> So we do group individual actions by server.
>
> FYI
>
> On Mon, Apr 15, 2013 at 6:30 AM, Ted Yu wrote:
>
>> Doug made a good point.
>>
>> Take a look at the performance gain for parallel scan (bottom chart
>> compared to top chart):
>> https://issues.apache.org/jira/secure/attachment/12578083/FDencode.png
>>
>> See
>> https://issues.apache.org/jira/browse/HBASE-8316?focusedCommentId=13628300&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13628300
>> for an explanation of the two methods.
>>
>> Cheers
>>
>> On Mon, Apr 15, 2013 at 6:21 AM, Doug Meil wrote:
>>
>>> Hi there, regarding this...
>>>
>>> > We are passing 10000 random row-keys as input, while HBase is taking
>>> > around 17 secs to return 10000 records.
>>>
>>> ... Given that you are generating 10,000 random keys, your multi-get is
>>> very likely hitting all 5 nodes of your cluster.
>>>
>>> Historically, multi-Get used to first sort the requests by RS and then
>>> *serially* go to each RS to process the multi-Get. I'm not sure whether
>>> the current (0.94.x) behavior multi-threads or not.
>>>
>>> One thing you might want to consider is confirming that client behavior,
>>> and if it's not multi-threading, then perform a test that does the same
>>> RS sorting via...
>>>
>>> http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/HTable.html#getRegionLocation%28byte[]%29
>>>
>>> ... and then spin up your own threads (one per target RS) and see what
>>> happens.
>>>
>>> On 4/15/13 9:04 AM, "Ankit Jain" wrote:
>>>
>>> > Hi Liang,
>>> >
>>> > Thanks, Liang, for the reply.
>>> >
>>> > Ans 1: I tried an HFile block size of 32 KB with the bloom filter
>>> > enabled. The random read performance is 10000 records in 23 secs.
>>> >
>>> > Ans 2: We are retrieving all 10000 rows in one call.
>>> >
>>> > Ans 3: Disk detail:
>>> > Model Number: ST2000DM001-1CH164
>>> > Serial Number: Z1E276YF
>>> >
>>> > Please suggest some more optimizations.
>>> >
>>> > Thanks,
>>> > Ankit Jain
>>> >
>>> > On Mon, Apr 15, 2013 at 5:11 PM, Xie Liang (谢良) wrote:
>>> >
>>> >> First, it probably won't help to set the block size to 4KB; please
>>> >> refer to the beginning of HFile.java:
>>> >>
>>> >>   Smaller blocks are good for random access, but require more memory
>>> >>   to hold the block index, and may be slower to create (because we
>>> >>   must flush the compressor stream at the conclusion of each data
>>> >>   block, which leads to an FS I/O flush). Further, due to the
>>> >>   internal caching in Compression codec, the smallest possible block
>>> >>   size would be around 20KB-30KB.
>>> >>
>>> >> Second, is it a single-threaded test client or multi-threaded? We
>>> >> can't expect too much if the requests are issued one by one.
>>> >>
>>> >> Third, could you provide more info about your DataNode disk counts
>>> >> and IO utilization?
>>> >>
>>> >> Thanks,
>>> >> Liang
>>> >> ________________________________________
>>> >> From: Ankit Jain [ankitjaincs06@gmail.com]
>>> >> Sent: April 15, 2013 18:53
>>> >> To: user@hbase.apache.org
>>> >> Subject: Re: HBase random read performance
>>> >>
>>> >> Hi Anoop,
>>> >>
>>> >> Thanks for the reply.
>>> >>
>>> >> I tried setting the HFile block size to 4KB and also enabled the
>>> >> bloom filter (ROW). The best read performance I was able to achieve
>>> >> is 10000 records in 14 secs (record size is 1.6KB).
>>> >>
>>> >> Please suggest some tuning.
>>> >>
>>> >> Thanks,
>>> >> Ankit Jain
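Since the block size and bloom filter settings keep coming up in the replies
above and below, here is a minimal sketch, assuming the 0.94 Java admin API, of
creating a table whose column family uses an 8KB block size and a ROW bloom
filter. The table and family names ("mytable", "cf") are placeholders, not
anything from this thread; the same attributes (BLOCKSIZE, BLOOMFILTER) can also
be set from the HBase shell.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.regionserver.StoreFile;

public class CreateRandomReadTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    HTableDescriptor table = new HTableDescriptor("mytable");  // placeholder table name
    HColumnDescriptor cf = new HColumnDescriptor("cf");        // placeholder family name
    cf.setBlocksize(8 * 1024);                        // smaller data blocks favour random gets
    cf.setBloomFilterType(StoreFile.BloomType.ROW);   // row blooms skip HFiles that lack the key
    table.addFamily(cf);

    admin.createTable(table);   // pre-splitting can be added via the byte[][] split-keys overload
  }
}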
>>> >>
>>> >> On Mon, Apr 15, 2013 at 4:12 PM, Rishabh Agrawal
>>> >> <rishabh.agrawal@impetus.co.in> wrote:
>>> >>
>>> >> > Interesting. Can you explain why this happens?
>>> >> >
>>> >> > -----Original Message-----
>>> >> > From: Anoop Sam John [mailto:anoopsj@huawei.com]
>>> >> > Sent: Monday, April 15, 2013 3:47 PM
>>> >> > To: user@hbase.apache.org
>>> >> > Subject: RE: HBase random read performance
>>> >> >
>>> >> > Ankit,
>>> >> > I guess you might be using the default HFile block size, which is
>>> >> > 64KB. For random gets a lower value will be better. Try something
>>> >> > like 8KB and check the latency.
>>> >> >
>>> >> > Ya, of course blooms can help (if major compaction was not done at
>>> >> > the time of testing).
>>> >> >
>>> >> > -Anoop-
>>> >> > ________________________________________
>>> >> > From: Ankit Jain [ankitjaincs06@gmail.com]
>>> >> > Sent: Saturday, April 13, 2013 11:01 AM
>>> >> > To: user@hbase.apache.org
>>> >> > Subject: HBase random read performance
>>> >> >
>>> >> > Hi All,
>>> >> >
>>> >> > We are using HBase 0.94.5 and Hadoop 1.0.4.
>>> >> >
>>> >> > We have an HBase cluster of 5 nodes (5 region servers and 1 master
>>> >> > node). Each region server has 8 GB RAM.
>>> >> >
>>> >> > We have loaded 25 million records into an HBase table; the table is
>>> >> > pre-split into 16 regions and all the regions are equally loaded.
>>> >> >
>>> >> > We are getting very low random read performance while performing
>>> >> > multi-get from HBase.
>>> >> >
>>> >> > We are passing 10000 random row-keys as input, and HBase is taking
>>> >> > around 17 secs to return the 10000 records.
>>> >> >
>>> >> > Please suggest some tuning to increase HBase read performance.
>>> >> >
>>> >> > Thanks,
>>> >> > Ankit Jain
>>> >> > iLabs
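For completeness, here is a rough sketch of the kind of multi-get being measured
in the original question, assuming the 0.94 client API. The table name and the
way the random keys are generated are placeholders; the point is simply that all
10000 Gets go into a single table.get(List<Get>) call, whose per-server dispatch
is what the rest of this thread discusses.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiGetTiming {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "mytable");       // placeholder table name

    // Build 10000 Gets with made-up random keys; a real test would draw
    // from the keys actually loaded into the table.
    List<Get> gets = new ArrayList<Get>(10000);
    Random rnd = new Random();
    for (int i = 0; i < 10000; i++) {
      gets.add(new Get(Bytes.toBytes("row-" + rnd.nextInt(25000000))));
    }

    long start = System.currentTimeMillis();
    Result[] results = table.get(gets);               // single multi-get call
    long elapsed = System.currentTimeMillis() - start;

    System.out.println(results.length + " results in " + elapsed + " ms");
    table.close();
  }
}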