Date: Mon, 1 Apr 2013 23:29:18 +0530
Subject: Re: Read throughput
From: Vibhav Mundra
To: user@hbase.apache.org

What is the general read throughput that one gets when using HBase? I am not
able to achieve more than 3000 reads/sec with a timeout of 50 ms, and even
then about 10% of the requests time out.

-Vibhav

On Mon, Apr 1, 2013 at 11:20 PM, Vibhav Mundra wrote:

> Yes, I have changed the BLOCK CACHE % to 0.35.
>
> -Vibhav
>
> On Mon, Apr 1, 2013 at 10:20 PM, Ted Yu wrote:
>
>> I was aware of that discussion, which was about MAX_FILESIZE and
>> BLOCKSIZE. My suggestion was about the block cache percentage.
>>
>> Cheers
>>
>> On Mon, Apr 1, 2013 at 4:57 AM, Vibhav Mundra wrote:
>>
>>> I have used the following site to lower the value of the block cache:
>>> http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
>>>
>>> -Vibhav
>>>
>>> On Mon, Apr 1, 2013 at 4:23 PM, Ted wrote:
>>>
>>>> Can you increase the block cache size?
>>>>
>>>> What version of HBase are you using?
>>>>
>>>> Thanks
>>>>
>>>> On Apr 1, 2013, at 3:47 AM, Vibhav Mundra wrote:
>>>>
>>>>> The typical size of each of my rows is less than 1 KB.
>>>>>
>>>>> Regarding memory, I have given 8 GB to the HBase region servers and
>>>>> 4 GB to the datanodes, and I don't see either fully used, so I ruled
>>>>> out the GC aspect.
>>>>>
>>>>> In case you still believe that GC is an issue, I will upload the GC
>>>>> logs.
>>>>>
>>>>> -Vibhav
>>>>>
>>>>> On Mon, Apr 1, 2013 at 3:46 PM, ramkrishna vasudevan
>>>>> <ramkrishna.s.vasudevan@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> How big is your row? Are they wide rows, and what would be the size
>>>>>> of each cell? How many read threads are being used?
>>>>>>
>>>>>> Were you able to take a thread dump while this was happening? Have
>>>>>> you looked at the GC log? We may need some more information before
>>>>>> we can think about the problem.
>>>>>>
>>>>>> Regards
>>>>>> Ram
>>>>>>
>>>>>> On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I am trying to use HBase for real-time data retrieval with a
>>>>>>> timeout of 50 ms.
>>>>>>>
>>>>>>> I am using 2 machines as datanodes and region servers, and one
>>>>>>> machine as a master for both Hadoop and HBase.
>>>>>>>
>>>>>>> But I am able to fire only 3000 queries per second, and 10% of
>>>>>>> them are timing out. The database has 60 million rows.
>>>>>>>
>>>>>>> Are these figures okay, or am I missing something?
>>>>>>> I have set scanner caching to one, because each query fetches a
>>>>>>> single row only.
>>>>>>>
>>>>>>> Here are the various configurations:
>>>>>>>
>>>>>>> *Our schema*
>>>>>>> {NAME => 'mytable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
>>>>>>> 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', COMPRESSION =>
>>>>>>> 'GZ', VERSIONS => '1', TTL => '2147483647', MIN_VERSIONS => '0',
>>>>>>> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '8192', ENCODE_ON_DISK =>
>>>>>>> 'true', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>>>>>>>
>>>>>>> *Configuration*
>>>>>>> 1 machine running both the HBase and Hadoop masters
>>>>>>> 2 machines each running both a region server and a datanode
>>>>>>> 285 regions in total
>>>>>>>
>>>>>>> *Machine-level optimizations:*
>>>>>>> a) Number of file descriptors is 1000000 (ulimit -n gives 1000000)
>>>>>>> b) Increased the read-ahead value to 4096
>>>>>>> c) Added noatime,nodiratime to the disk mounts
>>>>>>>
>>>>>>> *Hadoop optimizations:*
>>>>>>> dfs.datanode.max.xcievers = 4096
>>>>>>> dfs.block.size = 33554432
>>>>>>> dfs.datanode.handler.count = 256
>>>>>>> io.file.buffer.size = 65536
>>>>>>> Hadoop data is split across 4 directories, so that different disks
>>>>>>> are being accessed.
>>>>>>>
>>>>>>> *HBase optimizations:*
>>>>>>>
>>>>>>> hbase.client.scanner.caching=1  # We have specifically set this,
>>>>>>> as we always return one row.
>>>>>>> hbase.regionserver.handler.count=3200
>>>>>>> hfile.block.cache.size=0.35
>>>>>>> hbase.hregion.memstore.mslab.enabled=true
>>>>>>> hfile.min.blocksize.size=16384
>>>>>>> hfile.min.blocksize.size=4
>>>>>>> hbase.hstore.blockingStoreFiles=200
>>>>>>> hbase.regionserver.optionallogflushinterval=60000
>>>>>>> hbase.hregion.majorcompaction=0
>>>>>>> hbase.hstore.compaction.max=100
>>>>>>> hbase.hstore.compactionThreshold=100
>>>>>>>
>>>>>>> *HBase GC*
>>>>>>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
>>>>>>> -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
>>>>>>>
>>>>>>> *Hadoop GC*
>>>>>>> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>>>>>>>
>>>>>>> -Vibhav
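On the throughput question at the top of the thread, a quick back-of-the-envelope check is possible using only the figures quoted above (the assumption that each request takes roughly the full 50 ms budget is mine, for illustration):

```python
# Little's law (L = lambda * W): the average number of in-flight requests
# needed to sustain a given throughput at a given per-request latency.
# Figures are the ones quoted in this thread.

def concurrency_needed(qps: float, latency_s: float) -> float:
    """In-flight requests needed to sustain `qps` at `latency_s` seconds each."""
    return qps * latency_s

if __name__ == "__main__":
    # 3000 reads/sec at the 50 ms timeout budget discussed above:
    print(concurrency_needed(3000.0, 0.050))  # ~150 requests in flight
```

If the client cannot keep roughly that many requests in flight, throughput caps out on the client side regardless of region server tuning; 150 is far below the 3200 server-side handlers configured above.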
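The block-cache sizing discussed in the thread can also be sanity-checked numerically. This is a sketch using the poster's figures (~1 KB rows, 60 million rows, two region servers with 8 GB heaps, hfile.block.cache.size=0.35); the assumption of uniformly random single-row reads is mine:

```python
# Compare cluster-wide block cache capacity against the dataset size,
# using the numbers quoted in this thread. The block cache holds
# decompressed blocks, so the comparison is against uncompressed data.

GIB = 1 << 30

def block_cache_bytes(heap_bytes: int, cache_fraction: float) -> float:
    """Block cache available on one region server (hfile.block.cache.size)."""
    return heap_bytes * cache_fraction

region_servers = 2
cache_total = region_servers * block_cache_bytes(8 * GIB, 0.35)  # ~5.6 GiB total
data_total = 60_000_000 * 1024                                   # ~57 GiB of ~1 KB rows

print(f"cache: {cache_total / GIB:.1f} GiB, data: {data_total / GIB:.1f} GiB")
```

With the cache an order of magnitude smaller than the data, uniformly random single-row reads will mostly miss the block cache and go to disk, which is consistent with the observed timeouts; raising the cache fraction above 0.35 only helps if the hot working set is much smaller than the full 60 million rows.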