Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (nike.apache.org: domain of
 ramkrishna.s.vasudevan@gmail.com designates 209.85.128.44 as permitted
 sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAMpwW3peAin8yqwB=exP9fgq=k7ygbLpPmdMXmRYbdfYa4RuzA@mail.gmail.com>
References: 
 <CAMpwW3peAin8yqwB=exP9fgq=k7ygbLpPmdMXmRYbdfYa4RuzA@mail.gmail.com>
Date: Mon, 1 Apr 2013 15:46:51 +0530
Message-ID: 
 <CAAT7Mkr453pyyu-xnih=BEhzzfbOQhsWonS0-NzqNbrGFDsPpg@mail.gmail.com>
Subject: Re: Read thruput
From: ramkrishna vasudevan <ramkrishna.s.vasudevan@gmail.com>
To: user@hbase.apache.org
Content-Type: multipart/alternative; boundary=047d7bd74da267090a04d949eb31

--047d7bd74da267090a04d949eb31
Content-Type: text/plain; charset=ISO-8859-1

Hi

How big are your rows?  Are they wide rows, and what is the size of
each cell?
How many read threads are being used?


Were you able to take a thread dump while this was happening?  Have you
looked at the GC log?
We may need some more information before we can diagnose the problem.
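
As a rough way to relate the read-thread question to the numbers in the
mail below, here is a minimal sketch using Little's law (requests in
flight = throughput x latency). The 3000 queries/sec and the 50 ms timeout
come from the original mail; the 10 ms average latency and the function
name are assumptions made only for illustration.

```python
import math


def in_flight_requests(queries_per_sec: int, latency_ms: int) -> int:
    """Little's law: average requests in flight = arrival rate x time in system."""
    return math.ceil(queries_per_sec * latency_ms / 1000)


QPS = 3000        # from the original mail
TIMEOUT_MS = 50   # from the original mail

# Worst case that still meets the timeout: every request takes the full 50 ms.
print(in_flight_requests(QPS, TIMEOUT_MS))   # 150

# Hypothetical average latency of 10 ms, chosen only for illustration.
print(in_flight_requests(QPS, 10))           # 30
```

If the client issues blocking calls with one request per thread, it would
need at least this many threads across all clients to sustain that rate at
that latency, so the thread count is worth checking first.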

Regards
Ram


On Mon, Apr 1, 2013 at 3:39 PM, Vibhav Mundra <mundra@gmail.com> wrote:

> Hi All,
>
> I am trying to use HBase for real-time data retrieval with a timeout of 50
> ms.
>
> I am using 2 machines as datanodes and region servers,
> and one machine as the master for both Hadoop and HBase.
>
> But I am able to fire only 3000 queries per sec and 10% of them are timing
> out.
> The database has 60 million rows.
>
> Are these figures reasonable, or am I missing something?
> I have set the scanner caching to one, because each request
> fetches only a single row.
>
> Here are the various configurations:
>
> *Our schema*
> {NAME => 'mytable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
> 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', COMPRESSION =>
> 'GZ', VERSIONS => '1', TTL => '2147483647', MIN_VERSIONS => '0',
> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '8192', ENCODE_ON_DISK => 'true',
> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>
> *Configuration*
> 1 Machine having both hbase and hadoop master
> 2 machines having both region server node and datanode
> total 285 region servers
>
> *Machine Level Optimizations:*
> a) Number of file descriptors is 1000000 (ulimit -n gives 1000000)
> b) Increased the read-ahead value to 4096
> c) Added noatime,nodiratime to the disks
>
> *Hadoop Optimizations:*
> dfs.datanode.max.xcievers = 4096
> dfs.block.size = 33554432
> dfs.datanode.handler.count = 256
> io.file.buffer.size = 65536
> Hadoop data is split across 4 directories, so that different disks are
> accessed
>
> *Hbase Optimizations*:
>
> hbase.client.scanner.caching=1  # We have specifically added this, as we
> always return one row.
> hbase.regionserver.handler.count=3200
> hfile.block.cache.size=0.35
> hbase.hregion.memstore.mslab.enabled=true
> hfile.min.blocksize.size=16384
> hfile.min.blocksize.size=4
> hbase.hstore.blockingStoreFiles=200
> hbase.regionserver.optionallogflushinterval=60000
> hbase.hregion.majorcompaction=0
> hbase.hstore.compaction.max=100
> hbase.hstore.compactionThreshold=100
>
> *Hbase-GC*
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
> *Hadoop-GC*
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>
> -Vibhav
>
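
To put the quoted cache settings in perspective, here is a
back-of-the-envelope sketch of how many data blocks the block cache could
hold. The cache fraction (0.35) and BLOCKSIZE (8192) come from the
configuration above; the region server heap size is not given in the mail,
so the 8 GiB used here is purely hypothetical, as is the function name.
The estimate ignores per-block overhead and assumes blocks are close to
the configured size.

```python
def cached_blocks(heap_bytes: int, cache_fraction: float, block_bytes: int) -> int:
    """Approximate number of blocks that fit in the block cache."""
    cache_bytes = int(heap_bytes * cache_fraction)
    return cache_bytes // block_bytes


HEAP_BYTES = 8 * 1024**3   # hypothetical 8 GiB heap; not stated in the mail
CACHE_FRACTION = 0.35      # hfile.block.cache.size from the mail
BLOCK_BYTES = 8192         # BLOCKSIZE from the schema in the mail

blocks = cached_blocks(HEAP_BYTES, CACHE_FRACTION, BLOCK_BYTES)
print(f"approx. blocks in cache: {blocks}")
```

Comparing this count against the number of blocks needed to cover the
60 million rows would indicate how often a random read has to go to disk,
which is why the row and cell sizes matter here.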

--047d7bd74da267090a04d949eb31--