Mailing-List: contact user-help@hbase.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hbase.apache.org
Received-SPF: pass (athena.apache.org: domain of azuryyyu@gmail.com designates
 209.85.223.174 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CAMpwW3peAin8yqwB=exP9fgq=k7ygbLpPmdMXmRYbdfYa4RuzA@mail.gmail.com>
References: 
 <CAMpwW3peAin8yqwB=exP9fgq=k7ygbLpPmdMXmRYbdfYa4RuzA@mail.gmail.com>
Date: Mon, 1 Apr 2013 19:33:37 +0800
Message-ID: 
 <CALr1C9q-GTAbQ9DjYdQZdowdtw=1w=4k1RQEvcrSu-t5DHFBwQ@mail.gmail.com>
Subject: Re: Read thruput
From: Azuryy Yu <azuryyyu@gmail.com>
To: user@hbase.apache.org
Content-Type: multipart/alternative; boundary=047d7bd76ae6f2a1e704d94afd48

--047d7bd76ae6f2a1e704d94afd48
Content-Type: text/plain; charset=ISO-8859-1

Can you post the GC log? The CMS GC settings could be tuned further; please
see the official site for guidance. Also, use vmstat to monitor the paging
rate while the queries are running.
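
For the vmstat check, here is a rough Python sketch that parses captured
`vmstat <interval>` output and picks out the samples that show swap activity.
It assumes that "page rate" means the si/so (swap-in/swap-out) columns. The
function names and the sample numbers are made up for illustration and are not
taken from the cluster discussed in this thread.

```python
def parse_vmstat(text):
    """Parse `vmstat <interval>` output into a list of {column: int} dicts."""
    columns = None
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if "si" in fields and "so" in fields:
            # Column-name header row; vmstat repeats it periodically.
            columns = fields
            continue
        is_data = (
            columns is not None
            and len(fields) == len(columns)
            and all(f.isdigit() for f in fields)
        )
        if is_data:
            samples.append(dict(zip(columns, map(int, fields))))
    return samples


def swapping_samples(samples):
    """Return the samples in which memory was swapped in or out."""
    return [s for s in samples if s["si"] > 0 or s["so"] > 0]


# Made-up sample output, for illustration only.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0      0 812340  10240 904512    0    0   120    30  900 1500 20  5 74  1  0
 5  3  20480  10312   2048 120448  310  560  4200   800 2100 3900 35 12 20 33  0
"""

if __name__ == "__main__":
    for s in swapping_samples(parse_vmstat(SAMPLE)):
        print("swap activity: si=%d so=%d bi=%d bo=%d wa=%d"
              % (s["si"], s["so"], s["bi"], s["bo"], s["wa"]))
```

Non-zero si/so during the query run would suggest the region server heap is
being swapped, which alone could explain 50 ms timeouts. For the GC log,
HotSpot JVMs of this generation typically accept -verbose:gc
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<path>.
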
On Apr 1, 2013 6:09 PM, "Vibhav Mundra" <mundra@gmail.com> wrote:

> Hi All,
>
> I am trying to use Hbase for real-time data retrieval with a timeout of 50
> ms.
>
> I am using 2 machines as datanode and regionservers,
> and one machine as a master for hadoop and Hbase.
>
> But I am able to fire only 3000 queries per sec and 10% of them are timing
> out.
> The database has 60 million rows.
>
> Are these figures okay, or am I missing something?
> I have set the scanner caching to one, because each query fetches only a
> single row.
>
> Here are the various configurations:
>
> *Our schema*
> {NAME => 'mytable', FAMILIES => [{NAME => 'cf', DATA_BLOCK_ENCODING =>
> 'NONE', BLOOMFILTER => 'ROWCOL', REPLICATION_SCOPE => '0', COMPRESSION =>
> 'GZ', VERSIONS => '1', TTL => '2147483647', MIN_VERSIONS => '0',
> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '8192', ENCODE_ON_DISK => 'true',
> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>
> *Configuration*
> 1 Machine having both hbase and hadoop master
> 2 machines having both region server node and datanode
> total 285 region servers
>
> *Machine Level Optimizations:*
> a) Number of file descriptors is 1000000 (ulimit -n gives 1000000)
> b) Increased the read-ahead value to 4096
> c) Added noatime,nodiratime to the disks
>
> *Hadoop Optimizations:*
> dfs.datanode.max.xcievers = 4096
> dfs.block.size = 33554432
> dfs.datanode.handler.count = 256
> io.file.buffer.size = 65536
> hadoop data is split on 4 directories, so that different disks are being
> accessed
>
> *Hbase Optimizations*:
>
> hbase.client.scanner.caching=1  #We have specifically added this, as we
> always return one row.
> hbase.regionserver.handler.count=3200
> hfile.block.cache.size=0.35
> hbase.hregion.memstore.mslab.enabled=true
> hfile.min.blocksize.size=16384
> hfile.min.blocksize.size=4
> hbase.hstore.blockingStoreFiles=200
> hbase.regionserver.optionallogflushinterval=60000
> hbase.hregion.majorcompaction=0
> hbase.hstore.compaction.max=100
> hbase.hstore.compactionThreshold=100
>
> *Hbase-GC*
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled
> -XX:SurvivorRatio=20 -XX:ParallelGCThreads=16
> *Hadoop-GC*
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC
>
> -Vibhav
>

--047d7bd76ae6f2a1e704d94afd48--