hbase-user mailing list archives

From Ramu M S <ramu.ma...@gmail.com>
Subject Re: HBase Random Read latency > 100ms
Date Tue, 08 Oct 2013 00:25:40 GMT
Vladimir,

Yes, I am fully aware of the HDD limitations and the incorrect RAID
configuration.
Unfortunately, the hardware is leased from others for this work, and I
wasn't consulted on the h/w specification for the tests that I am running
now. The RAID cannot even be turned off or set to RAID-0.

The production system is specified according to Hadoop's needs (100 nodes
with 16-core CPUs, 192 GB RAM, and 24 x 600GB SAS drives; RAID cannot be
completely turned off, so we are creating one virtual disk per physical
disk, with the VD RAID level set to RAID-0). These systems are still not
available. If you have any suggestions on the production setup, I will be
glad to hear them.

Also, as pointed out earlier, we are planning to use HBase as an in-memory
KV store to access the latest data.
That's why so much RAM was specified in this configuration. But it looks
like we would run into more problems than gains from this.

Keeping that aside, I was trying to get the maximum out of the current
cluster. Or, as you said, is 500-1000 OPS the maximum I could get out of
this setup?

Regards,
Ramu


On Tue, Oct 8, 2013 at 3:02 AM, Vladimir Rodionov
<vrodionov@carrieriq.com> wrote:

> Ramu, your HBase configuration (128GB of heap) is far from optimal.
> Nobody runs HBase with that amount of heap to my best knowledge.
> 32GB of RAM is the usual upper limit. We run 8-12GB in production.
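>
> [For illustration, the RS heap cap normally lives in hbase-env.sh; a
> minimal sketch of the suggested range, with the value in MB:
>
>   export HBASE_HEAPSIZE=12288   # 12 GB, the upper end of 8-12 GB
> ]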
>
> What's more, your IO capacity is VERY low. 2 SATA drives in RAID 1 for a
> mostly random-read load?
> You should have 8, better 12-16, drives per server. Forget about RAID. You
> have HDFS.
>
> Block cache in your case does not help much, since your read amplification
> is at least x20 (a 16KB block fetched for a 724 B read) - it just wastes
> RAM (heap). In your case you do not need a LARGE heap and a LARGE block
> cache.
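>
> [The arithmetic behind the x20 figure: each 724 B point read pulls in a
> full 16 KB HFile block, so 16384 / 724 = ~22.6, roughly x20; with the
> earlier 64 KB blocks it was 65536 / 724 = ~x90.]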
>
> I advise reconsidering your hardware spec, applying all the optimizations
> already mentioned in this thread, and lowering your expectations.
>
> With the right hardware you will be able to get 500-1000 truly random
> reads per server.
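>
> [A rough sanity check on that range: a 7200 RPM SATA spindle sustains on
> the order of ~100 random IOPS, so 8-12 data disks per server yields
> roughly 800-1200 disk-bound random reads/s before any cache hits.]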
>
> Best regards,
> Vladimir Rodionov
> Principal Platform Engineer
> Carrier IQ, www.carrieriq.com
> e-mail: vrodionov@carrieriq.com
>
> ________________________________________
> From: Ramu M S [ramu.malur@gmail.com]
> Sent: Monday, October 07, 2013 5:23 AM
> To: user@hbase.apache.org
> Subject: Re: HBase Random Read latency > 100ms
>
> Hi Bharath,
>
> I am a little confused about the metrics displayed by Cloudera. Even when
> there are no operations, the gc_time metric shows a constant 2s in the
> graph. Is this the CMS gc_time (in which case there is no JVM pause) or
> the actual GC pause?
>
> The GC timings reported earlier are the average of the gc_time metric
> across all region servers.
>
> Regards,
> Ramu
>
>
> On Mon, Oct 7, 2013 at 9:10 PM, Ramu M S <ramu.malur@gmail.com> wrote:
>
> > Jean,
> >
> > Yes. It is 2 drives.
> >
> > - Ramu
> >
> >
> > On Mon, Oct 7, 2013 at 8:45 PM, Jean-Marc Spaggiari <
> > jean-marc@spaggiari.org> wrote:
> >
> >> Quick question on the disk side.
> >>
> >> When you say:
> >> 800 GB SATA (7200 RPM) Disk
> >> Is it 1 x 800GB? It's RAID 1, so might it be 2 drives? What's the
> >> configuration?
> >>
> >> JM
> >>
> >>
> >> 2013/10/7 Ramu M S <ramu.malur@gmail.com>
> >>
> >> > Lars, Bharath,
> >> >
> >> > Compression is disabled for the table. This was not intended for the
> >> > evaluation;
> >> > I forgot to specify it during table creation. I will enable Snappy
> >> > and run a major compaction again.
> >> >
> >> > Please suggest other options to try out, and please also advise on
> >> > the previous questions.
> >> >
> >> > Thanks,
> >> > Ramu
> >> >
> >> >
> >> > On Mon, Oct 7, 2013 at 6:35 PM, Ramu M S <ramu.malur@gmail.com>
> >> > wrote:
> >> >
> >> > > Bharath,
> >> > >
> >> > > I was about to report this. Yes, indeed there is too much GC time.
> >> > > I just verified the GC time using the Cloudera Manager statistics
> >> > > (updated every minute).
> >> > >
> >> > > For each Region Server:
> >> > >  - During reads: the graph shows a constant 2s.
> >> > >  - During compaction: the graph starts at 7s and goes as high as
> >> > > 20s towards the end.
> >> > >
> >> > > Few more questions:
> >> > > 1. For the current evaluation, since the reads are completely random
> >> > > and I don't expect to read the same data again, can I set the heap
> >> > > to the default 1 GB?
> >> > >
> >> > > 2. Can I completely turn off the BLOCK CACHE for this table?
> >> > >    http://hbase.apache.org/book/regionserver.arch.html recommends
> >> > > that for random reads. (A shell sketch follows below.)
> >> > >
> >> > > 3. In the next phase of evaluation, we are interested in using
> >> > > HBase as an in-memory KV DB by keeping the latest data in RAM (to
> >> > > the tune of around 128 GB in each RS; we are setting up a 50-100
> >> > > node cluster). I am very curious to hear any suggestions in this
> >> > > regard.
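> >> > >
> >> > > [A sketch of question 2 in the HBase shell, reusing the
> >> > > 'usertable'/'cf' names from the descriptor quoted below; note that
> >> > > 0.94 needs the table disabled around an alter unless online schema
> >> > > update is enabled:
> >> > >
> >> > >   disable 'usertable'
> >> > >   alter 'usertable', {NAME => 'cf', BLOCKCACHE => 'false'}
> >> > >   enable 'usertable'
> >> > > ]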
> >> > >
> >> > > Regards,
> >> > > Ramu
> >> > >
> >> > >
> >> > > On Mon, Oct 7, 2013 at 5:50 PM, Bharath Vissapragada <
> >> > > bharathv@cloudera.com> wrote:
> >> > >
> >> > >> Hi Ramu,
> >> > >>
> >> > >> Thanks for reporting the results back. Just curious whether you
> >> > >> are hitting any big GC pauses due to block cache churn on such a
> >> > >> large heap. Do you see that?
> >> > >>
> >> > >> - Bharath
> >> > >>
> >> > >>
> >> > >> On Mon, Oct 7, 2013 at 1:42 PM, Ramu M S <ramu.malur@gmail.com>
> >> > >> wrote:
> >> > >>
> >> > >> > Lars,
> >> > >> >
> >> > >> > After changing the BLOCKSIZE to 16KB, the latency has come down
> >> > >> > a little. Now the average is around 75ms.
> >> > >> > Overall throughput (I am using 40 clients to fetch records) is
> >> > >> > around 1K OPS.
> >> > >> >
> >> > >> > After compaction, the hdfsBlocksLocalityIndex is
> >> > >> > 91,88,78,90,99,82,94,97 across my 8 RS respectively.
> >> > >> >
> >> > >> > Thanks,
> >> > >> > Ramu
> >> > >> >
> >> > >> >
> >> > >> > On Mon, Oct 7, 2013 at 3:51 PM, Ramu M S <ramu.malur@gmail.com>
> >> > >> > wrote:
> >> > >> >
> >> > >> > > Thanks Lars.
> >> > >> > >
> >> > >> > > I have changed the BLOCKSIZE to 16KB and triggered a major
> >> > >> > > compaction. I will report my results once it is done.
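> >> > >> > >
> >> > >> > > [For the record, those two steps in the HBase shell would look
> >> > >> > > roughly like this (again with the table disabled around the
> >> > >> > > alter):
> >> > >> > >
> >> > >> > >   alter 'usertable', {NAME => 'cf', BLOCKSIZE => '16384'}
> >> > >> > >   major_compact 'usertable'
> >> > >> > > ]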
> >> > >> > >
> >> > >> > > - Ramu
> >> > >> > >
> >> > >> > >
> >> > >> > > On Mon, Oct 7, 2013 at 3:21 PM, lars hofhansl
> >> > >> > > <larsh@apache.org> wrote:
> >> > >> > >
> >> > >> > >> First off: a 128GB heap per RegionServer. Wow. I'd be
> >> > >> > >> interested to hear about your experience with such a large
> >> > >> > >> heap for your RS. It's definitely big enough.
> >> > >> > >>
> >> > >> > >> It's interesting that 100GB does fit into the aggregate cache
> >> > >> > >> (of 8x32GB), while 1.8TB does not.
> >> > >> > >> Looks like ~70% of the read requests would need to bring in a
> >> > >> > >> 64KB block in order to read 724 bytes.
> >> > >> > >>
> >> > >> > >> Should that take 100ms? No. Something's still amiss.
> >> > >> > >>
> >> > >> > >> Smaller blocks might help (you'd need to bring in 4, 8, or
> >> > >> > >> maybe 16KB to read the small row). You would need to issue a
> >> > >> > >> major compaction for that to take effect.
> >> > >> > >> Maybe try 16KB blocks. If that speeds up your random gets, we
> >> > >> > >> know where to look next... at the disk IO.
> >> > >> > >>
> >> > >> > >>
> >> > >> > >> -- Lars
> >> > >> > >>
> >> > >> > >>
> >> > >> > >>
> >> > >> > >> ________________________________
> >> > >> > >>  From: Ramu M S <ramu.malur@gmail.com>
> >> > >> > >> To: user@hbase.apache.org; lars hofhansl <larsh@apache.org>
> >> > >> > >> Sent: Sunday, October 6, 2013 11:05 PM
> >> > >> > >> Subject: Re: HBase Random Read latency > 100ms
> >> > >> > >>
> >> > >> > >>
> >> > >> > >> Lars,
> >> > >> > >>
> >> > >> > >> In one of your old posts, you had mentioned that lowering the
> >> > >> > >> BLOCKSIZE is good for random reads (of course with an
> >> > >> > >> increased size for block indexes).
> >> > >> > >>
> >> > >> > >> The post is at
> >> > >> > >> http://grokbase.com/t/hbase/user/11bat80x7m/row-get-very-slow
> >> > >> > >>
> >> > >> > >> Will that help in my tests? Should I give it a try? If I
> >> > >> > >> alter my table, should I trigger a major compaction again for
> >> > >> > >> this to take effect?
> >> > >> > >>
> >> > >> > >> Thanks,
> >> > >> > >> Ramu
> >> > >> > >>
> >> > >> > >>
> >> > >> > >>
> >> > >> > >> On Mon, Oct 7, 2013 at 2:44 PM, Ramu M S
> >> > >> > >> <ramu.malur@gmail.com> wrote:
> >> > >> > >>
> >> > >> > >> > Sorry, the BLOCKSIZE was wrong in my earlier post; it is
> >> > >> > >> > the default 64 KB.
> >> > >> > >> >
> >> > >> > >> > {NAME => 'usertable', FAMILIES => [{NAME => 'cf',
> >> > >> > >> > DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL',
> >> > >> > >> > REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION =>
> >> > >> > >> > 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
> >> > >> > >> > KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536',
> >> > >> > >> > IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',
> >> > >> > >> > BLOCKCACHE => 'true'}]}
> >> > >> > >> >
> >> > >> > >> > Thanks,
> >> > >> > >> > Ramu
> >> > >> > >> >
> >> > >> > >> >
> >> > >> > >> > On Mon, Oct 7, 2013 at 2:42 PM, Ramu M S
> >> > >> > >> > <ramu.malur@gmail.com> wrote:
> >> > >> > >> >
> >> > >> > >> >> Lars,
> >> > >> > >> >>
> >> > >> > >> >> - Yes, short circuit reading is enabled on both HDFS and
> >> > >> > >> >> HBase.
> >> > >> > >> >> - I had issued a major compaction after the table was
> >> > >> > >> >> loaded.
> >> > >> > >> >> - Region Servers have the max heap set to 128 GB. The
> >> > >> > >> >> block cache size is 0.25 of the heap (so 32 GB for each
> >> > >> > >> >> Region Server). Do we need even more?
> >> > >> > >> >> - Decreasing the HFile size (the default is 1 GB)? Or
> >> > >> > >> >> should I leave it at the default?
> >> > >> > >> >> - Keys are Zipfian distributed (by YCSB).
> >> > >> > >> >>
> >> > >> > >> >> Bharath,
> >> > >> > >> >>
> >> > >> > >> >> Bloom filters are enabled. Here are my table details:
> >> > >> > >> >> {NAME => 'usertable', FAMILIES => [{NAME => 'cf',
> >> > >> > >> >> DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'ROWCOL',
> >> > >> > >> >> REPLICATION_SCOPE => '0', VERSIONS => '1', COMPRESSION =>
> >> > >> > >> >> 'NONE', MIN_VERSIONS => '0', TTL => '2147483647',
> >> > >> > >> >> KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '16384',
> >> > >> > >> >> IN_MEMORY => 'false', ENCODE_ON_DISK => 'true',
> >> > >> > >> >> BLOCKCACHE => 'true'}]}
> >> > >> > >> >>
> >> > >> > >> >> When the data size is around 100 GB (100 million
> >> > >> > >> >> records), the latency is very good. I am getting a
> >> > >> > >> >> throughput of around 300K OPS.
> >> > >> > >> >> In both cases (100 GB and 1.8 TB), Ganglia stats show that
> >> > >> > >> >> disk reads are around 50-60 MB/s throughout the read
> >> > >> > >> >> cycle.
> >> > >> > >> >>
> >> > >> > >> >> Thanks,
> >> > >> > >> >> Ramu
> >> > >> > >> >>
> >> > >> > >> >>
> >> > >> > >> >> On Mon, Oct 7, 2013 at 2:21 PM, lars hofhansl
> >> > >> > >> >> <larsh@apache.org> wrote:
> >> > >> > >> >>
> >> > >> > >> >>> Have you enabled short circuit reading? See here:
> >> > >> > >> >>> http://hbase.apache.org/book/perf.hdfs.html
> >> > >> > >> >>>
> >> > >> > >> >>> How's your data locality (shown on the RegionServer UI
> >> > >> > >> >>> page)?
> >> > >> > >> >>>
> >> > >> > >> >>> How much memory are you giving your RegionServers?
> >> > >> > >> >>> If your reads are truly random and the data set does not
> >> > >> > >> >>> fit into the aggregate cache, you'll be dominated by the
> >> > >> > >> >>> disk and network.
> >> > >> > >> >>> Each read would need to bring in a 64KB (default) HFile
> >> > >> > >> >>> block. If short circuit reading is not enabled, you'll
> >> > >> > >> >>> get two or three context switches.
> >> > >> > >> >>>
> >> > >> > >> >>> So I would try:
> >> > >> > >> >>> 1. Enable short circuit reading
> >> > >> > >> >>> 2. Increase the block cache size per RegionServer
> >> > >> > >> >>> 3. Decrease the HFile block size
> >> > >> > >> >>> 4. Make sure your data is local (if it is not, issue a
> >> > >> > >> >>> major compaction).
> >> > >> > >> >>>
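> >> > >> > >> >>> [For items 1 and 2, the usual knobs look roughly like
> >> > >> > >> >>> this - a sketch, with property names taken from the
> >> > >> > >> >>> Hadoop 2 / HBase 0.94 docs and values only illustrative:
> >> > >> > >> >>>
> >> > >> > >> >>> <!-- hdfs-site.xml (mirrored into hbase-site.xml) -->
> >> > >> > >> >>> <property>
> >> > >> > >> >>>   <name>dfs.client.read.shortcircuit</name>
> >> > >> > >> >>>   <value>true</value>
> >> > >> > >> >>> </property>
> >> > >> > >> >>> <property>
> >> > >> > >> >>>   <name>dfs.domain.socket.path</name>
> >> > >> > >> >>>   <value>/var/run/hadoop-hdfs/dn._PORT</value>
> >> > >> > >> >>> </property>
> >> > >> > >> >>>
> >> > >> > >> >>> <!-- hbase-site.xml: block cache fraction of the heap -->
> >> > >> > >> >>> <property>
> >> > >> > >> >>>   <name>hfile.block.cache.size</name>
> >> > >> > >> >>>   <value>0.4</value>
> >> > >> > >> >>> </property>
> >> > >> > >> >>> ]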
> >> > >> > >> >>>
> >> > >> > >> >>> -- Lars
> >> > >> > >> >>>
> >> > >> > >> >>>
> >> > >> > >> >>>
> >> > >> > >> >>> ________________________________
> >> > >> > >> >>>  From: Ramu M S <ramu.malur@gmail.com>
> >> > >> > >> >>> To: user@hbase.apache.org
> >> > >> > >> >>> Sent: Sunday, October 6, 2013 10:01 PM
> >> > >> > >> >>> Subject: HBase Random Read latency > 100ms
> >> > >> > >> >>>
> >> > >> > >> >>>
> >> > >> > >> >>> Hi All,
> >> > >> > >> >>>
> >> > >> > >> >>> My HBase cluster has 8 Region Servers (CDH 4.4.0, HBase
> >> > >> > >> >>> 0.94.6).
> >> > >> > >> >>>
> >> > >> > >> >>> Each Region Server has the following configuration:
> >> > >> > >> >>> 16-core CPU, 192 GB RAM, 800 GB SATA (7200 RPM) disk
> >> > >> > >> >>> (unfortunately configured with RAID 1; this can't be
> >> > >> > >> >>> changed, as the machines are leased temporarily for a
> >> > >> > >> >>> month).
> >> > >> > >> >>>
> >> > >> > >> >>> I am running YCSB benchmark tests on HBase and am
> >> > >> > >> >>> currently inserting around 1.8 billion records
> >> > >> > >> >>> (1 key + 7 fields of 100 bytes = 724 bytes per record).
> >> > >> > >> >>>
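> >> > >> > >> >>> [That record shape corresponds to YCSB core workload
> >> > >> > >> >>> properties roughly like the following - a sketch, with
> >> > >> > >> >>> parameter names from YCSB's CoreWorkload and values
> >> > >> > >> >>> inferred from this thread:
> >> > >> > >> >>>
> >> > >> > >> >>>   workload=com.yahoo.ycsb.workloads.CoreWorkload
> >> > >> > >> >>>   recordcount=1800000000
> >> > >> > >> >>>   fieldcount=7
> >> > >> > >> >>>   fieldlength=100
> >> > >> > >> >>>   requestdistribution=zipfian
> >> > >> > >> >>> ]
> >> > >> > >> >>>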
> >> > >> > >> >>> Currently I am getting a write throughput of around 100K
> >> > >> > >> >>> OPS, but random reads are very, very slow; all gets have
> >> > >> > >> >>> a latency of 100ms or more.
> >> > >> > >> >>>
> >> > >> > >> >>> I have changed the following default configurations:
> >> > >> > >> >>> 1. HFile size: 16 GB
> >> > >> > >> >>> 2. HDFS block size: 512 MB
> >> > >> > >> >>>
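> >> > >> > >> >>> [Roughly, in hbase-site.xml and hdfs-site.xml - a sketch,
> >> > >> > >> >>> with property names assumed from the HBase/HDFS defaults
> >> > >> > >> >>> and the byte values matching the sizes above:
> >> > >> > >> >>>
> >> > >> > >> >>> <property>
> >> > >> > >> >>>   <name>hbase.hregion.max.filesize</name>
> >> > >> > >> >>>   <value>17179869184</value> <!-- 16 GB -->
> >> > >> > >> >>> </property>
> >> > >> > >> >>> <property>
> >> > >> > >> >>>   <name>dfs.blocksize</name>
> >> > >> > >> >>>   <value>536870912</value> <!-- 512 MB -->
> >> > >> > >> >>> </property>
> >> > >> > >> >>> ]
> >> > >> > >> >>>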
> >> > >> > >> >>> Total data size is around 1.8 TB (excluding the
> >> > >> > >> >>> replicas).
> >> > >> > >> >>> My table is split into 128 regions (no pre-splitting was
> >> > >> > >> >>> used; it started with 1 region and grew to 128 over the
> >> > >> > >> >>> insertion period).
> >> > >> > >> >>>
> >> > >> > >> >>> Taking some inputs from earlier discussions, I have made
> >> > >> > >> >>> the following changes to disable Nagle (in both the
> >> > >> > >> >>> client and server hbase-site.xml and hdfs-site.xml):
> >> > >> > >> >>>
> >> > >> > >> >>> <property>
> >> > >> > >> >>>   <name>hbase.ipc.client.tcpnodelay</name>
> >> > >> > >> >>>   <value>true</value>
> >> > >> > >> >>> </property>
> >> > >> > >> >>>
> >> > >> > >> >>> <property>
> >> > >> > >> >>>   <name>ipc.server.tcpnodelay</name>
> >> > >> > >> >>>   <value>true</value>
> >> > >> > >> >>> </property>
> >> > >> > >> >>>
> >> > >> > >> >>> Ganglia stats show a large CPU IO wait (>30% during
> >> > >> > >> >>> reads).
> >> > >> > >> >>>
> >> > >> > >> >>> I agree that the disk configuration is not ideal for a
> >> > >> > >> >>> Hadoop cluster, but as told earlier it can't be changed
> >> > >> > >> >>> for now.
> >> > >> > >> >>> I feel the latency is way beyond any results reported so
> >> > >> > >> >>> far.
> >> > >> > >> >>>
> >> > >> > >> >>> Any pointers on what can be wrong?
> >> > >> > >> >>>
> >> > >> > >> >>> Thanks,
> >> > >> > >> >>> Ramu
> >> > >> > >> >>>
> >> > >> > >> >>
> >> > >> > >> >>
> >> > >> > >> >
> >> > >> > >>
> >> > >> > >
> >> > >> > >
> >> > >> >
> >> > >>
> >> > >>
> >> > >>
> >> > >> --
> >> > >> Bharath Vissapragada
> >> > >> <http://www.cloudera.com>
> >> > >>
> >> > >
> >> > >
> >> >
> >>
> >
> >
>
> Confidentiality Notice:  The information contained in this message,
> including any attachments hereto, may be confidential and is intended to be
> read only by the individual or entity to whom this message is addressed. If
> the reader of this message is not the intended recipient or an agent or
> designee of the intended recipient, please note that any review, use,
> disclosure or distribution of this message or its attachments, in any form,
> is strictly prohibited.  If you have received this message in error, please
> immediately notify the sender and/or Notifications@carrieriq.com and
> delete or destroy any copy of this message and its attachments.
>
