hbase-user mailing list archives

From Akhtar Muhammad Din <akhtar.m...@gmail.com>
Subject Re: Hbase Performance Issue
Date Sat, 04 Jan 2014 21:34:48 GMT
Thanks, guys, for your precious time.
Vladimir, as Ted rightly said, I want to improve write performance for now
(of course, I want to read the data as fast as possible later on).
Kevin, my current understanding of bulk load is that you generate
StoreFiles and later load them through a command-line program. I don't want
any manual step. Our system receives data every 15 minutes, so the
requirement is to automate it completely through the client API.
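
To make the "no manual step" requirement concrete, here is a minimal sketch of
a pipeline that writes a batch as HFiles and then loads it from the same
process on a fixed schedule. The class BulkLoadPipeline, the interfaces
HFileWriter and HFileLoader, and the /tmp/hfiles path are hypothetical names
made up for illustration. As far as I know, in HBase 0.94 the writer would be
a MapReduce job set up with HFileOutputFormat.configureIncrementalLoad and the
loader would call LoadIncrementalHFiles.doBulkLoad, but that is an assumption
to check against the Javadoc of the exact version in use, so those calls are
left out of the runnable code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class BulkLoadPipeline {
    /** Hypothetical hook: writes one batch as HFiles, returns the output directory. */
    interface HFileWriter { String writeBatch(long batchId); }

    /** Hypothetical hook: loads the HFiles found under a directory into the table. */
    interface HFileLoader { void load(String hfileDir); }

    private final HFileWriter writer;
    private final HFileLoader loader;
    private long nextBatchId = 0;

    BulkLoadPipeline(HFileWriter writer, HFileLoader loader) {
        this.writer = writer;
        this.loader = loader;
    }

    /** Runs both steps for one batch, with no manual step in between. */
    synchronized String runOnce() {
        String dir = writer.writeBatch(nextBatchId);
        loader.load(dir);
        // Only advance after a successful load, so a failed batch id is reused.
        nextBatchId++;
        return dir;
    }

    /** Schedules runOnce at a fixed rate, e.g. every 15 minutes. */
    ScheduledExecutorService scheduleEvery(long periodMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                runOnce();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all future runs, so log and continue.
                System.err.println("batch failed: " + e);
            }
        }, 0, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }

    public static void main(String[] args) {
        // In-memory stand-ins; real implementations would wrap the HBase calls
        // and convert their checked exceptions to unchecked ones.
        BulkLoadPipeline pipeline = new BulkLoadPipeline(
            id -> "/tmp/hfiles/batch-" + id,
            dir -> System.out.println("loaded " + dir));
        pipeline.runOnce();
        pipeline.runOnce();
    }
}
```

Calling scheduleEvery(15) would match the 15-minute arrival interval. The
point of the sketch is only that the load step can live in the same program as
the write step; it says nothing about how fast either step would be.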



On Sun, Jan 5, 2014 at 2:19 AM, Kevin O'dell <kevin.odell@cloudera.com> wrote:

> Have you tried writing out an HFile and then bulk loading the data?
> On Jan 4, 2014 4:01 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
>
> > bq. Output is written to either Hbase
> >
> > Looks like Akhtar wants to boost write performance to HBase.
> > MapReduce over snapshot files targets higher read throughput.
> >
> > Cheers
> >
> >
> > On Sat, Jan 4, 2014 at 12:55 PM, Vladimir Rodionov
> > <vrodionov@carrieriq.com> wrote:
> >
> > > You can try MapReduce over snapshot files
> > > https://issues.apache.org/jira/browse/HBASE-8369
> > >
> > > but you will need to patch 0.94.
> > >
> > > Best regards,
> > > Vladimir Rodionov
> > > Principal Platform Engineer
> > > Carrier IQ, www.carrieriq.com
> > > e-mail: vrodionov@carrieriq.com
> > >
> > > ________________________________________
> > > From: Akhtar Muhammad Din [akhtar.mdin@gmail.com]
> > > Sent: Saturday, January 04, 2014 12:44 PM
> > > To: user@hbase.apache.org
> > > Subject: Re: Hbase Performance Issue
> > >
> > > I'm using CDH 4.5:
> > > Hadoop:  2.0.0-cdh4.5.0
> > > HBase:   0.94.6-cdh4.5.0
> > >
> > > Regards
> > >
> > >
> > > On Sun, Jan 5, 2014 at 1:24 AM, Ted Yu <yuzhihong@gmail.com> wrote:
> > >
> > > > What version of HBase / HDFS are you running?
> > > >
> > > > Cheers
> > > >
> > > >
> > > >
> > > > On Sat, Jan 4, 2014 at 12:17 PM, Akhtar Muhammad Din
> > > > <akhtar.mdin@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > > I have been running a MapReduce job that joins two datasets of 1.3
> > > > > and 4 GB in size. The join is done on the reduce side. Output is
> > > > > written to either HBase or HDFS, depending on configuration. The
> > > > > problem I am having is that HBase takes about 60-80 minutes to
> > > > > write the processed data, whereas HDFS takes only 3-5 minutes to
> > > > > write the same data. I really want to improve the HBase write time
> > > > > and bring it down to 1-2 minutes.
> > > > >
> > > > > I am using Amazon EC2 instances. I launched a cluster of size 3 and
> > > > > later 10, and have tried both c3.4xlarge and c3.8xlarge instances.
> > > > >
> > > > > I can see a significant increase in performance when writing to HDFS
> > > > > as I use a cluster with more nodes and higher specifications, but
> > > > > in the case of HBase there was no significant change in performance.
> > > > >
> > > > > I have been going through different posts and articles and have read
> > > > > the HBase book to solve the HBase performance issue, but have not
> > > > > succeeded so far.
> > > > > Here are a few of the things I have tried so far:
> > > > >
> > > > > *Client Side*
> > > > > - Turned off writing to WAL
> > > > > - Experimented with write buffer size
> > > > > - Turned off auto flush on table
> > > > > - Used cache, experimented with different sizes
> > > > >
> > > > >
> > > > > *HBase Server Side*
> > > > > - Increased region server heap size to 8 GB
> > > > > - Experimented with the handler count
> > > > > - Increased MemStore flush size to 512 MB
> > > > > - Experimented with hbase.hregion.max.filesize, tried different
> > > > >   sizes
> > > > >
> > > > > There are many other parameters I have tried, following suggestions
> > > > > from different sources, but nothing has worked so far.
> > > > >
> > > > > Your help will be really appreciated.
> > > > >
> > > > > --
> > > > > Regards
> > > > > Akhtar Muhammad Din
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Regards
> > > Akhtar Muhammad Din
> > >
> > >
> >
>



-- 
Regards
Akhtar Muhammad Din
