From: Ted Yu
Date: Fri, 3 Feb 2017 13:46:52 -0800
Subject: Re: Hbase Architecture Questions
To: "user@hbase.apache.org"

bq. We use Hbase 1.0.0

1.0.0 is quite old. Can you try a more recent release such as 1.3.0 (the hbase-thrift module should be more robust)?

If your nodes have enough memory, have you thought of using bucket cache to improve read performance?

Cheers

On Fri, Feb 3, 2017 at 1:34 PM, Akshat Mahajan wrote:

> > By "After Hadoop runs", you mean a batch collection/processing job? MapReduce? Hadoop is a collection of distributed processing tools: Filesystem (HDFS) and distributed execution (YARN)...
>
> Yes, we mean a batch collection/processing job. We use YARN and HDFS, and we employ the Hadoop API to run only mappers (we do not need to perform reductions on our data) in Java. But the collection happens through Python, necessitating the use of Thrift, and the actual processing happens through YARN.
>
> > If you are deleting the table in a short period of time, then, yes, disabling major compactions is probably "OK". However, not running major compactions will have read performance implications. While there are tombstones, these represent extra work that your regionservers must perform.
>
> Can you please clarify? Our understanding is that a table deletion will close the entire region corresponding to that table across all RegionServers. If that is correct, why should there be a read performance issue? (I'm assuming that closed regions are never accessed again by the RegionServers - am I correct?)
>
> > It sounds like you could just adopt a bulk-loading approach instead of doing live writes into HBase..
>
> This is certainly a possibility, but it would require a fairly sizable rewrite of our application.
>
> > I doubt REST is ever going to be a "well-performing" tool. You're likely chasing your tail to try to optimize this. Your time would be better spent using the HBase Java API directly.
>
> We are constrained by having to use Python, so we can't use the native API. We switched from Thrift to REST when we found Thrift kept dying under the load we put it under.
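[A minimal sketch of what "using the HBase Java API directly" could look like for the write path discussed above, against the HBase 1.x client. The table name, column family, row key and class name are placeholders invented for illustration, not anything from this thread.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.BufferedMutator;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DirectWriteSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();  // picks up hbase-site.xml from the classpath
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 // BufferedMutator batches puts client-side instead of issuing one RPC per row
                 BufferedMutator mutator = conn.getBufferedMutator(TableName.valueOf("demo_table"))) {
                Put put = new Put(Bytes.toBytes("row-00001"));       // placeholder row key
                put.addColumn(Bytes.toBytes("cf1"),                  // placeholder column family
                              Bytes.toBytes("payload"),
                              Bytes.toBytes("some value"));
                mutator.mutate(put);
            }  // closing the mutator flushes anything still buffered
        }
    }

If the Python producers have to stay, one option (not something settled in this thread) is to put a thin Java ingest step like this between them and HBase, for example inside the existing Java mapper job.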
>
> > Are you taxing your machines' physical resources? What does I/O, CPU and memory utilization look like?
>
> Let me get back to you on this front more fully in a followup. Currently providing estimates for our bulkier and more problematic cluster:
>
> We are not constrained by memory - our free memory utilisation on all the regionservers is close to 99%, but about 40% of that in each RS is used in caches that will be readily given up by the kernel to any programs that require it (as assessed by `free`). On the master node, it is closer to 60% memory utilisation.
>
> Our CPU utilisation varies, but under regular operation, user time is at 40 to 60 percent on all regionservers. CPU idle time is very high, usually about 90%, and CPU system time is about 5%.
>
> I/O wait time is very, very low - about 0.23% on average.
>
> > Yeah I'm not sure why you are doing this. There's no reason I can think of as to why you would need to do this...
>
> Believe it or not, it is the only thing we have found that helps us restore performance temporarily. It's not ideal, and we don't want to keep doing it, though.
>
> Akshat
>
>
> -----Original Message-----
> From: Josh Elser [mailto:elserj@apache.org]
> Sent: Friday, February 03, 2017 1:05 PM
> To: user@hbase.apache.org
> Subject: Re: Hbase Architecture Questions
>
> Akshat Mahajan wrote:
>
> > b) Every table in our HBase is associated with a unique collection of items, and all tables exhibit the same column families (2 column families). After Hadoop runs, we no longer require the tables, so we delete them. Individual rows are never removed; instead, entire tables are removed at a time.
>
> By "After Hadoop runs", you mean a batch collection/processing job? MapReduce? Hadoop is a collection of distributed processing tools: Filesystem (HDFS) and distributed execution (YARN)...
>
> > a) _We have turned major compaction off_.
> >
> > We are aware this is against recommended advice. Our reasoning for this is that
> >
> > 1) periods of major compaction degrade both our read and write performance heavily (to the point our schedule is delayed beyond tolerance), and
> > 2) all our tables are temporary - we do not intend to keep them around, and disabling/deleting old tables closes entire regions altogether and should have the same effect as major compaction processing tombstone markers on rows. Read performance should then theoretically not be impacted - we expect that the RegionServer will never even consult that region in doing reads, so storefile buildup overall should not be an issue.
>
> If you are deleting the table in a short period of time, then, yes, disabling major compactions is probably "OK".
>
> However, not running major compactions will have read performance implications. While there are tombstones, these represent extra work that your regionservers must perform.
>
> > b) _We have made additional efforts to turn off minor compactions as much as possible_.
> >
> > In particular, our hbase.hstore.compaction.max.size is set to 500 MB and our hbase.hstore.compactionThreshold is set to 1000 bytes. We do this in order to prevent a minor compaction from becoming a major compaction - since we cannot prevent that, we were forced to try to prevent minor compactions from running at all.
>
> It sounds like you could just adopt a bulk-loading approach instead of doing live writes into HBase..
>
> > c) We have tried to make REST more performant by increasing the number of REST threads to about 9000.
> >
> > This figure is derived from counting the number of connections on REST during periods of high write load.
>
> I doubt REST is ever going to be a "well-performing" tool. You're likely chasing your tail to try to optimize this. Your time would be better spent using the HBase Java API directly.
>
> > d) We have turned on bloom filters, use an LRUBlockCache which caches data only on reads, and have set tcpnodelay to true. These were in place before we turned major compaction off.
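[For context on (d): in the HBase 1.x Java API, bloom filters and block-cache behaviour are declared per column family when the table is created. A minimal sketch of where those settings live follows; the table name, family name and class name are placeholders, not taken from this thread.]

    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.regionserver.BloomType;

    public class CreateTableSketch {
        public static void main(String[] args) throws Exception {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Admin admin = conn.getAdmin()) {
                HTableDescriptor table = new HTableDescriptor(TableName.valueOf("demo_table"));  // placeholder
                HColumnDescriptor cf = new HColumnDescriptor("cf1");  // placeholder family name
                cf.setBloomFilterType(BloomType.ROW);  // row-level bloom filter
                cf.setBlockCacheEnabled(true);         // serve reads through the block cache (LRU by default)
                table.addFamily(cf);
                admin.createTable(table);
            }
        }
    }

For tables that are created and dropped per job, settings like these would normally be baked into whatever code creates the tables rather than applied by hand afterwards.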
> >
> > Our observations with these settings in performance:
> >
> > a) We are seeing an impact on both read and write performance correlated strongly with store file buildup. Our store files number between 500 and 1500 on each RS - the total size on each RegionServer is on the order of 100 to 200 GB at worst.
> >
> > b) As the number of connections on HBase REST rises, write performance is impacted. We originally believed this was due to a high frequency of memstore flushes - but increasing the memstore buffer sizes has had no discernible impact on read/write. Currently, our callqueue.handler.size is set to 100 - since we experience over 100 requests/second on each RS, we are considering increasing this to about 300 so we can handle more requests concurrently. Is this a good change, or are other changes needed as well?
>
> Are you taxing your machines' physical resources? What does I/O, CPU and memory utilization look like?
>
> > Unfortunately, we cannot provide raw metrics on the magnitude of the read/write performance degradation as we do not have sufficient tracking for them. As a rough proxy, we do know our clusters are capable of processing 200 jobs in an hour. This now goes down to as low as 30-50 jobs per hour with minimal changes to the jobs themselves. We wish to get back to our original performance.
> >
> > For now, in periods of high stress (large jobs or quick reads/writes), we are manually clearing out the hbase folder in HDFS (including store files, WALs, oldWALs and archived files) and resetting our clusters to an empty state. We are aware this is not ideal, and are looking for ways to not have to do this. Our understanding of how HBase works is probably imperfect, and we would appreciate any advice or feedback in this regard.
>
> Yeah, I'm not sure why you are doing this. There's no reason I can think of as to why you would need to do this...
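[Since bulk loading is suggested above as the alternative to live writes, here is a minimal sketch of that pattern against the HBase 1.x API: the batch job writes HFiles via HFileOutputFormat2, and the finished files are then handed to the regionservers with LoadIncrementalHFiles. The table name, staging path, job name and class name are placeholders, and the mapper/input wiring is elided; treat this as the shape of the approach, not a drop-in for the setup described in the thread.]

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.RegionLocator;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
    import org.apache.hadoop.mapreduce.Job;

    public class BulkLoadSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            TableName name = TableName.valueOf("demo_table");   // placeholder table name
            Path hfileDir = new Path("/tmp/hfile-output");      // placeholder staging dir in HDFS

            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table table = conn.getTable(name);
                 RegionLocator locator = conn.getRegionLocator(name);
                 Admin admin = conn.getAdmin()) {

                // Step 1: a MapReduce job that emits HFiles instead of doing live Puts.
                Job job = Job.getInstance(conf, "hfile-writer");
                // Wires up the partitioner and sort order so output HFiles line up with the table's regions.
                HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
                // ... set the mapper, input path and hfileDir as the output path,
                // then run job.waitForCompletion(true) (elided here).

                // Step 2: hand the finished HFiles over to the regionservers.
                LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
                loader.doBulkLoad(hfileDir, admin, table, locator);
            }
        }
    }

Because the load step is essentially a file handover inside HDFS, writes taken down this path bypass the memstore and WAL, which should relieve much of the flush and compaction pressure described above; the Python side could still stage the raw data, with HFile generation living in the existing Java mapper job.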