From: Akshat Mahajan <amaha...@brightedge.com>
Subject: RE: Hbase Architecture Questions
Date: Fri, 03 Feb 2017 21:34:50 GMT
> By "After Hadoop runs", you mean a batch collection/processing job? MapReduce? Hadoop
> is a collection of distributed processing tools: Filesystem (HDFS) and distributed execution
> (YARN)...

Yes, we mean a batch collection/processing job. We use YARN and HDFS, and we use the Hadoop
API in Java to run mapper-only jobs (we do not need to perform reductions on our data). The
collection itself happens through Python, which necessitates the use of Thrift, while the
actual processing runs on YARN.
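
For reference, a minimal sketch of what we mean by a mapper-only job is below. The mapper and
the paths are just placeholders, not our actual code; setting the reduce task count to zero is
what makes Hadoop skip the shuffle/reduce phase entirely.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyJobSketch {

    // Placeholder mapper: passes each input line straight through to the output.
    public static class PassThroughMapper
            extends Mapper<LongWritable, Text, Text, NullWritable> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            context.write(value, NullWritable.get());
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "map-only-processing");
        job.setJarByClass(MapOnlyJobSketch.class);
        job.setMapperClass(PassThroughMapper.class);
        job.setNumReduceTasks(0);                // no reducers: mapper output is written directly to HDFS
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}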

> If you are deleting the table in a short period of time, then, yes, disabling major compactions
> is probably "OK". However, not running major compactions will have read performance implications.
> While there are tombstones, these represent extra work that your regionservers must perform.

Can you please clarify? Our understanding is that deleting a table will close all of the regions
belonging to that table across all RegionServers. If that is correct, why should there be a
read performance issue? (I'm assuming that closed regions are never accessed again by the
regionservers - am I correct?)
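
To make sure we are talking about the same thing, this is roughly the lifecycle we have in mind,
sketched with the HBase 1.x Java Admin API for concreteness (our own tooling is in Python, and
the table name here is made up). Our assumption is that disableTable() unassigns every region
of the table and deleteTable() then removes its files, so those regions are never consulted
again.

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class DropTemporaryTableSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical per-job table; in our setup every job gets its own table.
        TableName table = TableName.valueOf("job_20170203");
        try (Connection conn = ConnectionFactory.createConnection();
             Admin admin = conn.getAdmin()) {
            if (admin.tableExists(table)) {
                admin.disableTable(table);  // unassigns (closes) all regions of the table
                admin.deleteTable(table);   // removes the table descriptor and its store files
            }
        }
    }
}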

> It sounds like you could just adopt a bulk-loading approach instead of doing live writes
> into HBase..

This is certainly a possibility, but it would require a fairly sizable rewrite of our application.
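
For completeness, our understanding of the suggested bulk-loading approach is something like
the sketch below (HBase 1.x-era API; the mapper, column family, table name and staging path
are all placeholders). A map-only job writes HFiles via HFileOutputFormat2, and
LoadIncrementalHFiles then moves them into the table, bypassing the live write path.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadSketch {

    // Hypothetical mapper: turns each tab-separated "rowkey<TAB>value" line into a Put.
    public static class PutGeneratingMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String[] parts = value.toString().split("\t", 2);
            byte[] row = Bytes.toBytes(parts[0]);
            Put put = new Put(row);
            // "cf1" is a placeholder column family name.
            put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("q"), Bytes.toBytes(parts[1]));
            context.write(new ImmutableBytesWritable(row), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        TableName tableName = TableName.valueOf("job_20170203");   // placeholder table
        Path hfileDir = new Path("/tmp/bulkload-staging");          // placeholder staging dir

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(tableName);
             RegionLocator locator = conn.getRegionLocator(tableName);
             Admin admin = conn.getAdmin()) {

            Job job = Job.getInstance(conf, "hfile-generation");
            job.setJarByClass(BulkLoadSketch.class);
            job.setMapperClass(PutGeneratingMapper.class);
            job.setMapOutputKeyClass(ImmutableBytesWritable.class);
            job.setMapOutputValueClass(Put.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, hfileDir);

            // Sets partitioner, sort order and output format so HFiles line up with the table's regions.
            HFileOutputFormat2.configureIncrementalLoad(job, table, locator);

            if (job.waitForCompletion(true)) {
                // Moves the finished HFiles directly into the regionservers' store directories.
                new LoadIncrementalHFiles(conf).doBulkLoad(hfileDir, admin, table, locator);
            }
        }
    }
}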


> I doubt REST is ever going to be a "well-performing" tool. You're likely chasing your
> tail to try to optimize this. Your time would be better spent using the HBase Java API directly.

We are constrained to Python, so we can't use the native Java API. We switched from Thrift
to REST when we found that Thrift kept dying under the load we placed on it.

> Are you taxing your machines' physical resources? What does I/O, CPU and memory utilization
> look like?

Let me get back to you more fully on this front in a follow-up.

For now, here are rough estimates for our bulkier and more problematic cluster:

We are not constrained by memory. Memory utilisation on all the regionservers is close to 99%,
but about 40% of that on each RS is kernel cache, which the kernel will readily hand back to
any program that needs it (as reported by `free`). On the master node, memory utilisation is
closer to 60%.

Our CPU utilisation varies, but under regular operation user time sits at 40 - 60 percent on
all regionservers. CPU idle time is usually very high, at about 90%, and CPU system time is
about 5%.

I/O wait time is very, very low - about 0.23% on average.

> Yeah I'm not sure why you are doing this. There's no reason I can think of as to why
> you would need to do this...

Believe it or not, it is the only thing we have found that temporarily restores performance.
It's not ideal, though, and we don't want to keep doing it.

Akshat


-----Original Message-----
From: Josh Elser [mailto:elserj@apache.org] 
Sent: Friday, February 03, 2017 1:05 PM
To: user@hbase.apache.org
Subject: Re: Hbase Architecture Questions



Akshat Mahajan wrote:

> b) Every table in our Hbase is associated with a unique collection of items, and all
> tables exhibit the same column families (2 column families). After Hadoop runs, we no longer
> require the tables, so we delete them. Individual rows are never removed; instead, entire
> tables are removed at a time.

By "After Hadoop runs", you mean a batch collection/processing job? 
MapReduce? Hadoop is a collection of distributed processing tools: 
Filesystem (HDFS) and distributed execution (YARN)...

> a) _We have turned major compaction off_.
>
> We are aware this is against recommended advice. Our reasoning for 
> this is that
>
> 1) periods of major compaction degrade both our read and write 
> performance heavily (to the point our schedule is delayed beyond 
> tolerance), and
> 2) all our tables are temporary - we do not intend to keep them around, and disabling/deleting
> old tables closes entire regions altogether and should have the same effect as major compaction
> processing tombstone markers on rows. Read performance should then theoretically not be impacted
> - we expect that the RegionServer will never even consult that region in doing reads, so storefile
> buildup overall should not be an issue.

If you are deleting the table in a short period of time, then, yes, disabling major compactions
is probably "OK".

However, not running major compactions will have read performance implications. While there
are tombstones, these represent extra work that your regionservers must perform.

>
> b) _We have made additional efforts to turn off minor compactions as much as possible_.
>
> In particular, our hbase.hstore.compaction.max.size is set to 500 MB, our hbase.hstore.compactionThreshold
> is set to 1000 bytes. We do this in order to prevent a minor compaction from becoming a major
> compaction - since we cannot prevent that, we were forced to try and prevent minor compactions
> running at all.
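
For concreteness, those two properties, together with hbase.hregion.majorcompaction (which,
when set to 0, disables the time-based major compactions), would look roughly like the sketch
below if set programmatically; in practice they would normally live in hbase-site.xml. Note
that hbase.hstore.compactionThreshold is a count of store files per store, not a byte size.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CompactionSettingsSketch {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // 0 disables the periodic (time-based) major compactions.
        conf.setLong("hbase.hregion.majorcompaction", 0L);
        // Store files larger than ~500 MB are excluded from minor compaction selection.
        conf.setLong("hbase.hstore.compaction.max.size", 500L * 1024 * 1024);
        // Number of store files in a store before a compaction is considered (a file count, not bytes).
        conf.setInt("hbase.hstore.compactionThreshold", 1000);
        System.out.println("majorcompaction interval = " + conf.get("hbase.hregion.majorcompaction"));
    }
}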

It sounds like you could just adopt a bulk-loading approach instead of 
doing live writes into HBase..

> c) We have tried to make REST more performant by improving the number of REST threads
> to about 9000.
>
> This figure is derived from counting the number of connections on REST during periods
> of high write load.

I doubt REST is ever going to be a "well-performing" tool. You're likely 
chasing your tail to try to optimize this. Your time would be better 
spent using the HBase Java API directly.

> d) We have turned on bloom filters, use an LRUBlockCache which caches data only on reads,
> and have set tcpnodelay to true. These were in place before we turned major compaction off.
>
> Our observations with these settings in performance:
>
> a) We are seeing an impact on both read/write performance correlated strongly with store
> file buildup. Our store files number between 500 and 1500 on each RS - the total size on each
> RegionServer is on the order of 100 to 200 GB at worst.
> b) As the number of connections on Hbase REST rises, write performance is impacted. We originally
> believed this was due to high frequency of memstore flushes - but increasing the memstore
> buffer sizes has had no discernible impact on read/write. Currently, our callqueue.handler.size
> is set to 100 - since we experience over 100 requests/second on each RS, we are considering
> increasing this to about 300 so we can handle more requests concurrently. Is this a good change,
> or are other changes needed as well?

Are you taxing your machines' physical resources? What does I/O, CPU and 
memory utilization look like?

> Unfortunately, we cannot provide raw metrics on the magnitude of read/write performance
> degradation as we do not have sufficient tracking for them. A rough proxy - we do know our
> clusters are capable of processing 200 jobs in an hour. This now goes down to as low as 30-50
> jobs per hour with minimal changes to the jobs themselves. We wish to be able to get back
> to our original performance.
>
> For now, in periods of high stress (large jobs or quick reads/writes), we are manually
> clearing out the hbase folder in HDFS (including store files, WALs, oldWALs and archived files),
> and resetting our clusters to an empty state. We are aware this is not ideal, and are looking
> for ways to not have to do this. Our understanding of how Hbase works is probably imperfect,
> and we would appreciate any advice or feedback in this regard.

Yeah I'm not sure why you are doing this. There's no reason I can think 
of as to why you would need to do this...
