hbase-user mailing list archives

From Michael Segel <michael_se...@hotmail.com>
Subject Re: Low CPU usage and slow reads in pseudo-distributed mode - how to fix?
Date Sun, 11 Jan 2015 14:49:01 GMT
Pseudo cluster on a machine that has 4GB of memory. 
If you give HBase 1.5 GB for the region server… you are left with 2.5 GB of memory for everything else: the OS, Hadoop, and Nutch itself.
You will swap.

In short, nothing he can do will help. He’s screwed if he is trying to improve performance.
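For what it's worth, a quick way to confirm the swap theory while the count is
running, using standard Linux tools on the crawl box (the 5-second interval is
just an example):

    # watch physical memory and swap usage
    free -m
    # si/so columns staying above zero means the box is actively paging
    vmstat 5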

On Jan 11, 2015, at 12:19 AM, Ted Yu <yuzhihong@gmail.com> wrote:

> Please see http://hbase.apache.org/book.html#perf.reading
> I guess you use 0.90.4 because of the Nutch integration. Still, 0.90.x is way
> too old.
> bq. HBase has a heapsize of 1.5 Gigs
> This is not enough memory for good read performance. Please consider giving
> HBase more heap.
> Cheers
> On Sat, Jan 10, 2015 at 4:04 PM, Dave Benson <davehbenson@gmail.com> wrote:
>> Hi HBase users,
>> I'm working with HBase for the first time and I'm trying to sort out a
>> performance issue. HBase is the data store for a small, focused web crawl
>> I'm performing with Apache Nutch. I'm running in pseudo-distributed mode,
>> meaning that Nutch, HBase and Hadoop are all on the same machine. The
>> machine's a few years old and has only 4 gigs of RAM - much smaller than
>> most HBase installs, I know.
>> When I first start my HBase processes I get about 60 seconds of fast
>> performance. HBase reads quickly and uses a healthy portion of CPU cycles.
>> After a minute or so, though, HBase slows dramatically. Reads sink to a
>> glacial pace, and the CPU sits mostly idle.
>> I notice this pattern when I run Nutch - particularly during read-heavy
>> operations - but also when I run a simple row counter from the shell.
>> At the moment " count 'my_table' " takes almost 4 hours to read through
>> 500,000 rows. The reading is much faster at the start than at the end: in the
>> first 30 seconds HBase counts 37,000 rows, but in the 30 seconds between
>> 8:00 and 8:30, only 1,000 are counted.
>> Looking through my Ganglia report I see a brief return to high performance
>> around 3 hours into the count. I don't know what's causing this spike.
>> Can anyone suggest what configuration parameters I should change to improve
>> read performance?  Or what reference materials I should consult to better
>> understand the problem?  Again, I'm totally new to HBase.
>> I'm using HBase 0.90.4 and Hadoop 1.2.2. HBase has a heapsize of 1.5 Gigs.
>> Here's a Ganglia report covering the 4 hours of " count 'my_table' ":
>> http://imgur.com/Aa3eukZ
>> Please let me know if I can provide any more information.
>> Many thanks,
>> Dave
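
For reference, the heap bump Ted suggests above is normally made in
conf/hbase-env.sh. A minimal sketch, assuming the stock 0.90.x layout; 2000 MB
is only an example value, and on a 4 GB box a bigger heap just trades HBase
memory for more swapping:

    # conf/hbase-env.sh -- max heap for the HBase daemons, in MB
    export HBASE_HEAPSIZE=2000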

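On the 4-hour count Dave describes: independent of memory, the scan behind
"count" fetches one row per RPC by default in that era of HBase, which makes it
slow on its own. A sketch of raising the scanner caching; 1000 is an arbitrary
batch size, and I am not sure the 0.90.4 shell already accepts the CACHE option:

    # from the HBase shell (newer shells; may not work on 0.90.4)
    count 'my_table', CACHE => 1000

    # or set the client-wide default in conf/hbase-site.xml:
    #   hbase.client.scanner.caching = 1000
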