hbase-dev mailing list archives

From "Billy Pearson (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-745) scaling of one regionserver, improving memory and cpu usage
Date Tue, 15 Jul 2008 14:55:31 GMT

    [ https://issues.apache.org/jira/browse/HBASE-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613627#action_12613627 ]

Billy Pearson commented on HBASE-745:

I agree with your idea of an incremental compaction.

My two ideas for increased efficiency in compaction while under load:

1. Compact only the newest threshold (3) of mapfiles

This will allow a region server to compact the latest 3 mapfiles created, lowering the number
of mapfiles by 2 per compaction.
The newest mapfiles will not store the bulk of the data for a region; if we are under load they
will be small memcache flushes and will compact fast.

By compacting the newest ones first, when the load drops and there are only 3 mapfiles left, one
of them will be the largest and oldest mapfile,
and all old data and new data will get compacted together.
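
A rough sketch of idea 1 in Java, just to make the file counts concrete. The class, the MapFile
stand-in and the THRESHOLD constant are made up for illustration and are not HBase classes; the
merged size is assumed to be the plain sum of the inputs, which ignores overwritten or deleted cells.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: hypothetical names, not the HBase compaction code. */
class NewestFilesCompaction {

    /** Hypothetical stand-in for a mapfile: higher sequenceId means newer. */
    static final class MapFile {
        final long sequenceId;
        final long sizeBytes;

        MapFile(long sequenceId, long sizeBytes) {
            this.sequenceId = sequenceId;
            this.sizeBytes = sizeBytes;
        }
    }

    /** Number of newest mapfiles merged per compaction (assumed value from the comment). */
    static final int THRESHOLD = 3;

    /**
     * Merges the newest THRESHOLD files into one and leaves older files untouched.
     * The input must be ordered oldest first. If there are fewer than THRESHOLD
     * files, a copy of the input is returned unchanged.
     */
    static List<MapFile> compactNewest(List<MapFile> files) {
        if (files.size() < THRESHOLD) {
            return new ArrayList<>(files);
        }
        int split = files.size() - THRESHOLD;
        // Older files are carried over without being rewritten.
        List<MapFile> result = new ArrayList<>(files.subList(0, split));
        long mergedSize = 0;
        long newestId = Long.MIN_VALUE;
        for (MapFile f : files.subList(split, files.size())) {
            mergedSize += f.sizeBytes; // assumption: no cells are dropped by the merge
            newestId = Math.max(newestId, f.sequenceId);
        }
        result.add(new MapFile(newestId, mergedSize));
        return result;
    }
}
```

With one large old file and four small flushes, the first call leaves 3 files and never touches
the large one; a second call, made once only 3 files remain, merges old and new data together.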

2. The compaction queue
Currently we only add the region to a queued list of regions needing a compaction check, and
compact in that order.

My suggestion would be to have the queued list store how many times a region has been added
to the compaction queue (memcache flushes).
That way we can sort the list and compact the hot spots first while under load,
reducing the number of mapfiles the fastest with the above idea implemented.
When it is done with the compaction, reduce the number in the queue by how many files we compacted,
or remove it if  left to compact, then sort the list again and start over.
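
A minimal sketch of such a counting queue in Java. All names here are hypothetical, regions are
identified by plain strings, and the removal rule (drop the region once its count reaches zero or
below) is my assumption about when nothing is left to compact. It is not thread-safe as written.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: hypothetical names, not the HBase compaction queue. */
class CountingCompactionQueue {

    /** Region name -> number of compaction requests (memcache flushes) still pending. */
    private final Map<String, Integer> pending = new HashMap<>();

    /** Called each time a region is added to the queue, e.g. after a memcache flush. */
    void requestCompaction(String regionName) {
        pending.merge(regionName, 1, Integer::sum);
    }

    /** Returns queued regions, highest count first; ties are broken by region name. */
    List<String> regionsHottestFirst() {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(pending.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries) {
            names.add(e.getKey());
        }
        return names;
    }

    /**
     * Called after a compaction: lowers the region's count by the number of files
     * compacted. Assumption: the region is removed once the count is zero or below.
     */
    void compactionDone(String regionName, int filesCompacted) {
        Integer count = pending.get(regionName);
        if (count == null) {
            return; // region was not queued
        }
        int remaining = count - filesCompacted;
        if (remaining <= 0) {
            pending.remove(regionName);
        } else {
            pending.put(regionName, remaining);
        }
    }

    /** Pending count for a region, or 0 if it is not queued. */
    int pendingCount(String regionName) {
        return pending.getOrDefault(regionName, 0);
    }
}
```

The compaction thread would take the first entry of regionsHottestFirst(), compact it, call
compactionDone(), and then ask for the sorted list again.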

These are my ideas on how we can reduce the number of mapfiles we have while we are under
a write load.

> scaling of one regionserver, improving memory and cpu usage
> -----------------------------------------------------------
>                 Key: HBASE-745
>                 URL: https://issues.apache.org/jira/browse/HBASE-745
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.1.3
>         Environment: hadoop 0.17.1
>            Reporter: LN
>            Priority: Minor
> After weeks of testing hbase 0.1.3 and hadoop (0.16.4, 0.17.1), I found there is much work
> to do before a single regionserver can handle about 100G of data, or even more. I'd like to share
> my opinions here with stack and other developers.
> First, the easiest way to improve the scalability of a regionserver is to upgrade the hardware: use
> a 64-bit OS and 8G of memory for the regionserver process, and speed up disk IO.
> Besides hardware, the following are the software bottlenecks I found in the regionserver:
> 1. As data increases, compaction eats CPU (and IO) time; the total compaction
> time is basically linear in the whole data size and, even worse, sometimes quadratic
> in that size.
> 2. Memory usage depends on the opened mapfiles.
> 3. Network connections depend on the opened mapfiles; see HADOOP-2341 and HBASE-24.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
