hbase-dev mailing list archives

From "LN (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-745) scaling of one regionserver, improving memory and cpu usage
Date Tue, 15 Jul 2008 08:35:32 GMT

    [ https://issues.apache.org/jira/browse/HBASE-745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12613555#action_12613555

LN commented on HBASE-745:

Calculating compaction time:
1. Suppose we keep writing data to a regionserver, and the row ids of the data are hashed
across all regions.
2. With the default optionalcacheflushinterval (30 min) and compaction threshold (3), every
HStore gets a storefile flushed from the memcache every 30 minutes. After 1 hour each HStore
has 3 storefiles (including the original one), so a compaction is triggered. That is, every
HStore in the regionserver is compacted once an hour.
3. Compacting an HStore reads all the data in the HStore's mapfiles, so I assume the
compaction time depends on the total size of those mapfiles. The total compaction time caused
by optionalcacheflushinterval on a regionserver therefore depends on the amount of data the
regionserver is serving.
4. So the default optionalcacheflushinterval is not suitable for most environments. On my
hardware (Xeon 3.2*2, dual core, SCSI) compaction runs at about 10M of data per second, which
means it can compact 36G in 1 hour. What happens when the data size is larger than 36G?...
5. How about increasing optionalcacheflushinterval to 12 hours, or even 24 hours?
Unfortunately, I found that this does not help, because of globalMemcacheLimit. It defaults to
512M; when it is reached, memcaches are flushed (creating storefiles) until the total memcache
size drops below 256M. Since the inserted row ids are distributed across all regions, nearly
half of all regions get a new storefile each time. So once about 1G has been inserted
(4 global memcache flushes), all data on the regionserver is compacted. No setting can adjust
this behavior.
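
The arithmetic in points 4 and 5 can be sketched as follows. This is only a rough model in Python, not HBase code: the function names and parameters are made up for illustration, the 10M/s rate is the figure measured above, and the "about half of the regions get a storefile per global flush" factor is the estimate from point 5.

```python
# Rough model of the numbers in the comment above; not HBase code.
# All names are hypothetical and exist only for this illustration.


def compactable_mb(rate_mb_per_s, interval_s):
    """MB of data one regionserver can compact within interval_s seconds."""
    return rate_mb_per_s * interval_s


def inserts_until_full_compaction(limit_mb=512, low_water_mb=256,
                                  threshold=3, existing_storefiles=1,
                                  fraction_flushed=0.5):
    """Approximate MB inserted before every HStore reaches the threshold.

    Assumes each global flush writes out (limit - low water) MB and gives a
    new storefile to about fraction_flushed of the HStores (point 5's estimate).
    """
    new_files_needed = threshold - existing_storefiles      # 3 - 1 = 2
    flushes_needed = new_files_needed / fraction_flushed    # 2 / 0.5 = 4
    mb_per_flush = limit_mb - low_water_mb                  # 512 - 256 = 256
    return flushes_needed * mb_per_flush


print(compactable_mb(10, 3600))          # 36000 MB, i.e. about 36G per hour
print(inserts_until_full_compaction())   # 1024.0 MB, i.e. about 1G
```

Under these assumptions, raising optionalcacheflushinterval changes nothing in the second function, which is the point made in item 5.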

> scaling of one regionserver, improving memory and cpu usage
> -----------------------------------------------------------
>                 Key: HBASE-745
>                 URL: https://issues.apache.org/jira/browse/HBASE-745
>             Project: Hadoop HBase
>          Issue Type: Improvement
>          Components: regionserver
>    Affects Versions: 0.1.3
>         Environment: hadoop 0.17.1
>            Reporter: LN
>            Priority: Minor
> After weeks of testing hbase 0.1.3 with hadoop (0.16.4, 0.17.1), I found there is a lot of work
to do before a single regionserver can handle about 100G of data, or even more. I'd like to share
my opinions here with stack and the other developers.
> First, the easiest way to improve the scalability of a regionserver is to upgrade the hardware: use
a 64-bit OS and 8G of memory for the regionserver process, and speed up disk I/O.
> Besides hardware, the following are the software bottlenecks I found in the regionserver:
> 1. As data grows, compaction eats CPU (and I/O) time. The total compaction
time is basically linear in the total data size or, even worse, sometimes quadratic in
that size.
> 2. Memory and socket connection usage depend on the number of open mapfiles; see HADOOP-2341
and HBASE-24.
> I will explain the above in later comments.
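
One possible reading of the "linear, sometimes quadratic" remark in item 1: if all stored data is recompacted each time a fixed amount of new data arrives, the bytes rewritten by compaction add up quadratically in the total data size. A small Python sketch under that assumption (the function and its parameters are illustrative, not taken from HBase):

```python
def total_compaction_io(total_mb, step_mb):
    """MB rewritten if all stored data is recompacted after every step_mb inserted."""
    rewritten = 0
    stored = 0
    while stored + step_mb <= total_mb:
        stored += step_mb
        rewritten += stored  # a full compaction reads and rewrites everything stored
    return rewritten


print(total_compaction_io(4096, 1024))  # 10240
print(total_compaction_io(8192, 1024))  # 36864
```

Doubling the data from 4G to 8G raises the rewritten volume by about 3.6 times in this model, which approaches 4 times as the data grows.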

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
