hadoop-common-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1644) [hbase] Compactions should not block updates
Date Thu, 09 Aug 2007 23:11:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12518867 ]

stack commented on HADOOP-1644:
-------------------------------

Let me try your suggestion, Jim, of not having compactions disable flushes.

Another thing I'd like to try: rather than flushing memory to a new file, flush by merging with an existing file.  I'm thinking it will take about the same elapsed time, but we'll have put off a full compaction by not producing an additional file.
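To make the flush-by-merge idea concrete, here is a minimal sketch in Python. It models the memcache as a dict and each on-disk HStoreFile as a sorted list of key/value pairs; the names (`flush_as_new_file`, `flush_by_merge`, `store_files`) are illustrative, not the actual HBase API.

```python
def flush_as_new_file(store_files, memcache):
    """Current behaviour: every flush adds one more on-disk file,
    hastening the need for a full compaction."""
    store_files.append(sorted(memcache.items()))
    memcache.clear()

def flush_by_merge(store_files, memcache):
    """Proposed behaviour: merge the memcache with the newest existing
    file, so the file count stays flat and a full compaction is deferred."""
    newest = store_files.pop() if store_files else []
    # Merge the two key-sorted sets; memcache entries win on key collisions,
    # since they are the more recent edits.
    merged = dict(newest)
    merged.update(memcache)
    store_files.append(sorted(merged.items()))
    memcache.clear()
```

The trade-off the comment describes falls out directly: after N flushes the first strategy leaves N files to compact, while the second leaves one larger file.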

Another element to consider is that compactions are the means by which HStoreFile references are cleaned up in a region (if a region holds references, it cannot be split), so compaction should do its best to clean up instances of reference files.
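The reference-file constraint can be modeled in a few lines. In this sketch, a store file flagged as a reference (left over from a parent region's split) blocks any further split until a compaction rewrites it into a real file; the class and function names are hypothetical, not HBase's actual types.

```python
class StoreFile:
    def __init__(self, name, is_reference=False):
        self.name = name
        # True when this file merely points into a parent region's file.
        self.is_reference = is_reference

def can_split(store_files):
    """A region with outstanding reference files must not split again."""
    return not any(f.is_reference for f in store_files)

def compact(store_files):
    """Compaction rewrites everything into one real (non-reference) file,
    clearing the references and making the region splittable again."""
    merged_name = "+".join(f.name for f in store_files)
    return [StoreFile(merged_name, is_reference=False)]
```

This is why the comment argues compaction should prioritize reference files: until they are rewritten, the region is pinned at its current boundaries no matter how large it grows.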



> [hbase] Compactions should not block updates
> --------------------------------------------
>
>                 Key: HADOOP-1644
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1644
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: contrib/hbase
>    Affects Versions: 0.15.0
>            Reporter: stack
>            Assignee: stack
>             Fix For: 0.15.0
>
>
> Currently, compactions take a long time.  During a compaction, updates are carried by the HRegion's memcache (plus its backing HLog); the memcache cannot flush to disk until the compaction completes.
> Under sustained, substantial updates -- rows that contain multiple columns, one of which is a web page -- by multiple concurrent clients (10 in this case), a common hbase usage scenario, the memcache grows quickly, often to orders of magnitude in excess of the configured 'flush-to-disk' threshold.
> This throws the whole system out of kilter.  When the memcache does get to flush after the compaction completes -- assuming you have sufficient RAM and the region server doesn't OOME -- the resulting on-disk file will be far larger than any other on-disk HStoreFile, bringing on a region split ..... but the split produces regions that themselves need to be split immediately, because each half is beyond the configured limit, and so on...
> In another issue yet to be posted, tuning and some pointed memcache flushes make the above condition less extreme, but until compaction durations come close to the memcache flush threshold, compactions will remain disruptive.
> It's conceded that compactions may never be fast enough, as per the Bigtable paper (this is a 'wish' issue).
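The cascade the description sketches can be shown with a toy simulation: while the compaction blocks flushing, the memcache sails past its threshold, and the oversized flush file then forces repeated splits. All the numbers (threshold, region limit, write rate) are made up for illustration and are not HBase's defaults.

```python
FLUSH_THRESHOLD_MB = 64    # hypothetical configured flush-to-disk threshold
MAX_REGION_SIZE_MB = 256   # hypothetical configured region split limit

def memcache_size_after_compaction(write_rate_mb_s, compaction_secs):
    """How large the memcache grows while flushing is blocked."""
    return write_rate_mb_s * compaction_secs

def split_generations(file_size_mb):
    """Each split roughly halves the region; count the generations of
    splits needed before each half fits under the configured limit."""
    generations = 0
    while file_size_mb > MAX_REGION_SIZE_MB:
        file_size_mb /= 2
        generations += 1
    return generations
```

With, say, a 10 MB/s aggregate write rate and a two-minute compaction, the memcache reaches 1200 MB -- far past the 64 MB threshold -- and the resulting file needs several rounds of splitting before its pieces fit under the region limit.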

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

