hadoop-common-dev mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-2615) Add max number of mapfiles to compact at one time giveing us a minor & major compaction
Date Fri, 25 Jan 2008 22:02:35 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-2615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12562709#action_12562709 ]

stack commented on HADOOP-2615:
-------------------------------

I made issue HADOOP-2712 to cover not-splitting under load.

Billy, in the Bigtable paper, I believe what we call a flush is a minor compaction in gwhogle-speak,
and a merging compaction is what they call compacting a few store files while interleaving what's
in memcache.

.bq When doing a minor compaction on a few files, I think we should compact the newest mapfiles
first and leave the larger/older ones for when we have low updates to a region.

Why do you think newer rather than older, Billy?

.bq I still think if we are going to have transaction speeds close to Bigtable's, we will need
to add a limit on the number of map files to compact at one time.

I agree, given the compaction times posted above.

By the way, I tried out my simple upper-bound patch that puts a cap of 2*compactionThreshold
on the number of files to compact at once.  It seems to work, with messages like the one below
showing up from time to time:

{code}
2008-01-25 20:44:38,330 DEBUG org.apache.hadoop.hbase.HStore: Count of files to compact in 2052803679/info is 8 which is > twice compaction threshold of 3. Compacting 6 only
{code}
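
As a rough illustration of the cap described above, here is a minimal sketch. It assumes the rule is simply "compact at most 2 * compactionThreshold files in one pass"; the class and method names (`CompactionCap`, `filesToCompact`) are hypothetical and are not taken from twice.patch or from HStore.

```java
final class CompactionCap {
    /**
     * Returns how many store files to compact in one pass.
     * Hypothetical helper: the count is capped at twice the compaction threshold.
     */
    static int filesToCompact(int fileCount, int compactionThreshold) {
        int cap = 2 * compactionThreshold;
        return Math.min(fileCount, cap);
    }

    public static void main(String[] args) {
        // Values from the log line above: 8 files, threshold of 3.
        System.out.println(filesToCompact(8, 3)); // prints 6
    }
}
```

The sketch only computes the count; which of the files get picked is a separate decision that it does not model.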

FYI, the regionserver runs compactions.  The master has no say at the moment.





> Add max number of mapfiles to compact at one time giveing us a minor & major compaction
> ---------------------------------------------------------------------------------------
>
>                 Key: HADOOP-2615
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2615
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>            Priority: Minor
>             Fix For: 0.17.0
>
>         Attachments: flag-v2.patch, flag.patch, twice.patch
>
>
> Currently we do compaction on a region when the hbase.hstore.compactionThreshold is reached
> (default 3).
> I think we should configure a max number of mapfiles to compact at one time, similar
> to doing a minor compaction in Bigtable. This keeps compactions from getting tied up in one
> region too long, which lets other regions get way too many memcache flushes, making compaction
> take longer and longer for each region.
> If we did that, when a region's updates start to slack off the max number will eventually
> include all mapfiles, causing a major compaction on that region. Unlike Bigtable, this would
> leave the master out of the process and let the region server handle the major compaction
> when it has time.
> When doing a minor compaction on a few files, I think we should compact the newest mapfiles
> first and leave the larger/older ones for when we have low updates to a region.
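
The proposal in the issue description (limit the number of mapfiles per pass, newest first, and treat a pass that covers every file as a major compaction) could be sketched roughly as below. Everything here is an assumption made for illustration: `NewestFirstSelection`, `MapFileInfo`, and the use of a sequence id to order files by age are hypothetical and do not come from the attached patches.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class NewestFirstSelection {
    /** Hypothetical stand-in for a mapfile: a name plus a sequence id (higher means newer). */
    static final class MapFileInfo {
        final String name;
        final long sequenceId;

        MapFileInfo(String name, long sequenceId) {
            this.name = name;
            this.sequenceId = sequenceId;
        }
    }

    /** Picks at most maxFiles of the newest files, newest first. */
    static List<MapFileInfo> select(List<MapFileInfo> files, int maxFiles) {
        List<MapFileInfo> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong((MapFileInfo f) -> f.sequenceId).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(maxFiles, sorted.size())));
    }

    /** A pass that includes every file is, in effect, a major compaction. */
    static boolean isMajor(List<MapFileInfo> files, int maxFiles) {
        return files.size() <= maxFiles;
    }
}
```

With this shape, a busy region keeps doing small passes over its newest files, and once flushes slow down enough that the file count drops to the limit, the next pass covers all files without involving the master.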

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

