hbase-issues mailing list archives

From "Sergey Shelukhin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-7967) implement compactor for stripe compactions
Date Fri, 29 Mar 2013 22:09:17 GMT

    [ https://issues.apache.org/jira/browse/HBASE-7967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617776#comment-13617776 ]

Sergey Shelukhin commented on HBASE-7967:
-----------------------------------------


bq. 1. Inside one stripe, can we reuse some logic in the default compaction policy? The logic
should be similar, right?  There are many new configuration parameters, can we re-use some
from the default policy, such as max files, min files, etc?  Especially, they can be tuned
per table/column family.
If you look at the patch in HBASE-7680, we do that. I wanted to keep parameters separate as
they might be different, but yeah it probably makes sense to reuse them. 
HBASE-7571 allows per-table/per-cf setting, example (from code; shell also supports this):
{code}
// Use the stripe store engine for this table.
htd.setConfiguration(StoreEngine.STORE_ENGINE_CLASS_KEY, StripeStoreEngine.class.getName());
htd.setConfiguration(StripeStoreConfig.CountBased.FIXED_COUNT_KEY, stripeCount.toString());
htd.setConfiguration(HStore.BLOCKING_STOREFILES_KEY, Long.toString(7 * stripeCount));
// Optional settings, applied only when a value was supplied.
if (l0FileCount != null) {
  htd.setConfiguration(StripeStoreConfig.MIN_FILES_L0_KEY, l0FileCount.toString());
}
if (assumeOrdering != null) {
  htd.setConfiguration(StripeStoreConfig.ASSUME_ORDERING_KEY, assumeOrdering.toString());
}
{code}


bq. 2. There is a configuration assumeOrdering.  When should it be used?
This is related to dropping deletes. There's a recently discussed window in HBase where you
can make an out-of-order Put before or during a major compaction; it will not be visible
before the major compaction, but becomes visible after the compaction finishes and drops the
delete markers.
This setting extends that window to up to N memstore flushes instead of 1, where N is the
number of L0 files (each one a memstore flush), by not considering out-of-order puts for L0
files in most compactions.
As a benefit, you don't need to make bigger compactions just to drop deletes. So if you don't
use out-of-order puts, or are OK with the existing window, you should use it.
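
To make the window concrete, here is a toy model of it. This is only a sketch, not HBase code: the class `OutOfOrderPutWindow`, its `Cell` type, and the helpers `read` and `majorCompact` are all hypothetical, and it assumes a single row/column where a delete marker masks every put at or below its timestamp.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical, not HBase code) of the out-of-order put window.
public class OutOfOrderPutWindow {
    // One cell for a single row/column: a put with a value, or a delete marker.
    static final class Cell {
        final long ts;
        final boolean isDelete;
        final String value;

        Cell(long ts, boolean isDelete, String value) {
            this.ts = ts;
            this.isDelete = isDelete;
            this.value = value;
        }
    }

    // Timestamp of the newest delete marker in the given files, or Long.MIN_VALUE if none.
    static long newestDelete(List<List<Cell>> files) {
        long newest = Long.MIN_VALUE;
        for (List<Cell> file : files) {
            for (Cell c : file) {
                if (c.isDelete) {
                    newest = Math.max(newest, c.ts);
                }
            }
        }
        return newest;
    }

    // Newest put that is not masked by a delete marker, or null if nothing is visible.
    static String read(List<List<Cell>> files) {
        long deleteTs = newestDelete(files);
        Cell best = null;
        for (List<Cell> file : files) {
            for (Cell c : file) {
                if (!c.isDelete && c.ts > deleteTs && (best == null || c.ts > best.ts)) {
                    best = c;
                }
            }
        }
        return best == null ? null : best.value;
    }

    // Compacts only the selected files, dropping delete markers and the puts they mask.
    static List<Cell> majorCompact(List<List<Cell>> selected) {
        long deleteTs = newestDelete(selected);
        List<Cell> out = new ArrayList<>();
        for (List<Cell> file : selected) {
            for (Cell c : file) {
                if (!c.isDelete && c.ts > deleteTs) {
                    out.add(c);
                }
            }
        }
        return out;
    }

    // Returns the value read before and after the compaction.
    static String[] demo() {
        // File A: a put at ts=10 followed by a delete at ts=20.
        List<Cell> fileA = Arrays.asList(new Cell(10, false, "v1"), new Cell(20, true, null));
        // File B: an out-of-order put at ts=15, flushed after the compaction picked file A.
        List<Cell> fileB = Arrays.asList(new Cell(15, false, "late"));

        String before = read(Arrays.asList(fileA, fileB));
        List<Cell> compactedA = majorCompact(Arrays.asList(fileA));
        String after = read(Arrays.asList(compactedA, fileB));
        return new String[] { before, after };
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println("before compaction: " + r[0]);
        System.out.println("after compaction: " + r[1]);
    }
}
```

In this model the late put is hidden while the delete marker exists and shows up once the marker is dropped. With the ordering assumption, the same effect could occur for puts in any of the N L0 files rather than only the latest flush.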

bq. 3. Will we support any stripe type other than count based/size based?  If so, probably
we need to change how stripe type is configured, since it seems that we can support only two
types now .
Maybe. The hybrid "size+count" based scheme that was mentioned would probably be just an
improvement on count, if implemented.
Do you think it's worth changing now?

bq. 4. For count based, do we have to always have that many stripes?  Is it ok to have a size
limit or something so that we don't have many small stripes?
That is possible as a future improvement; I will add it to the doc.

bq. 5. Based on the performance test you did, the write performance is not better. You mentioned
it could be because of write amplification. Do we have some number to prove it?  If we have
more IO, should the read performance be affected too?
Well, I have numbers for write amplification - in the count scheme, there's at least x2 write
amplification :) I measured ~2.5 in my first test with bad settings (not the one in the doc
:)). After the current test finishes I will post the results.
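
For reference, a rough sketch of the arithmetic behind those figures. It assumes (this is an assumption, not something measured here) that write amplification is counted as total bytes written to disk divided by bytes flushed from the memstore, and that the x2 floor comes from each flushed byte being written once to L0 and rewritten at least once when L0 is compacted into stripes. The class and method names are hypothetical, and the byte counts are made-up inputs chosen only to reproduce the two ratios.

```java
// Hypothetical helper, not part of HBase: computes a write amplification ratio.
public class WriteAmplification {
    // Total bytes written (flushes plus compaction rewrites) per byte flushed.
    static double amplification(long flushedBytes, long compactedBytes) {
        if (flushedBytes <= 0) {
            throw new IllegalArgumentException("flushedBytes must be positive");
        }
        return (double) (flushedBytes + compactedBytes) / flushedBytes;
    }

    public static void main(String[] args) {
        long flushed = 1000;
        // Every flushed byte rewritten exactly once: the x2 floor.
        System.out.println(amplification(flushed, 1000)); // prints 2.0
        // Half of the data rewritten a second time: a ratio of 2.5.
        System.out.println(amplification(flushed, 1500)); // prints 2.5
    }
}
```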

bq. 6. Can we have some doc to walk through the algorithm you implemented for the count/size
based compaction policy? I was wondering how some L0 files end up in a specific stripe, how
each stripe is created and maintained. Some flow-chart may be very helpful.
The doc attached to this JIRA describes all that. It doesn't have pictures though :( Do you
mean something on top of that doc?
                
> implement compactor for stripe compactions
> ------------------------------------------
>
>                 Key: HBASE-7967
>                 URL: https://issues.apache.org/jira/browse/HBASE-7967
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Compaction
>            Reporter: Sergey Shelukhin
>            Assignee: Sergey Shelukhin
>         Attachments: HBASE-7967-v0.patch, HBASE-7967-v0-with-stuff.patch, HBASE-7967-v1.patch,
HBASE-7967-v1-with-7679-7680.patch, HBASE-7967-v2.patch, HBASE-7967-v2-with-7679-7680.patch
>
>
> Compactor needs to be implemented. See details in parent and blocking jira.

