hbase-issues mailing list archives

From "Eshcar Hillel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
Date Sat, 07 Oct 2017 14:58:02 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16195745#comment-16195745 ]

Eshcar Hillel commented on HBASE-16417:

Thanks all for your questions.

bq. Can this go into branch-2?
Sure why not :)

bq. How long did the tests run for in each of the five cases?
Write-only runs started from an empty table and performed 500M puts. This took over an hour
on SSD and less than 2 hours on HDD.
Read-write runs first loaded 10GB of data and then ran 500K reads with heavy writes running in
the background. These runs took 2-4 hours each.

bq. What would you recommend as default? Should we enable adaptive by default?
This is a good question.
We performed rigorous benchmarks; however, these are still only micro-benchmarks, that is, they
rely on synthetic workloads.
I think it is best to have Basic as the default for 2.0, since its behavior is more predictable
and it requires no configuration.
Once we have user feedback, we can suggest that users also try adaptive and see
where it can further improve their performance.
They can certainly configure it for specific column families that can benefit from data reduction.

bq. The effect of HDD/SSDs does it come from the fact as how fast these segments in the pipeline
are released after flushes?
In the write-only workload we see that the improvement in throughput is highly correlated with
the reduction in total GC time. With fast SSD hardware this has a greater effect on throughput, as
memory management is more of a bottleneck.

bq. here we capture the throughput of writes and flushes are not in the hot path so does it
mean that we get blocking updates and the throughput depends on how fast the blocking updates
are cleared and that depends on the segment count?
You can see in the parameter tuning report that throughput increases as the number of segments
in the pipeline increases (up to some point), so I don't think we get more blocking updates
with more segments in the pipeline.
Also note that the number of segments in the snapshot depends on the timing of the flush;
it can be less than the limit.

bq. So these tests were done with changing back to the old way of per region flush decision
based on heap size NOT on data size?
I did not have time to apply these changes yet. I plan to do this next.
However, global pressure triggered many flushes, and there, as you know, the check is on heap
size and not data size.

bq. The more the data size, the lesser will be the gain. To have a fair eval what should be
the val size to used?
I agree. With larger values the gain will be smaller, but I believe we'll still see a gain.
A flat index not only takes less space but is also friendlier to memory management, which
is an advantage. Moreover, with adaptive we'll still see a reduction in space, flushes, disk
compactions, etc.
Also, recent work claims that small values are typical in production workloads such as those at
Facebook and Twitter (see "LSM-trie: An lsm-tree-based ultra-large key-value store for small data items").
We ran experiments with large values in the past.
We can repeat some of the experiments with 500B values, which are also reported in this work.
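
To illustrate why the relative gain shrinks as values grow, here is a minimal sketch. It assumes a
fixed per-cell index overhead that drops when the index is flattened. The class name, the method
name, and both overhead figures (100 and 20 bytes) are hypothetical, chosen only for illustration;
they are not measurements from the benchmarks.

```java
public class FlatIndexGain {

    /**
     * Fraction of per-cell memory saved when the index overhead drops from
     * skipListOverhead to flatOverhead bytes, for a cell holding dataBytes
     * of key and value data. All inputs are illustrative assumptions.
     */
    static double relativeGain(long dataBytes, long skipListOverhead, long flatOverhead) {
        long before = dataBytes + skipListOverhead;
        long after = dataBytes + flatOverhead;
        return (double) (before - after) / before;
    }

    public static void main(String[] args) {
        long skipListOverhead = 100; // hypothetical bytes per cell before flattening
        long flatOverhead = 20;      // hypothetical bytes per cell after flattening
        for (long dataBytes : new long[] {100, 500, 1000, 10000}) {
            double gain = relativeGain(dataBytes, skipListOverhead, flatOverhead);
            System.out.printf("data=%dB relative gain=%.1f%%%n", dataBytes, 100 * gain);
        }
    }
}
```

The absolute saving per cell is constant under these assumptions, so the saving as a fraction of
total cell size falls as the data size grows, but it never reaches zero.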

I need to rebase, and I will address the comments above as well as any other comments you put on RB.
In any case, I am happy to answer any further questions or concerns you may have.

> In-Memory MemStore Policy for Flattening and Compactions
> --------------------------------------------------------
>                 Key: HBASE-16417
>                 URL: https://issues.apache.org/jira/browse/HBASE-16417
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Anastasia Braginsky
>            Assignee: Eshcar Hillel
>             Fix For: 3.0.0
>         Attachments: HBASE-16417.01.patch, HBASE-16417 - Adaptive Compaction Policy -
20171001.pdf, HBASE-16417-benchmarkresults-20161101.pdf, HBASE-16417-benchmarkresults-20161110.pdf,
HBASE-16417-benchmarkresults-20161123.pdf, HBASE-16417-benchmarkresults-20161205.pdf, HBASE-16417-benchmarkresults-20170309.pdf,
HBASE-16417-benchmarkresults-20170317.pdf, HBASE-16417 - parameter tuning - 20171001.pdf,

This message was sent by Atlassian JIRA
