hbase-issues mailing list archives

From "Eshcar Hillel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
Date Thu, 10 Nov 2016 22:54:58 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15655444#comment-15655444 ]

Eshcar Hillel commented on HBASE-16417:

While running the benchmarks this week I realized I made a mistake when running data compaction
in previous rounds. I turned off the mslab flag but did not remove the chunk pool parameters,
and as a result a chunk pool was allocated but not used. I re-ran these experiments this week
with no mslabs and no chunk pool, and performance indeed improved. For a fair comparison
I also ran the no-compaction option with no mslabs and no chunk pool, which turned out to be the
best-performing setting. (See full details in the latest report.)
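
To illustrate the misconfiguration described above, here is a minimal sketch of a sanity check
that flags chunk pool parameters left in place while MSLAB is turned off. The property names are
the ones I believe HBase uses for these settings and should be verified against the version under
test; the function name and the dict-based configuration are hypothetical and not part of HBase.

```python
# Assumed HBase property names; verify them against the version under test.
MSLAB_ENABLED = "hbase.hregion.memstore.mslab.enabled"
CHUNK_POOL_KEYS = (
    "hbase.hregion.memstore.chunkpool.maxsize",
    "hbase.hregion.memstore.chunkpool.initialsize",
)


def stale_chunk_pool_settings(conf):
    """Return chunk pool keys still set to a non-zero value while MSLAB is off.

    `conf` is a plain dict of property name -> value (hypothetical stand-in
    for a parsed hbase-site.xml). MSLAB is assumed to default to enabled.
    """
    mslab_on = str(conf.get(MSLAB_ENABLED, "true")).lower() == "true"
    if mslab_on:
        return []
    return [key for key in CHUNK_POOL_KEYS if float(conf.get(key, 0)) > 0]
```

An empty result means the configuration is consistent; any returned key points at a chunk pool
that would be allocated without being used.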

The focus of this week's benchmarks was a mixed workload: 50% reads, 50% writes. Results show that
in a mixed workload, running with no mslabs and no chunk pool has a significant advantage over
running with chunk pool and mslabs. This holds both with no compaction and with data
compaction.

So far the benchmarks do not show an advantage of index-/data-compaction over no-compaction. This
might be due to several reasons:
1. Running index-/data-compaction should reduce the number of disk compactions - the price
tag of running a disk compaction in the current system (single SSD machine) is not as high
as it would be in a production cluster.
2. Index compaction would have a greater effect as the size of the cells decreases - the values
we are using now are medium size (1KB) and not small.
3. Index-/data-compaction should result in more reads being served from memory, thereby reducing
read latency - we might be using too small a data set, which is efficiently served from the block
cache; this is not always the case in production data sets.
4. Index-/data-compaction should result in more reads being served from memory, thereby reducing
read latency - the current implementation of reads *always* seeks the key in all store files
that may contain it even if it resides in memory, effectively masking any memory optimization
including in-memory compaction.

Directions we intend to explore next:
1. Run benchmarks on commodity machines (namely HDD and not SSD); run the cluster on more than
one machine (2 RS, 3-way replication); the scale might be smaller though, since our HDD machines
are modest compared to the SSD machine we have.
2. Run with smaller values - 100B instead of 1KB.
3. Run bigger data sets - 10-20M keys instead of 5M keys.
4. Change the read (get) implementation to first seek the key in the memstore(s) only, and only
if no matching entry is found seek in all memstore segments and all relevant store files.
This could be the subject of another Jira. We believe this would be beneficial even with no
compaction, and even more so when index-/data-compaction is employed. Any thoughts on this direction?
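
The read-path change in item 4 can be sketched with a toy model. This is an illustration under
simplifying assumptions, not HBase code: the Store class, its fields and its methods are all
hypothetical, each memstore segment and store file is a plain dict mapping a key to a
(timestamp, value) pair, and only the newest version of a key is requested. It also assumes that a
hit in the memstore is always newer than anything on disk, which need not hold when clients
supply their own timestamps.

```python
from dataclasses import dataclass


@dataclass
class Store:
    """Toy stand-in for a store: in-memory segments plus on-disk store files."""
    memstore_segments: list   # list of dicts: key -> (timestamp, value)
    store_files: list         # list of dicts: key -> (timestamp, value)
    store_file_seeks: int = 0  # counts how many store files were probed

    def _seek_memstore(self, key):
        return [seg[key] for seg in self.memstore_segments if key in seg]

    def _seek_files(self, key):
        hits = []
        for store_file in self.store_files:
            self.store_file_seeks += 1
            if key in store_file:
                hits.append(store_file[key])
        return hits

    @staticmethod
    def _newest(hits):
        return max(hits, key=lambda hit: hit[0])[1] if hits else None

    def get_current(self, key):
        # Behaviour described in the comment: always probe every store file.
        return self._newest(self._seek_memstore(key) + self._seek_files(key))

    def get_memstore_first(self, key):
        # Proposed behaviour: memstore only; fall back to the full seek on a miss.
        hits = self._seek_memstore(key)
        if not hits:
            hits = self._seek_memstore(key) + self._seek_files(key)
        return self._newest(hits)
```

In this model, a key that resides in memory costs one store file probe per file with
get_current and none with get_memstore_first, while a key found only on disk costs the same in
both.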

Finally, a small note: we found a bug that prevents index-compaction from running without
mslabs. It is about to be fixed in a new patch Anastasia is working on.

> In-Memory MemStore Policy for Flattening and Compactions
> --------------------------------------------------------
>                 Key: HBASE-16417
>                 URL: https://issues.apache.org/jira/browse/HBASE-16417
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Anastasia Braginsky
>            Assignee: Eshcar Hillel
>             Fix For: 2.0.0
>         Attachments: HBASE-16417-benchmarkresults-20161101.pdf, HBASE-16417-benchmarkresults-20161110.pdf
