hbase-issues mailing list archives

From "Eshcar Hillel (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions
Date Tue, 28 Feb 2017 15:12:45 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888180#comment-15888180

Eshcar Hillel commented on HBASE-16417:

To measure write amplification in our benchmark, I am trying to capture the total size of the
data written to the WAL during the experiment.
I do so by grepping for log lines that contain both "filesize" and "wal" and summing the values
that follow "filesize=".
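
A minimal sketch of the counting procedure described above. It rests on assumptions that are not in the comment: that the value after "filesize=" is a number optionally followed by a unit such as KB/MB/GB, that units are binary multiples, and that "wal" should be matched case-insensitively. The function name and the regular expression are hypothetical.

```python
import re

# Assumed shape of the value: "filesize=<number>[ <unit>]".
# The accepted unit spellings are a guess, not taken from the HBase source.
FILESIZE = re.compile(r"filesize=([\d.]+)\s*([KMGT]?B)?", re.IGNORECASE)
UNIT = {None: 1, "B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def total_wal_bytes(lines):
    """Return (matching line count, summed bytes) over lines that mention
    both "filesize" and "wal"."""
    count = 0
    total = 0.0
    for line in lines:
        lower = line.lower()
        if "filesize" not in lower or "wal" not in lower:
            continue
        match = FILESIZE.search(line)
        if match is None:
            continue
        value, unit = match.groups()
        total += float(value) * UNIT[unit.upper() if unit else None]
        count += 1
    return count, total


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as log:  # path to a region server log
        n, size = total_wal_bytes(log)
    print(f"{n} lines, {size / 1024**3:.1f} GiB")
```

Keeping the line count next to the total makes it easy to check whether the same file is being reported more than once.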

I need help in explaining the numbers I get.

I run in both synchronous and asynchronous WAL modes; recall that I write 100GB in the
write-only experiments.
(1) In sync mode I get roughly 200GB (!) written to the WAL under all in-memory compaction
policies. In all cases I count 1673 files of 121MB each.
Is this reasonable?
Could it be due to double logging of the same information?
Should I expect only 100GB in the WAL?
Could it be due to alignment (my values are small -- 100B)?
Do you know of any duplication in WAL processing?
Obviously I count only the sizes written to HDFS and do not consider the 3-way replication
done at the data node level.
(2) In async mode I get different numbers: NONE/BASIC - 189GB, EAGER - 124GB.
Here the sizes of the files vary; NONE/BASIC write roughly 850 files, EAGER roughly 480.
Can you explain the difference in the data written to the WAL in sync mode vs. async mode
with no compaction?
Could it be due to compression when writing batches of WAL entries?
Can the reduced number of files written in EAGER mode be explained by WAL truncation
done after in-memory compaction?
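
For reference, the reported totals can be turned into WAL write-amplification ratios with a few lines of arithmetic. This is only a sketch: whether "MB" and "GB" in the logs are decimal or binary units is not stated, so the sync-mode total is computed both ways, and the helper name is hypothetical.

```python
LOGICAL_GB = 100  # data written by the client in the write-only experiments


def wal_amplification(wal_gb, logical_gb=LOGICAL_GB):
    """Ratio of data written to the WAL to data written by the client."""
    return wal_gb / logical_gb


# Sync mode: 1673 files of 121MB each, under both readings of "MB".
sync_mb = 1673 * 121
sync_gb_decimal = sync_mb / 1000  # if 1GB = 1000MB
sync_gb_binary = sync_mb / 1024   # if 1GB = 1024MB

reported = {
    "sync (all policies)": sync_gb_decimal,
    "async NONE/BASIC": 189,
    "async EAGER": 124,
}
for label, gb in reported.items():
    print(f"{label}: {gb:.1f} GB in WAL, amplification {wal_amplification(gb):.2f}")
```

Under either unit convention the sync-mode total stays close to twice the 100GB written by the client.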

I realize these are a lot of questions; any input would help here.

> In-Memory MemStore Policy for Flattening and Compactions
> --------------------------------------------------------
>                 Key: HBASE-16417
>                 URL: https://issues.apache.org/jira/browse/HBASE-16417
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Anastasia Braginsky
>            Assignee: Eshcar Hillel
>             Fix For: 2.0.0
>         Attachments: HBASE-16417-benchmarkresults-20161101.pdf, HBASE-16417-benchmarkresults-20161110.pdf,
HBASE-16417-benchmarkresults-20161123.pdf, HBASE-16417-benchmarkresults-20161205.pdf
