Date: Tue, 28 Feb 2017 15:12:45 +0000 (UTC)
From: "Eshcar Hillel (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

    [ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888180#comment-15888180 ]

Eshcar Hillel commented on HBASE-16417:
---------------------------------------

To measure write amplification in our benchmark I am trying to capture the total size of the data written to the WAL during the experiment. I do so by grep-ing log lines that contain both "filesize" and "wal" and summing the values that follow "filesize=". I need help interpreting the numbers I get. I run in both synchronous and asynchronous WAL modes; recall that the write-only experiments write 100GB.
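A minimal sketch of the summation, in case it helps reproduce the numbers. It assumes the relevant lines look roughly like "... wal ... filesize=121.20 MB ..."; the exact log wording and unit suffixes are assumptions here, so the regex and the unit table may need adjusting to the actual region server output:

{code}
#!/usr/bin/env python
# Sum the sizes reported on WAL-related log lines.
# Assumes lines contain "wal" and "filesize=<number> <unit>", e.g.
# "Rolled WAL ... with entries=..., filesize=121.20 MB" -- the exact
# wording and units are assumptions; adjust SIZE_RE/UNITS as needed.
import re
import sys

UNITS = {"B": 1, "K": 1 << 10, "KB": 1 << 10,
         "M": 1 << 20, "MB": 1 << 20,
         "G": 1 << 30, "GB": 1 << 30}

SIZE_RE = re.compile(r"filesize=([\d.]+)\s*([KMG]?B?)", re.IGNORECASE)

def total_wal_bytes(lines):
    total, count = 0.0, 0
    for line in lines:
        low = line.lower()
        # Only count lines that mention both the WAL and a filesize.
        if "wal" not in low or "filesize=" not in low:
            continue
        m = SIZE_RE.search(line)
        if not m:
            continue
        value = float(m.group(1))
        unit = m.group(2).upper() or "B"   # no suffix -> plain bytes
        total += value * UNITS.get(unit, 1)
        count += 1
    return total, count

if __name__ == "__main__":
    total, count = total_wal_bytes(sys.stdin)
    print("%d wal lines, %.1f GB total" % (count, total / (1 << 30)))
{code}

Running something like "cat hbase-*-regionserver-*.log | python sum_wal_sizes.py" (the script name is just for illustration) is the kind of aggregation behind the totals below.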
(1) In sync mode I get roughly 200GB (!) written to the WAL, under all in-memory compaction policies -- in all cases 1673 files of 121MB each (1673 x 121MB ~ 202GB, about twice the 100GB the client writes). Is this reasonable? Could it be due to double logging of the same information? Should I expect only 100GB in the WAL? Could it be due to alignment (my values are small -- 100B)? Do you know of any duplication in WAL processing? Note that I count only the sizes written to HDFS and do not take into account the 3-way replication done at the datanode level.

(2) In async mode I get different numbers: NONE/BASIC - 189GB, EAGER - 124GB. Here the sizes of the files vary; NONE/BASIC write roughly 850 files, EAGER roughly 480. Can you explain the difference in the data written to the WAL between sync mode and async mode with no compaction? Could it be due to compression when writing batches of WAL entries? Can the reduced number of files written in EAGER mode be explained by WAL truncation done after in-memory compaction?

I realize these are a lot of questions, any input can help here. Thanks!!

> In-Memory MemStore Policy for Flattening and Compactions
> --------------------------------------------------------
>
>                 Key: HBASE-16417
>                 URL: https://issues.apache.org/jira/browse/HBASE-16417
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Anastasia Braginsky
>            Assignee: Eshcar Hillel
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16417-benchmarkresults-20161101.pdf, HBASE-16417-benchmarkresults-20161110.pdf, HBASE-16417-benchmarkresults-20161123.pdf, HBASE-16417-benchmarkresults-20161205.pdf
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)