Date: Tue, 28 Feb 2017 15:12:45 +0000 (UTC)
From: "Eshcar Hillel (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-16417) In-Memory MemStore Policy for Flattening and Compactions

    [ https://issues.apache.org/jira/browse/HBASE-16417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15888180#comment-15888180 ]

Eshcar Hillel commented on HBASE-16417:
---------------------------------------

To measure write amplification in our benchmark I am trying to capture the total size of the data written to the WAL during the experiment. I do so by grep-ing log lines that contain both "filesize" and "wal" and summing the values that follow "filesize=". I need help interpreting the numbers I get. I run in both synchronous and asynchronous WAL modes; recall that the write-only experiments write 100GB.
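A minimal sketch of the summation, in case it helps reproduce the numbers. It assumes the relevant lines look roughly like "... wal ... filesize=121.20 MB ..."; the exact log wording and unit suffixes are assumptions here, so the regex and the unit table may need adjusting to the actual region server output:

{code}
#!/usr/bin/env python
# Sum the sizes reported on WAL-related log lines.
# Assumes lines contain "wal" and "filesize=<number> <unit>", e.g.
# "Rolled WAL ... with entries=..., filesize=121.20 MB" -- the exact
# wording and units are assumptions; adjust SIZE_RE/UNITS as needed.
import re
import sys

UNITS = {"B": 1, "K": 1 << 10, "KB": 1 << 10,
         "M": 1 << 20, "MB": 1 << 20,
         "G": 1 << 30, "GB": 1 << 30}

SIZE_RE = re.compile(r"filesize=([\d.]+)\s*([KMG]?B?)", re.IGNORECASE)

def total_wal_bytes(lines):
    total, count = 0.0, 0
    for line in lines:
        low = line.lower()
        # Only count lines that mention both the WAL and a filesize.
        if "wal" not in low or "filesize=" not in low:
            continue
        m = SIZE_RE.search(line)
        if not m:
            continue
        value = float(m.group(1))
        unit = m.group(2).upper() or "B"   # no suffix -> plain bytes
        total += value * UNITS.get(unit, 1)
        count += 1
    return total, count

if __name__ == "__main__":
    total, count = total_wal_bytes(sys.stdin)
    print("%d wal lines, %.1f GB total" % (count, total / (1 << 30)))
{code}

Running something like "cat hbase-*-regionserver-*.log | python sum_wal_sizes.py" (the script name is just for illustration) is the kind of aggregation behind the totals below.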
(1) In sync mode I get roughly 200GB (!) written to the WAL, under all in-memory compaction policies -- in all cases 1673 files of 121MB each (1673 x 121MB ~ 202GB, about twice the 100GB the client writes). Is this reasonable? Could it be due to double logging of the same information? Should I expect only 100GB in the WAL? Could it be due to alignment (my values are small -- 100B)? Do you know of any duplication in WAL processing? Note that I count only the sizes written to HDFS and do not take into account the 3-way replication done at the datanode level.

(2) In async mode I get different numbers: NONE/BASIC - 189GB, EAGER - 124GB. Here the sizes of the files vary; NONE/BASIC write roughly 850 files, EAGER roughly 480. Can you explain the difference in the data written to the WAL between sync mode and async mode with no compaction? Could it be due to compression when writing batches of WAL entries? Can the reduced number of files written in EAGER mode be explained by WAL truncation done after in-memory compaction?

I realize these are a lot of questions, any input can help here. Thanks!!

> In-Memory MemStore Policy for Flattening and Compactions
> --------------------------------------------------------
>
>                 Key: HBASE-16417
>                 URL: https://issues.apache.org/jira/browse/HBASE-16417
>             Project: HBase
>          Issue Type: Sub-task
>            Reporter: Anastasia Braginsky
>            Assignee: Eshcar Hillel
>             Fix For: 2.0.0
>
>         Attachments: HBASE-16417-benchmarkresults-20161101.pdf, HBASE-16417-benchmarkresults-20161110.pdf, HBASE-16417-benchmarkresults-20161123.pdf, HBASE-16417-benchmarkresults-20161205.pdf
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)