From: "Georgi Chalakov (JIRA)"
To: common-issues@hadoop.apache.org
Date: Thu, 31 Aug 2017 22:09:01 +0000 (UTC)
Subject: [jira] [Comment Edited] (HADOOP-14520) WASB: Block compaction for Azure Block Blobs

    [ https://issues.apache.org/jira/browse/HADOOP-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16149691#comment-16149691 ]

Georgi Chalakov edited comment on HADOOP-14520 at 8/31/17 10:08 PM:
--------------------------------------------------------------------

Thank you for adding all these fixes. Stream capabilities look like a useful feature.

Re: flush()
FSDataOutputStream doesn't override flush(), so a normal flush() call at the application level would not execute BlockBlobAppendStream::flush().
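For illustration, a minimal sketch of that client-side distinction (the wasb:// URI, container, and file names are placeholders, not from the patch): flush() goes through the ordinary OutputStream path, while hflush()/hsync() are the Syncable calls on FSDataOutputStream that actually reach the append stream.

{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WasbFlushExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Placeholder container/account; any WASB folder configured for
    // block blobs with compaction enabled would behave the same way.
    Path path = new Path(
        "wasb://mycontainer@myaccount.blob.core.windows.net/logs/app.log");
    FileSystem fs = path.getFileSystem(conf);
    try (FSDataOutputStream out = fs.create(path)) {
      out.write("record 1\n".getBytes(StandardCharsets.UTF_8));
      out.flush();   // not forwarded to BlockBlobAppendStream::flush()
      out.hflush();  // uploads buffered data so other clients can read it
      out.write("record 2\n".getBytes(StandardCharsets.UTF_8));
      out.hsync();   // same upload path, used here as a durability barrier
    }
  }
}
{code}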
When compaction is disabled, hflush()/hsync() are no-ops, and for all operations the performance of BlockBlobAppendStream is the same as (or better than) before.

Re: more than one append stream
We take a lease on the blob, which means that at any point in time there can be only one append stream. If we had more than one append stream open at the same time, we could not guarantee the order of write operations. I have added an hsync() call and made isclosed volatile.

Re: close()
I think the first exception is the best indication of what went wrong. After an exception, close() is just best effort. I don't know how useful it would be for a client to continue after an IO-related exception, but if that is necessary, the client can continue. If block compaction is enabled, the client can read back all the data up to the last successful hflush()/hsync(). When block compaction is disabled, we guarantee nothing: we may or may not have the data stored in the service.


> WASB: Block compaction for Azure Block Blobs
> --------------------------------------------
>
>                 Key: HADOOP-14520
>                 URL: https://issues.apache.org/jira/browse/HADOOP-14520
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/azure
>    Affects Versions: 3.0.0-alpha3
>            Reporter: Georgi Chalakov
>            Assignee: Georgi Chalakov
>         Attachments: HADOOP-14520-006.patch, HADOOP-14520-008.patch, HADOOP-14520-009.patch, HADOOP-14520-05.patch, HADOOP_14520_07.patch, HADOOP_14520_08.patch, HADOOP_14520_09.patch, HADOOP_14520_10.patch, HADOOP-14520-patch-07-08.diff, HADOOP-14520-patch-07-09.diff
>
>
> Block Compaction for WASB allows uploading a new block for every hflush()/hsync() call. When the number of blocks exceeds 32000, the next hflush()/hsync() triggers the block compaction process. Block compaction replaces a sequence of blocks with a single block: of all the sequences with a total length under 4 MB, compaction chooses the longest one. It is a greedy algorithm that preserves all potential candidates for the next round. Block compaction increases data durability and allows block blobs to be used instead of page blobs. By default, block compaction is disabled; similar to the configuration for page blobs, the client needs to specify the HDFS folders where block compaction over block blobs is enabled.
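A hypothetical sketch of the sequence-selection step from the summary above (names and structure are mine, not the BlockBlobAppendStream code in the patch): among all runs of consecutive blocks whose combined size stays under the 4 MB limit, pick the run with the greatest total length.

{code:java}
import java.util.List;

/**
 * Illustration only: greedy choice of a compaction candidate as described
 * in the HADOOP-14520 summary, not the actual patch code.
 */
public final class CompactionCandidate {
  // Upper bound on the size of a compacted block, per the summary.
  static final long MAX_COMPACTED_SIZE = 4L * 1024 * 1024;

  /**
   * Returns {start, endExclusive} for the run of consecutive blocks with
   * the greatest total size below MAX_COMPACTED_SIZE, or null when no run
   * of at least two blocks qualifies.
   */
  static int[] longestSequence(List<Long> blockSizes) {
    int bestStart = -1, bestEnd = -1;
    long bestTotal = 0, windowSum = 0;
    int start = 0;
    for (int end = 0; end < blockSizes.size(); end++) {
      windowSum += blockSizes.get(end);
      // Shrink from the left until the window fits under the limit.
      while (windowSum >= MAX_COMPACTED_SIZE && start <= end) {
        windowSum -= blockSizes.get(start++);
      }
      // Replacing fewer than two blocks would not reduce the block count.
      if (end - start + 1 >= 2 && windowSum > bestTotal) {
        bestTotal = windowSum;
        bestStart = start;
        bestEnd = end + 1;
      }
    }
    return bestStart < 0 ? null : new int[] {bestStart, bestEnd};
  }
}
{code}

Compacting only the single best run each round leaves every other block untouched, which is one way to read the summary's note that the greedy algorithm preserves all potential candidates for the next round.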
> Results for HADOOP_14520_07.patch
> Tested endpoint: fs.azure.account.key.hdfs4.blob.core.windows.net
> Tests run: 777, Failures: 0, Errors: 0, Skipped: 155

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)