Date: Mon, 6 Mar 2017 11:56:32 +0000 (UTC)
From: "CHIA-PING TSAI (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-17623) Reuse the bytes array when building the hfile block

    [ https://issues.apache.org/jira/browse/HBASE-17623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15897162#comment-15897162 ]

CHIA-PING TSAI commented on HBASE-17623:
----------------------------------------

bq. This case is not really possible, right? The compressAndEncrypt data will be null if there is no encryption/compression.

bq. Why is this specific change needed now? Why can we not continue to have the type as byte[]? So maybe then we will have to keep the size also? The offset will be 0 anyway.

Yes, I keep and reuse the onDiskBlockBytesWithHeader stream.
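To make the reuse concrete: the point of keeping a stream (rather than a fresh byte[] per block) is that the backing array survives across blocks and is only reset, never reallocated in steady state. Below is a minimal sketch of that pattern; ReusableByteStream and its method names are hypothetical illustrations, not the actual HBase classes.

{code:title=ReusableByteStream.java (hypothetical sketch)|borderStyle=solid}
import java.util.Arrays;

/**
 * Sketch of a growable buffer that is reset, not reallocated, between blocks.
 */
final class ReusableByteStream {
  private byte[] buf = new byte[4096]; // grows as needed, then stays allocated
  private int len = 0;

  /** Rewind the logical length; the backing array is kept for the next block. */
  void reset() { len = 0; }

  void write(byte[] src, int off, int n) {
    if (len + n > buf.length) {
      buf = Arrays.copyOf(buf, Math.max(buf.length * 2, len + n));
    }
    System.arraycopy(src, off, buf, len, n);
    len += n;
  }

  /** Zero-copy view for the disk write path: use (getBuffer(), 0, size()). */
  byte[] getBuffer() { return buf; }
  int size() { return len; }

  /** Copy only when the bytes must outlive the stream, e.g. cache-on-write. */
  byte[] toNewByteArray() { return Arrays.copyOf(buf, len); }
}
{code}

In steady state the writer calls reset() and refills the same array for every block, so no per-block allocation happens unless a block is larger than any earlier one.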
bq. The PE run shows that the net run time decreases with this change, right? But your table above shows more GC for the corresponding case?!

The dominant influence on run time is I/O, so the connection between GC and run time is not clear-cut when there is a lot of other I/O work (e.g., compaction, WAL). Thank you for the feedback. [~anoop.hbase]

> Reuse the bytes array when building the hfile block
> ---------------------------------------------------
>
>                 Key: HBASE-17623
>                 URL: https://issues.apache.org/jira/browse/HBASE-17623
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: CHIA-PING TSAI
>            Assignee: CHIA-PING TSAI
>            Priority: Minor
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: after(snappy_hfilesize=5.04GB).png, after(snappy_hfilesize=755MB).png, before(snappy_hfilesize=5.04GB).png, before(snappy_hfilesize=755MB).png, HBASE-17623.branch-1.v0.patch, HBASE-17623.branch-1.v1.patch, HBASE-17623.branch-1.v2.patch, HBASE-17623.branch-1.v2.patch, HBASE-17623.v0.patch, HBASE-17623.v1.patch, HBASE-17623.v1.patch, HBASE-17623.v2.patch, memory allocation measurement.xlsx
>
>
> There are three improvements:
> # The onDiskBlockBytesWithHeader should maintain a bytes array which can be reused when building the hfile.
> # The onDiskBlockBytesWithHeader is copied to a new bytes array only when we need to cache the block.
> # If no block needs to be cached, the uncompressedBlockBytesWithHeader will never be created (see the copy-on-cache sketch right after this list).
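To make improvements #2 and #3 concrete, here is a minimal copy-on-cache sketch, under the same caveat as above: BlockBytes and its method names are hypothetical, not HBase's API. The disk write path reads the shared buffer directly; only the cache path, which must retain the bytes, pays for a copy.

{code:title=BlockBytes.java (hypothetical sketch)|borderStyle=solid}
import java.util.Arrays;

/**
 * Sketch of "copy only when caching": the shared, reusable buffer is handed
 * out for the disk write, and a private copy is made only for the cache.
 */
final class BlockBytes {
  private final byte[] sharedBuf; // reused across blocks by the writer
  private final int len;

  BlockBytes(byte[] sharedBuf, int len) {
    this.sharedBuf = sharedBuf;
    this.len = len;
  }

  /** Disk write path: no copy; the caller must not retain the array. */
  byte[] bufferForWrite() { return sharedBuf; }

  int length() { return len; }

  /** Cache-on-write path: the cache retains bytes, so copy here and only here. */
  byte[] copyForCache() { return Arrays.copyOf(sharedBuf, len); }
}
{code}

If cache-on-write is disabled, copyForCache() is never called and no per-block array is allocated. For comparison, the pre-patch finishBlock() quoted in the issue follows.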
> {code:title=HFileBlock.java|borderStyle=solid}
> private void finishBlock() throws IOException {
>   if (blockType == BlockType.DATA) {
>     this.dataBlockEncoder.endBlockEncoding(dataBlockEncodingCtx, userDataStream,
>         baosInMemory.getBuffer(), blockType);
>     blockType = dataBlockEncodingCtx.getBlockType();
>   }
>   userDataStream.flush();
>   // This does an array copy, so it is safe to cache this byte array when cache-on-write.
>   // Header is still the empty, 'dummy' header that is yet to be filled out.
>   uncompressedBlockBytesWithHeader = baosInMemory.toByteArray();
>   prevOffset = prevOffsetByType[blockType.getId()];
>   // We need to set state before we can package the block up for cache-on-write. In a way, the
>   // block is ready, but not yet encoded or compressed.
>   state = State.BLOCK_READY;
>   if (blockType == BlockType.DATA || blockType == BlockType.ENCODED_DATA) {
>     onDiskBlockBytesWithHeader = dataBlockEncodingCtx.
>         compressAndEncrypt(uncompressedBlockBytesWithHeader);
>   } else {
>     onDiskBlockBytesWithHeader = defaultBlockEncodingCtx.
>         compressAndEncrypt(uncompressedBlockBytesWithHeader);
>   }
>   // Calculate how many bytes we need for checksum on the tail of the block.
>   int numBytes = (int) ChecksumUtil.numBytes(
>       onDiskBlockBytesWithHeader.length,
>       fileContext.getBytesPerChecksum());
>   // Put the header for the on-disk bytes; header currently is unfilled-out.
>   putHeader(onDiskBlockBytesWithHeader, 0,
>       onDiskBlockBytesWithHeader.length + numBytes,
>       uncompressedBlockBytesWithHeader.length, onDiskBlockBytesWithHeader.length);
>   // Set the header for the uncompressed bytes (for cache-on-write) -- IFF different from
>   // the onDiskBlockBytesWithHeader array.
>   if (onDiskBlockBytesWithHeader != uncompressedBlockBytesWithHeader) {
>     putHeader(uncompressedBlockBytesWithHeader, 0,
>         onDiskBlockBytesWithHeader.length + numBytes,
>         uncompressedBlockBytesWithHeader.length, onDiskBlockBytesWithHeader.length);
>   }
>   if (onDiskChecksum.length != numBytes) {
>     onDiskChecksum = new byte[numBytes];
>   }
>   ChecksumUtil.generateChecksums(
>       onDiskBlockBytesWithHeader, 0, onDiskBlockBytesWithHeader.length,
>       onDiskChecksum, 0, fileContext.getChecksumType(), fileContext.getBytesPerChecksum());
> }
> {code}
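For context on the numBytes computation in finishBlock(): HBase appends one checksum per bytesPerChecksum-sized chunk of the on-disk block. The arithmetic below is a back-of-the-envelope model only; the 4-byte-per-checksum size and the 16 KB chunk size in the example are assumptions here, and ChecksumUtil.numBytes() remains the authoritative logic.

{code:title=ChecksumSizing.java (illustrative model)|borderStyle=solid}
/** Rough model of checksum sizing, assuming 4 bytes per CRC32 checksum. */
final class ChecksumSizing {
  static long numBytes(long onDiskDataSize, int bytesPerChecksum) {
    long chunks = (onDiskDataSize + bytesPerChecksum - 1) / bytesPerChecksum; // ceiling division
    return chunks * 4; // assumed 4 checksum bytes per chunk
  }

  public static void main(String[] args) {
    // A 64 KB block with an (assumed) 16 KB chunk size:
    // ceil(65536 / 16384) = 4 chunks -> 16 checksum bytes on the block tail.
    System.out.println(numBytes(64 * 1024, 16 * 1024)); // prints 16
  }
}
{code}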