hbase-issues mailing list archives

From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17623) Reuse the bytes array when building the hfile block
Date Mon, 06 Mar 2017 07:35:32 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896834#comment-15896834 ]

Anoop Sam John commented on HBASE-17623:
----------------------------------------

Understood the patch.  So we avoid the need to copy many times.  'baosInMemory' is still
there and we keep accumulating cells into it.  Once the block is to be written to the HFile, it will
go there with no extra copy, if there is no encryption/compression AND no need for cache
on write.  When encryption or compression is present, we use the same baosInMemory stream
to create the new encrypted/compressed stream.  When cache on write is enabled, we will end up copying
both the onDisk and the uncompressed streams.
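To make that concrete, here is a rough sketch of those copy decisions using plain JDK types (illustrative only; the real code uses HBase's own ByteArrayOutputStream and the block encoding contexts, not these stand-in names):
{code}
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Illustrative sketch, not the actual HFileBlock.Writer code: it only mirrors when the
// accumulated bytes are copied and when the internal buffer is used directly.
final class BlockWriteSketch {
  static void finishAndWrite(ByteArrayOutputStream baosInMemory, OutputStream hfileOut,
      boolean compressOrEncrypt, boolean cacheOnWrite) throws IOException {
    if (!compressOrEncrypt) {
      // No codec: writeTo() hands the stream's internal buffer to the file directly,
      // so the accumulated cells are not copied again.
      baosInMemory.writeTo(hfileOut);
    } else {
      // With compression/encryption, the codec consumes the same accumulated bytes and
      // emits the on-disk form; a plain copy stands in for the codec here.
      byte[] onDiskBytes = baosInMemory.toByteArray();
      hfileOut.write(onDiskBytes);
    }
    if (cacheOnWrite) {
      // Cache-on-write needs stable snapshots, so the uncompressed bytes (and, when a
      // codec ran, the on-disk bytes too) end up being copied for the block cache.
      byte[] uncompressedForCache = baosInMemory.toByteArray();
      // ... hand uncompressedForCache (and any on-disk copy) to the cache here ...
    }
  }
}
{code}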
{code}
if (compressAndEncryptDat == null) {
  compressAndEncryptDat = new Bytes(baosInMemory.getBuffer(), 0, baosInMemory.size());
}
{code}
This case is not really possible right?
bq. private ByteArrayOutputStream onDiskBlockBytesWithHeader;
Why is this specific change needed now?  Why can we not continue to have the type as byte[]?  Is it that we would then have to keep the size as well?  The offset will be 0 anyway. Just asking
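A minimal sketch of that trade-off, using the JDK stream type as a stand-in for HBase's own ByteArrayOutputStream (which exposes its backing array via getBuffer()):
{code}
import java.io.ByteArrayOutputStream;

// Minimal illustration of the question above; names are illustrative, not from the patch.
class BufferVsArraySketch {
  // As a stream, the reusable buffer carries its own length; the offset is always 0.
  ByteArrayOutputStream asStream = new ByteArrayOutputStream();
  int streamLength() { return asStream.size(); }

  // As a raw byte[] reused across blocks, the array outgrows any single block, so a
  // valid-length field would have to be tracked separately alongside it.
  byte[] asArray = new byte[64 * 1024];
  int asArrayValidLength = 0;
}
{code}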

> Reuse the bytes array when building the hfile block
> ---------------------------------------------------
>
>                 Key: HBASE-17623
>                 URL: https://issues.apache.org/jira/browse/HBASE-17623
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: CHIA-PING TSAI
>            Assignee: CHIA-PING TSAI
>            Priority: Minor
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: after(snappy_hfilesize=5.04GB).png, after(snappy_hfilesize=755MB).png,
> before(snappy_hfilesize=5.04GB).png, before(snappy_hfilesize=755MB).png, HBASE-17623.branch-1.v0.patch,
> HBASE-17623.branch-1.v1.patch, HBASE-17623.branch-1.v2.patch, HBASE-17623.branch-1.v2.patch,
> HBASE-17623.v0.patch, HBASE-17623.v1.patch, HBASE-17623.v1.patch, HBASE-17623.v2.patch,
> memory allocation measurement.xlsx
>
>
> There are three improvements.
> # The onDiskBlockBytesWithHeader should maintain a byte array which can be reused when building the hfile.
> # The onDiskBlockBytesWithHeader is copied to a new byte array only when we need to cache the block.
> # If no block needs to be cached, the uncompressedBlockBytesWithHeader will never be created.
> {code:title=HFileBlock.java|borderStyle=solid}
>     private void finishBlock() throws IOException {
>       if (blockType == BlockType.DATA) {
>         this.dataBlockEncoder.endBlockEncoding(dataBlockEncodingCtx, userDataStream,
>             baosInMemory.getBuffer(), blockType);
>         blockType = dataBlockEncodingCtx.getBlockType();
>       }
>       userDataStream.flush();
>       // This does an array copy, so it is safe to cache this byte array when cache-on-write.
>       // Header is still the empty, 'dummy' header that is yet to be filled out.
>       uncompressedBlockBytesWithHeader = baosInMemory.toByteArray();
>       prevOffset = prevOffsetByType[blockType.getId()];
>       // We need to set state before we can package the block up for cache-on-write. In a way, the
>       // block is ready, but not yet encoded or compressed.
>       state = State.BLOCK_READY;
>       if (blockType == BlockType.DATA || blockType == BlockType.ENCODED_DATA) {
>         onDiskBlockBytesWithHeader = dataBlockEncodingCtx.
>             compressAndEncrypt(uncompressedBlockBytesWithHeader);
>       } else {
>         onDiskBlockBytesWithHeader = defaultBlockEncodingCtx.
>             compressAndEncrypt(uncompressedBlockBytesWithHeader);
>       }
>       // Calculate how many bytes we need for checksum on the tail of the block.
>       int numBytes = (int) ChecksumUtil.numBytes(
>           onDiskBlockBytesWithHeader.length,
>           fileContext.getBytesPerChecksum());
>       // Put the header for the on disk bytes; header currently is unfilled-out
>       putHeader(onDiskBlockBytesWithHeader, 0,
>           onDiskBlockBytesWithHeader.length + numBytes,
>           uncompressedBlockBytesWithHeader.length, onDiskBlockBytesWithHeader.length);
>       // Set the header for the uncompressed bytes (for cache-on-write) -- IFF different from
>       // onDiskBlockBytesWithHeader array.
>       if (onDiskBlockBytesWithHeader != uncompressedBlockBytesWithHeader) {
>         putHeader(uncompressedBlockBytesWithHeader, 0,
>           onDiskBlockBytesWithHeader.length + numBytes,
>           uncompressedBlockBytesWithHeader.length, onDiskBlockBytesWithHeader.length);
>       }
>       if (onDiskChecksum.length != numBytes) {
>         onDiskChecksum = new byte[numBytes];
>       }
>       ChecksumUtil.generateChecksums(
>           onDiskBlockBytesWithHeader, 0, onDiskBlockBytesWithHeader.length,
>           onDiskChecksum, 0, fileContext.getChecksumType(), fileContext.getBytesPerChecksum());
>     }{code}
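A minimal sketch of the reuse idea described above, with the JDK ByteArrayOutputStream standing in for HBase's own stream type: reset() keeps the backing array, so building the next block allocates no new byte[], whereas toByteArray() copies on every block.
{code}
import java.io.ByteArrayOutputStream;

// Illustrative only: shows why holding a resettable stream lets one buffer be reused
// across blocks, while toByteArray() would allocate and copy once per block.
final class ReusedBlockBuffer {
  private final ByteArrayOutputStream onDiskBlockBytesWithHeader = new ByteArrayOutputStream();

  void startNewBlock() {
    // The count drops to 0 but the backing array is retained for the next block.
    onDiskBlockBytesWithHeader.reset();
  }

  void append(byte[] data) {
    onDiskBlockBytesWithHeader.write(data, 0, data.length);
  }

  int currentBlockSize() {
    return onDiskBlockBytesWithHeader.size();
  }
}
{code}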



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
