hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Anoop Sam John (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-17623) Reuse the bytes array when building the hfile block
Date Tue, 21 Feb 2017 10:17:44 GMT

    [ https://issues.apache.org/jira/browse/HBASE-17623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15875738#comment-15875738
] 

Anoop Sam John commented on HBASE-17623:
----------------------------------------

When we have cache on write, we will have to create new copy of the block data (say byte[]).
That is any way needed.
Now say when it is not required and there is no compress/encryption happening, we can just
pass the accumulated byte[] (offset  - length) to the DOS which is FSDataOS.  Now here at
this case also one copy happens.  Is that really needed?   Also for doing compress/encryption,
the same accumulated bytes can be passed and there any way we create a new compressed/encrypted
byte[] which is written to DOS.  Now temp array copy happens here and we can make that one?
  Even this patch also do not change the #copy ops but make object to stay there (byte[])
and reuse rather than recreate. Correct?   I may be wrong in saying whether we can avoid this
temp copies.  Pls think from that angle also and see..   Ya to make the life of an object
longer might not come down as good always..  Pls check with G1 also as Ram suggested.  With
diff types of experiments, we can make this area better.  This is great work happening here.
Thanks for the efforts.

> Reuse the bytes array when building the hfile block
> ---------------------------------------------------
>
>                 Key: HBASE-17623
>                 URL: https://issues.apache.org/jira/browse/HBASE-17623
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: CHIA-PING TSAI
>            Assignee: CHIA-PING TSAI
>            Priority: Minor
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: after(snappy_hfilesize=5.04GB).png, after(snappy_hfilesize=755MB).png,
before(snappy_hfilesize=5.04GB).png, before(snappy_hfilesize=755MB).png, HBASE-17623.branch-1.v0.patch,
HBASE-17623.branch-1.v1.patch, HBASE-17623.v0.patch, HBASE-17623.v1.patch, HBASE-17623.v1.patch,
memory allocation measurement.xlsx
>
>
> There are two improvements.
> # The uncompressedBlockBytesWithHeader and onDiskBlockBytesWithHeader should maintain
a bytes array which can be reused when building the hfile.
> # The uncompressedBlockBytesWithHeader/onDiskBlockBytesWithHeader is copied to an new
bytes array only when we need to cache the block.
> {code:title=HFileBlock.java|borderStyle=solid}
>     private void finishBlock() throws IOException {
>       if (blockType == BlockType.DATA) {
>         this.dataBlockEncoder.endBlockEncoding(dataBlockEncodingCtx, userDataStream,
>             baosInMemory.getBuffer(), blockType);
>         blockType = dataBlockEncodingCtx.getBlockType();
>       }
>       userDataStream.flush();
>       // This does an array copy, so it is safe to cache this byte array when cache-on-write.
>       // Header is still the empty, 'dummy' header that is yet to be filled out.
>       uncompressedBlockBytesWithHeader = baosInMemory.toByteArray();
>       prevOffset = prevOffsetByType[blockType.getId()];
>       // We need to set state before we can package the block up for cache-on-write.
In a way, the
>       // block is ready, but not yet encoded or compressed.
>       state = State.BLOCK_READY;
>       if (blockType == BlockType.DATA || blockType == BlockType.ENCODED_DATA) {
>         onDiskBlockBytesWithHeader = dataBlockEncodingCtx.
>             compressAndEncrypt(uncompressedBlockBytesWithHeader);
>       } else {
>         onDiskBlockBytesWithHeader = defaultBlockEncodingCtx.
>             compressAndEncrypt(uncompressedBlockBytesWithHeader);
>       }
>       // Calculate how many bytes we need for checksum on the tail of the block.
>       int numBytes = (int) ChecksumUtil.numBytes(
>           onDiskBlockBytesWithHeader.length,
>           fileContext.getBytesPerChecksum());
>       // Put the header for the on disk bytes; header currently is unfilled-out
>       putHeader(onDiskBlockBytesWithHeader, 0,
>           onDiskBlockBytesWithHeader.length + numBytes,
>           uncompressedBlockBytesWithHeader.length, onDiskBlockBytesWithHeader.length);
>       // Set the header for the uncompressed bytes (for cache-on-write) -- IFF different
from
>       // onDiskBlockBytesWithHeader array.
>       if (onDiskBlockBytesWithHeader != uncompressedBlockBytesWithHeader) {
>         putHeader(uncompressedBlockBytesWithHeader, 0,
>           onDiskBlockBytesWithHeader.length + numBytes,
>           uncompressedBlockBytesWithHeader.length, onDiskBlockBytesWithHeader.length);
>       }
>       if (onDiskChecksum.length != numBytes) {
>         onDiskChecksum = new byte[numBytes];
>       }
>       ChecksumUtil.generateChecksums(
>           onDiskBlockBytesWithHeader, 0, onDiskBlockBytesWithHeader.length,
>           onDiskChecksum, 0, fileContext.getChecksumType(), fileContext.getBytesPerChecksum());
>     }{code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message