hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-15248) BLOCKSIZE 4k should result in 4096 bytes on disk; i.e. fit inside a BucketCache 'block' of 4k
Date Mon, 06 Mar 2017 23:33:33 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15898383#comment-15898383 ]

stack commented on HBASE-15248:

Added this note to BLOCKSIZE:

--- a/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
+++ b/hbase-client/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
@@ -103,7 +103,10 @@ public class HColumnDescriptor implements Comparable<HColumnDescriptor>
    * Size of storefile/hfile 'blocks'.  Default is {@link #DEFAULT_BLOCKSIZE}.
    * Use smaller block sizes for faster random-access at expense of larger
-   * indices (more memory consumption).
+   * indices (more memory consumption). Note that this is a soft limit and that
+   * blocks have overhead (metadata, CRCs) so blocks will tend to be the size
+   * specified here and then some; i.e. don't expect that setting BLOCKSIZE=4k
+   * means hbase data will align with an SSDs 4k page accesses (TODO).
   public static final String BLOCKSIZE = "BLOCKSIZE";

> BLOCKSIZE 4k should result in 4096 bytes on disk; i.e. fit inside a BucketCache 'block' of 4k
> ---------------------------------------------------------------------------------------------
>                 Key: HBASE-15248
>                 URL: https://issues.apache.org/jira/browse/HBASE-15248
>             Project: HBase
>          Issue Type: Sub-task
>          Components: BucketCache
>            Reporter: stack
> Chatting w/ a gentleman named Daniel Pol who is messing w/ bucketcache, he wants blocks to be the size specified in the configuration and no bigger. His hardware setup fetches pages of 4k, so a block that has 4k of payload plus its own header and the header of the next block (which helps figure out what's next when scanning) ends up being 4203 bytes or something, and this then translates into two seeks per block fetch.
> This issue is about what it would take to stay inside our configured size boundary writing out blocks.
> If not possible, give back better signal on what to do so you could fit inside a particular
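
The two-seeks-per-fetch arithmetic in the description can be sketched as follows. This is a minimal illustration only, not HBase code: the class and method names (BlockPageSpan, pagesSpanned) are hypothetical, and the 4203-byte on-disk size is the approximate figure quoted in the description ("4203 bytes or something"), not a measured value.

```java
// Hypothetical sketch: counts how many fixed-size device pages a block read touches.
// Not part of HBase; the sizes are the illustrative figures from the issue description.
public class BlockPageSpan {

    /** Number of pageSize-byte pages touched by reading length bytes starting at offset. */
    public static long pagesSpanned(long offset, long length, long pageSize) {
        if (length <= 0) {
            return 0;
        }
        long firstPage = offset / pageSize;
        long lastPage = (offset + length - 1) / pageSize;
        return lastPage - firstPage + 1;
    }

    public static void main(String[] args) {
        final long pageSize = 4096; // device page size from the description
        final long payload = 4096;  // BLOCKSIZE=4k worth of payload
        final long onDisk = 4203;   // payload plus headers, approximate figure from the description

        System.out.println("overhead bytes: " + (onDisk - payload));
        System.out.println("pages for a " + payload + "-byte read at offset 0: "
            + pagesSpanned(0, payload, pageSize));
        System.out.println("pages for a " + onDisk + "-byte read at offset 0: "
            + pagesSpanned(0, onDisk, pageSize));
        // Assumption: the next block starts immediately after the first one on disk.
        System.out.println("pages for the next " + onDisk + "-byte block: "
            + pagesSpanned(onDisk, onDisk, pageSize));
    }
}
```

Under these assumptions, any block whose on-disk size exceeds the page size touches at least two pages regardless of where it starts, which is why the overhead matters even when the payload itself is exactly 4k.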

This message was sent by Atlassian JIRA
