Date: Wed, 2 Mar 2016 05:36:18 +0000 (UTC)
From: "stack (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-15366) Add doc, trace-level logging, and test around hfileblock

[ https://issues.apache.org/jira/browse/HBASE-15366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15175070#comment-15175070 ]

stack commented on HBASE-15366:
-------------------------------

bq. So the bucket cache is also caching these extra bytes you mean? This is interesting. Worth changing it I believe.

Yes. When we read from HDFS, we read the next block's header too.
Generally this lets us read a block with one seek rather than two. (This needs better evaluation, but it seems to be the case in testing: a cold read of a block requires a seek to read the block header to see how to read the rest of the block, i.e. its length, how it is checksummed, etc. We should change this!) This block + next header + 13 bytes of EXTRA stuff is what we shove into the bucketcache. (The EXTRA stuff is metadata needed for reconstituting the hfileblock from its bucketcache representation. We should fix this (smile).) Every hfileblock in the blockcache is carrying an extra 50 bytes.

The original "Why are there 50 bytes tagged on to the end of the hfileblock?" question came from a gentleman named Daniel Pol, who is trying to go big w/ bucketcache.

Thanks for the review [~ram_krish]. Let me get this doc in first before I start making changes.

Thanks.

> Add doc, trace-level logging, and test around hfileblock
> --------------------------------------------------------
>
>                 Key: HBASE-15366
>                 URL: https://issues.apache.org/jira/browse/HBASE-15366
>             Project: HBase
>          Issue Type: Sub-task
>          Components: BlockCache
>    Affects Versions: 2.0.0
>            Reporter: stack
>            Assignee: stack
>             Fix For: 2.0.0
>
>         Attachments: 15366.patch, 15366v2.patch, 15366v3.patch, 15366v4.patch
>
>
> What hfileblock is doing -- that it overreads when pulling in from hdfs to fetch the header of the next block to save on seeks; that it caches the block and overread and then adds an extra 13 bytes to the cached entry; that buckets in bucketcache have at least four hfileblocks in them and so on -- was totally baffling me. This patch docs the class, adds some trace-level logging so you can see if you are doing the right thing, and then adds a test of file-backed bucketcache that checks that persistence is working.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
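
The cached-entry layout described in the comment (the block, plus the overread header of the next block, plus 13 bytes of extra metadata) can be sketched roughly as below. This is only an illustration of the arithmetic: the class, method, and parameter names are hypothetical and are not HBase API, and the header size is passed in as a parameter because the message does not state it.

```java
// Sketch only: models the cached-entry size described in the comment.
// All names here are hypothetical; none of this is actual HBase code.
final class CachedEntrySketch {
    // From the message: 13 bytes of extra metadata are added to each cached entry.
    static final int EXTRA_METADATA_BYTES = 13;

    private CachedEntrySketch() {}

    /**
     * Bytes stored in the cache for one block:
     * the block itself + the overread header of the next block + the extra metadata.
     */
    static long cachedEntryBytes(long blockBytes, int nextHeaderBytes) {
        if (blockBytes < 0 || nextHeaderBytes < 0) {
            throw new IllegalArgumentException("sizes must be non-negative");
        }
        return blockBytes + nextHeaderBytes + EXTRA_METADATA_BYTES;
    }

    /** Bytes each cached entry carries beyond the block itself. */
    static long overheadBytes(int nextHeaderBytes) {
        if (nextHeaderBytes < 0) {
            throw new IllegalArgumentException("header size must be non-negative");
        }
        return (long) nextHeaderBytes + EXTRA_METADATA_BYTES;
    }
}
```

For a block of `b` bytes and a header of `h` bytes, `cachedEntryBytes(b, h)` returns `b + h + 13`, so the per-entry overhead depends only on the header size, not on the block size.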