Date: Wed, 26 Aug 2015 14:56:46 +0000 (UTC)
From: "Anoop Sam John (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

[ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14713600#comment-14713600 ]

Anoop Sam John commented on HBASE-14283:
----------------------------------------

We use the method below to read the previous block:

    public HFileBlock readBlock(long dataBlockOffset, long onDiskBlockSize,
        final boolean cacheBlock, boolean pread, final boolean isCompaction,
        boolean updateCacheMetrics, BlockType expectedBlockType,
        DataBlockEncoding expectedDataBlockEncoding)

So is there no BlockType check and looping?
Maybe it will read a block, see that the block is not the expected one, and go on to the next block and check its type. In the seekBefore case, instead of going forward we should go backward until the current block type matches the expected block type. Would that approach work?

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --------------------------------------------------------------
>
>                 Key: HBASE-14283
>                 URL: https://issues.apache.org/jira/browse/HBASE-14283
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ben Lau
>            Assignee: Ben Lau
>         Attachments: HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf-level index blocks. The reason is that the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous, which is not the case in HFile V2 and beyond.
> Attached is a first-cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test that exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1- and 2-level indexes, not 3+. 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug, but the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is that I want to confirm that you guys would be fine with me revising the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter.
> Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices); it only constructs the IO streams, so it has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured”, but I do not know if that is a requirement of the abstraction (why?) or just an incidental current property.
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can be not only correct but performant in all cases.
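To make the failure mode in the description concrete, here is a minimal toy model in Java. The classes (`SeekBeforeSizeDemo`, `Block`, `Kind`) are hypothetical stand-ins, not HBase's `HFileBlock` or `BlockType`, and the offsets and sizes are invented for illustration. It derives the previous data block's size the way the description says seekBefore() does, as the distance between two data block offsets, and compares it with the block's real on-disk size when an inline bloom chunk sits in between.

```java
import java.util.List;

/**
 * Toy model of an HFile block layout (hypothetical classes, not HBase's API).
 * Shows why "previous data block size = current offset - previous offset"
 * breaks once inline index/bloom blocks sit between data blocks.
 */
public class SeekBeforeSizeDemo {
    enum Kind { DATA, LEAF_INDEX, BLOOM_CHUNK }

    static final class Block {
        final long offset;     // where the block starts in the file
        final int onDiskSize;  // bytes the block occupies on disk, header included
        final Kind kind;

        Block(long offset, int onDiskSize, Kind kind) {
            this.offset = offset;
            this.onDiskSize = onDiskSize;
            this.kind = kind;
        }
    }

    /** Size derived under the assumption that data blocks are contiguous. */
    static long naivePrevDataBlockSize(long prevDataOffset, long currDataOffset) {
        return currDataOffset - prevDataOffset;
    }

    public static void main(String[] args) {
        // Layout: DATA [0,100) | BLOOM_CHUNK [100,130) | DATA [130,230)
        List<Block> file = List.of(
                new Block(0, 100, Kind.DATA),
                new Block(100, 30, Kind.BLOOM_CHUNK),
                new Block(130, 100, Kind.DATA));

        Block prev = file.get(0);
        Block curr = file.get(2);
        long naive = naivePrevDataBlockSize(prev.offset, curr.offset);

        // The naive size swallows the 30-byte inline bloom chunk.
        System.out.println("naive=" + naive + " actual=" + prev.onDiskSize);
    }
}
```

Running `main` prints `naive=130 actual=100`: the derived size includes the inline block, so a read of that length would run past the end of the previous data block.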
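Anoop's question is whether the read path checks `expectedBlockType` and loops. The sketch below is a hypothetical model of the loop he describes (read a block, and if its type is not the expected one, move on to the next block and check again), not HBase's actual `readBlock` implementation. In this model, moving forward is straightforward because every block knows its own on-disk size; the names `readAt` and `readExpectedForward` are invented for illustration.

```java
import java.util.List;
import java.util.Optional;

/** Toy model of a type-checked forward read loop (hypothetical, not HBase's readBlock). */
public class ForwardSkipDemo {
    enum Kind { DATA, LEAF_INDEX, BLOOM_CHUNK }

    static final class Block {
        final long offset;     // where the block starts in the file
        final int onDiskSize;  // bytes the block occupies on disk
        final Kind kind;

        Block(long offset, int onDiskSize, Kind kind) {
            this.offset = offset;
            this.onDiskSize = onDiskSize;
            this.kind = kind;
        }
    }

    /** Stand-in for a disk read: the block whose header starts at {@code offset}, if any. */
    static Optional<Block> readAt(List<Block> file, long offset) {
        for (Block b : file) {
            if (b.offset == offset) {
                return Optional.of(b);
            }
        }
        return Optional.empty();
    }

    /**
     * Reads the block at {@code offset}; while it is not of the expected kind,
     * hops forward by that block's own on-disk size and tries again.
     */
    static Optional<Block> readExpectedForward(List<Block> file, long offset, Kind expected) {
        Optional<Block> cur = readAt(file, offset);
        while (cur.isPresent() && cur.get().kind != expected) {
            Block b = cur.get();
            cur = readAt(file, b.offset + b.onDiskSize);
        }
        return cur;
    }

    public static void main(String[] args) {
        List<Block> file = List.of(
                new Block(0, 100, Kind.DATA),
                new Block(100, 20, Kind.LEAF_INDEX),
                new Block(120, 30, Kind.BLOOM_CHUNK),
                new Block(150, 100, Kind.DATA));
        // Starting on the inline index block, the loop skips both inline blocks.
        System.out.println(readExpectedForward(file, 100, Kind.DATA).get().offset);
    }
}
```

`main` prints `150`. Stepping backward has no equivalent in this model, because a block does not record where the block before it starts; that is the gap the proposed `prevBlockSize` field targets.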
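Going backward, as Anoop suggests for seekBefore, needs some way to find where the preceding block starts. This sketch assumes a hypothetical header field `prevBlockSize` holding the on-disk size of the block immediately before the current one (of any type), in the spirit of proposal (3); the description does not pin down the field's exact semantics, and all class and method names here are illustrative, not HBase's. Starting from the current data block, the walk steps back one block at a time and stops at the first block whose type matches the expected one.

```java
import java.util.List;
import java.util.Optional;

/**
 * Toy model of a backward, type-checked walk. Assumes a hypothetical header field
 * prevBlockSize = on-disk size of the block immediately before this one (any type).
 */
public class BackwardWalkDemo {
    enum Kind { DATA, LEAF_INDEX, BLOOM_CHUNK }

    static final class Block {
        final long offset;
        final int onDiskSize;
        final int prevBlockSize; // hypothetical header field; 0 for the first block
        final Kind kind;

        Block(long offset, int onDiskSize, int prevBlockSize, Kind kind) {
            this.offset = offset;
            this.onDiskSize = onDiskSize;
            this.prevBlockSize = prevBlockSize;
            this.kind = kind;
        }
    }

    /** Stand-in for a disk read: the block whose header starts at {@code offset}, if any. */
    static Optional<Block> readAt(List<Block> file, long offset) {
        for (Block b : file) {
            if (b.offset == offset) {
                return Optional.of(b);
            }
        }
        return Optional.empty();
    }

    /** Returns the nearest block of the expected kind strictly before {@code start}. */
    static Optional<Block> seekBeforeOfKind(List<Block> file, Block start, Kind expected) {
        Block cur = start;
        // prevBlockSize > 0 guarantees each step moves strictly backward.
        while (cur.offset > 0 && cur.prevBlockSize > 0) {
            Optional<Block> prev = readAt(file, cur.offset - cur.prevBlockSize);
            if (!prev.isPresent()) {
                return Optional.empty(); // inconsistent layout in this toy model
            }
            cur = prev.get();
            if (cur.kind == expected) {
                return Optional.of(cur);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Block> file = List.of(
                new Block(0, 100, 0, Kind.DATA),
                new Block(100, 20, 100, Kind.LEAF_INDEX),
                new Block(120, 30, 20, Kind.BLOOM_CHUNK),
                new Block(150, 100, 30, Kind.DATA));
        seekBeforeOfKind(file, file.get(3), Kind.DATA).ifPresent(b ->
                System.out.println("prev data block: offset=" + b.offset + " size=" + b.onDiskSize));
    }
}
```

`main` prints `prev data block: offset=0 size=100` after skipping the bloom chunk and the leaf index block, and the size comes from the block itself rather than from an offset difference. Each skipped inline block costs one extra read in this model; if the field instead recorded the previous data block's size, the loop would not be needed at all.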