Date: Mon, 31 Aug 2015 23:05:47 +0000 (UTC)
From: "Ben Lau (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-14283) Reverse scan doesn’t work with HFile inline index/bloom blocks

    [ https://issues.apache.org/jira/browse/HBASE-14283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14724357#comment-14724357 ]

Ben Lau commented on HBASE-14283:
---------------------------------

Reviewboard link with updated version of patch (v3): https://reviews.apache.org/r/37971/

> Reverse scan doesn’t work with HFile inline index/bloom blocks
> --------------------------------------------------------------
>
>                 Key: HBASE-14283
>                 URL: https://issues.apache.org/jira/browse/HBASE-14283
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Ben Lau
>            Assignee: Ben Lau
>         Attachments: HBASE-14283-v2.patch,
>                      HBASE-14283.patch, hfile-seek-before.patch
>
>
> Reverse scans do not work if an HFile contains inline bloom blocks or leaf-level index blocks. The reason is that the seekBefore() call calculates the previous data block’s size by assuming data blocks are contiguous, which is not the case in HFile V2 and beyond.
> Attached is a first-cut patch (targeting bcef28eefaf192b0ad48c8011f98b8e944340da5 on trunk) which includes:
> (1) a unit test which exposes the bug and demonstrates failures for both inline bloom blocks and inline index blocks
> (2) a proposed fix for inline index blocks that does not require a new HFile version change, but is only performant for 1- and 2-level indexes and not 3+. 3+ requires an HFile format update for optimal performance.
> This patch does not fix the bloom filter blocks bug. But the fix should be similar to the case of inline index blocks. The reason I haven’t made the change yet is that I want to confirm that you guys would be fine with me revising the HFile.Reader interface.
> Specifically, these 2 functions (getGeneralBloomFilterMetadata and getDeleteBloomFilterMetadata) need to return the BloomFilter. Right now the HFileReader class doesn’t have a reference to the bloom filters (and hence their indices) and only constructs the IO streams, and hence has no way to know where the bloom blocks are in the HFile. It seems that the HFile.Reader bloom method comments state that they “know nothing about how that metadata is structured” but I do not know if that is a requirement of the abstraction (why?)
> or just an incidental current property.
> We would like to do 3 things with community approval:
> (1) Update the HFile.Reader interface and implementation to contain and return BloomFilters directly rather than unstructured IO streams
> (2) Merge the fixes for index blocks and bloom blocks into open source
> (3) Create a new Jira ticket for open source HBase to add a ‘prevBlockSize’ field in the block header in the next HFile version, so that seekBefore() calls can not only be correct but performant in all cases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
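
For readers following the seekBefore() discussion in the quoted description, here is a minimal sketch of why the contiguity assumption breaks. It is a toy model, not HBase code: the class SeekBeforeSketch, the Block type, and the methods naivePrevSize and recordedPrevSize are all hypothetical names made up for illustration. The second method only stands in for "the size is recorded somewhere instead of being derived from offsets", which is roughly what a 'prevBlockSize' header field would provide; the real field layout is not specified in this thread.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of an HFile block layout (hypothetical, not HBase code). */
final class SeekBeforeSketch {
    enum Kind { DATA, LEAF_INDEX, BLOOM_CHUNK }

    /** One on-disk block: its kind, file offset, and on-disk size in bytes. */
    static final class Block {
        final Kind kind;
        final long offset;
        final int size;

        Block(Kind kind, long offset, int size) {
            this.kind = kind;
            this.offset = offset;
            this.size = size;
        }
    }

    /** Lays the blocks out back to back, starting at offset 0. */
    static List<Block> layout(Kind[] kinds, int[] sizes) {
        List<Block> blocks = new ArrayList<>();
        long offset = 0;
        for (int i = 0; i < kinds.length; i++) {
            blocks.add(new Block(kinds[i], offset, sizes[i]));
            offset += sizes[i];
        }
        return blocks;
    }

    /** Returns the nearest DATA block before index cur. */
    static Block previousDataBlock(List<Block> blocks, int cur) {
        for (int i = cur - 1; i >= 0; i--) {
            if (blocks.get(i).kind == Kind.DATA) {
                return blocks.get(i);
            }
        }
        throw new IllegalArgumentException("no data block before index " + cur);
    }

    /** Contiguity assumption: the previous data block ends where the current one starts. */
    static long naivePrevSize(List<Block> blocks, int cur) {
        return blocks.get(cur).offset - previousDataBlock(blocks, cur).offset;
    }

    /** Assumption: the size is recorded (e.g. in a header field) rather than derived. */
    static long recordedPrevSize(List<Block> blocks, int cur) {
        return previousDataBlock(blocks, cur).size;
    }

    public static void main(String[] args) {
        // DATA(100 bytes), DATA(120), LEAF_INDEX(40), DATA(90)
        List<Block> blocks = layout(
            new Kind[] { Kind.DATA, Kind.DATA, Kind.LEAF_INDEX, Kind.DATA },
            new int[] { 100, 120, 40, 90 });

        // No inline block in between: the subtraction gives the right size.
        System.out.println(naivePrevSize(blocks, 1));    // 100
        // An inline index block sits in between: the subtraction over-counts by 40.
        System.out.println(naivePrevSize(blocks, 3));    // 160
        System.out.println(recordedPrevSize(blocks, 3)); // 120
    }
}
```

In this model the subtraction is only right when two data blocks are adjacent. As soon as an inline index or bloom block is written between them, the derived size also covers the inline block, which matches the failure mode the description attributes to reverse scans.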