Date: Thu, 9 Mar 2017 10:20:38 +0000 (UTC)
From: "Jean-Marc Spaggiari (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Commented] (HBASE-17755) CellBasedKeyBlockIndexReader#midkey should exhaust search of the target middle key on skewed regions

[ https://issues.apache.org/jira/browse/HBASE-17755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15902844#comment-15902844 ]

Jean-Marc Spaggiari commented on HBASE-17755:
---------------------------------------------

Don't we risk generating very small new daughter regions?

> CellBasedKeyBlockIndexReader#midkey should exhaust search of the target middle key on skewed regions
> ----------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-17755
>                 URL: https://issues.apache.org/jira/browse/HBASE-17755
>             Project: HBase
>          Issue Type: Bug
>          Components: HFile
>            Reporter: Esteban Gutierrez
>            Assignee: Esteban Gutierrez
>
> We have always been returning the middle key of the block index regardless of the distribution of the data in an HFile.
> A side effect of that approach is that when millions of rows share the same key, it's quite easy to run into a situation where the start key or the end key is equal to the middle key. That makes the HFile nearly impossible to split until enough data is written into the region for the middle key to shift to another row, or until an operator uses a custom split point to split the region.
> Instead, we should exhaust the search for the middle key in the block index so that an HFile can be split earlier when possible, even if our edge case is serving a region that could hold a single key with millions of versions of a row or millions of qualifiers on the same row.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
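
For illustration only, the search proposed in the issue description might look roughly like the sketch below. It is not the HBase implementation: the class MidKeySketch and the method findSplitIndex are hypothetical names, block-index keys are modelled as sorted Strings rather than Cells, and the rule that a usable split key must differ from both the first and the last key is an assumption taken from the description above. Starting at the middle entry, it walks outward and returns the nearest entry that satisfies that rule.

```java
import java.util.Arrays;
import java.util.List;

public class MidKeySketch {

    /**
     * Hypothetical helper: returns the index of the block-index entry nearest
     * to the middle whose key differs from both the first and the last key,
     * or -1 when no such entry exists. Assumes the keys are sorted.
     */
    public static int findSplitIndex(List<String> blockKeys) {
        if (blockKeys.isEmpty()) {
            return -1;
        }
        String first = blockKeys.get(0);
        String last = blockKeys.get(blockKeys.size() - 1);
        int mid = blockKeys.size() / 2;

        // Walk outward from the middle so the chosen key stays as close to
        // the centre of the index as the data allows.
        for (int distance = 0; distance < blockKeys.size(); distance++) {
            int left = mid - distance;
            int right = mid + distance;
            if (left >= 0 && isUsable(blockKeys.get(left), first, last)) {
                return left;
            }
            if (right < blockKeys.size() && isUsable(blockKeys.get(right), first, last)) {
                return right;
            }
        }
        return -1; // every entry equals the first or the last key
    }

    // Assumed rule: a split key equal to the first or last key is rejected.
    private static boolean isUsable(String key, String first, String last) {
        return !key.equals(first) && !key.equals(last);
    }

    public static void main(String[] args) {
        // Skewed index: the plain middle entry equals the start key.
        List<String> skewed = Arrays.asList("a", "a", "a", "a", "a", "b", "c");
        System.out.println(findSplitIndex(skewed));
    }
}
```

On the skewed list in main, the plain middle entry is "a", which equals the start key, so the search moves right and settles on the entry holding "b". The further the chosen entry sits from the middle, the more uneven the two halves become, which is the small-daughter-region concern raised in the comment above.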