Date: Fri, 28 Sep 2018 02:20:00 +0000 (UTC)
From: "Lars Hofhansl (JIRA)"
To: dev@phoenix.apache.org
Subject: [jira] [Updated] (PHOENIX-4932) Brainstorm more ways to avoid special SPLIT handling in Phoenix

[ https://issues.apache.org/jira/browse/PHOENIX-4932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lars Hofhansl updated PHOENIX-4932:
-----------------------------------
    Description:
Currently Phoenix still requires special handling and retries (automatic, and manual by the client user) when SPLITs occur in HBase.

PHOENIX-4849 avoids that for "simple" SELECTs. I think we can go further if we add a bit more logic to the client, like this:
* Sorts. As we merge sort partial server results from the server scan, start a "merge bucket" when we see the next K/V to be out of order (that can happen when HBase executes a partial scan across the new daughter regions).
* Aggregates. Make sure the client can deal with more than one result per scan. E.g., for a SUM the scanner might return two results if HBase splits the scan across two regions. Similarly for AVG, the client needs to deal with two sets of SUM/COUNT.
* Offset. Make sure the client applies the offset. The server might return more. (This might be more complicated... I haven't looked too closely.)

In summary: We should let HBase do its thing as much as possible. HBase already deals with SPLITs: scans are restarted and continue across regions, the region cache on the client is invalidated, etc.

Just parking this here.
This is not new. The ideas are probably not new either.
[~tdsilva], FYI.

  was:
Currently Phoenix still requires special handling and retries (automatic, and manual by the client user) when SPLITs occur in HBase.

PHOENIX-4849 avoids that for "simple" SELECTs. I think we can go further if we add a bit more logic to the client, like this:
* Sorts. As we merge sort partial server results from the server scan, start a "merge bucket" when we need the next K/V to be out of order (that can happen when HBase executes a partial scan across the new daughter regions).
* Aggregates. Make sure the client can deal with more than one result per scan. E.g., for a SUM the scanner might return two results if HBase splits the scan across two regions. Similarly for AVG, the client needs to deal with two sets of SUM/COUNT.
* Offset. Make sure the client applies the offset. The server might return more. (This might be more complicated... I haven't looked too closely.)

In summary: We should let HBase do its thing as much as possible. HBase already deals with SPLITs: scans are restarted and continue across regions, the region cache on the client is invalidated, etc.

Just parking this here. This is not new. The ideas are probably not new either.
[~tdsilva], FYI.

> Brainstorm more ways to avoid special SPLIT handling in Phoenix
> ---------------------------------------------------------------
>
>                 Key: PHOENIX-4932
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4932
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: Lars Hofhansl
>            Priority: Major
>
> Currently Phoenix still requires special handling and retries (automatic, and manual by the client user) when SPLITs occur in HBase.
> PHOENIX-4849 avoids that for "simple" SELECTs. I think we can go further if we add a bit more logic to the client, like this:
> * Sorts. As we merge sort partial server results from the server scan, start a "merge bucket" when we see the next K/V to be out of order (that can happen when HBase executes a partial scan across the new daughter regions).
> * Aggregates. Make sure the client can deal with more than one result per scan. E.g., for a SUM the scanner might return two results if HBase splits the scan across two regions. Similarly for AVG, the client needs to deal with two sets of SUM/COUNT.
> * Offset. Make sure the client applies the offset. The server might return more. (This might be more complicated... I haven't looked too closely.)
> In summary: We should let HBase do its thing as much as possible. HBase already deals with SPLITs: scans are restarted and continue across regions, the region cache on the client is invalidated, etc.
> Just parking this here. This is not new. The ideas are probably not new either.
> [~tdsilva], FYI.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
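
For illustration only, the three client-side ideas in the description (merge buckets for sorts, combining more than one partial aggregate per scan, and applying the offset on the client) might be sketched roughly as below. This is not Phoenix or HBase code: the names PartialAgg, combine_partials, final_avg, split_into_buckets and merge_scan are all hypothetical, row keys are modeled as plain integers, and a "scan" is just an iterable of keys.

```python
import heapq
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, List, Optional


@dataclass
class PartialAgg:
    """One partial SUM/COUNT pair, as one scan result might carry it (hypothetical shape)."""
    total: float
    count: int


def combine_partials(partials: Iterable[PartialAgg]) -> PartialAgg:
    """Fold any number of partial results into one, so a scan that was
    split across two daughter regions is handled like any other."""
    total, count = 0.0, 0
    for p in partials:
        total += p.total
        count += p.count
    return PartialAgg(total, count)


def final_avg(partials: Iterable[PartialAgg]) -> Optional[float]:
    """AVG is computed only after all SUM/COUNT pairs are combined."""
    merged = combine_partials(partials)
    return merged.total / merged.count if merged.count else None


def split_into_buckets(keys: Iterable[int]) -> List[List[int]]:
    """Start a new "merge bucket" whenever the next key is out of order."""
    buckets: List[List[int]] = []
    for key in keys:
        if buckets and key >= buckets[-1][-1]:
            buckets[-1].append(key)   # still in order: extend the current bucket
        else:
            buckets.append([key])     # out of order (or first key): new bucket
    return buckets


def merge_scan(keys: Iterable[int], offset: int = 0) -> List[int]:
    """Merge the sorted buckets, then apply the offset on the client."""
    merged = heapq.merge(*split_into_buckets(keys))
    return list(islice(merged, offset, None))
```

With a scan that returns keys 1, 4, 7, 2, 5, 9 (as if it continued into a second daughter region), split_into_buckets yields two buckets, [1, 4, 7] and [2, 5, 9], and merge_scan with offset=2 returns [4, 5, 7, 9]. Two partial results PartialAgg(10.0, 4) and PartialAgg(6.0, 4) combine to an AVG of 2.0. Applying the offset only after the merge is a design choice in this sketch; the description itself notes that offset handling may be more complicated.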