From dev-return-49323-archive-asf-public=cust-asf.ponee.io@phoenix.apache.org  Sun Feb 11 18:35:06 2018
Return-Path: <dev-return-49323-archive-asf-public=cust-asf.ponee.io@phoenix.apache.org>
X-Original-To: archive-asf-public@eu.ponee.io
Delivered-To: archive-asf-public@eu.ponee.io
Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183])
	by mx-eu-01.ponee.io (Postfix) with ESMTP id 0827A18064E
	for <archive-asf-public@eu.ponee.io>; Sun, 11 Feb 2018 18:35:06 +0100 (CET)
Received: by cust-asf.ponee.io (Postfix)
	id EC5AA160C4E; Sun, 11 Feb 2018 17:35:05 +0000 (UTC)
Delivered-To: archive-asf-public@cust-asf.ponee.io
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
	by cust-asf.ponee.io (Postfix) with SMTP id 170CB160C2E
	for <archive-asf-public@cust-asf.ponee.io>; Sun, 11 Feb 2018 18:35:04 +0100 (CET)
Received: (qmail 69420 invoked by uid 500); 11 Feb 2018 17:35:04 -0000
Mailing-List: contact dev-help@phoenix.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:dev-help@phoenix.apache.org>
List-Unsubscribe: <mailto:dev-unsubscribe@phoenix.apache.org>
List-Post: <mailto:dev@phoenix.apache.org>
List-Id: <dev.phoenix.apache.org>
Reply-To: dev@phoenix.apache.org
Delivered-To: mailing list dev@phoenix.apache.org
Received: (qmail 69409 invoked by uid 99); 11 Feb 2018 17:35:04 -0000
Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142)
    by apache.org (qpsmtpd/0.29) with ESMTP; Sun, 11 Feb 2018 17:35:04 +0000
Received: from localhost (localhost [127.0.0.1])
	by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id A93DEC152B
	for <dev@phoenix.apache.org>; Sun, 11 Feb 2018 17:35:03 +0000 (UTC)
X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org
X-Spam-Flag: NO
X-Spam-Score: -109.511
X-Spam-Level:
X-Spam-Status: No, score=-109.511 tagged_above=-999 required=6.31
	tests=[ENV_AND_HDR_SPF_MATCH=-0.5, KAM_ASCII_DIVIDERS=0.8,
	RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01,
	USER_IN_DEF_SPF_WL=-7.5, USER_IN_WHITELIST=-100] autolearn=disabled
Received: from mx1-lw-us.apache.org ([10.40.0.8])
	by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024)
	with ESMTP id xPLHJaIpYaHB for <dev@phoenix.apache.org>;
	Sun, 11 Feb 2018 17:35:02 +0000 (UTC)
Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139])
	by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 6BE8B5F64B
	for <dev@phoenix.apache.org>; Sun, 11 Feb 2018 17:35:02 +0000 (UTC)
Received: from jira-lw-us.apache.org (unknown [207.244.88.139])
	by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 3B382E0115
	for <dev@phoenix.apache.org>; Sun, 11 Feb 2018 17:35:01 +0000 (UTC)
Received: from jira-lw-us.apache.org (localhost [127.0.0.1])
	by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 2344A240FA
	for <dev@phoenix.apache.org>; Sun, 11 Feb 2018 17:35:00 +0000 (UTC)
Date: Sun, 11 Feb 2018 17:35:00 +0000 (UTC)
From: "James Taylor (JIRA)" <jira@apache.org>
To: dev@phoenix.apache.org
Message-ID: <JIRA.13132914.1516677270000.172473.1518370500143@Atlassian.JIRA>
In-Reply-To: <JIRA.13132914.1516677270000@Atlassian.JIRA>
References: <JIRA.13132914.1516677270000@Atlassian.JIRA> <JIRA.13132914.1516677270536@jira-lw-us.apache.org>
Subject: [jira] [Updated] (PHOENIX-4550) Declare maximum columns to ensure
 storage is dense when table has many views
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394


     [ https://issues.apache.org/jira/browse/PHOENIX-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Taylor updated PHOENIX-4550:
----------------------------------
    Description: 
By declaring the max number of columns on a base table, we can optimize the storage for SINGLE_CELL_ARRAY_WITH_OFFSETS by not storing null values for the columns preceding the initial column of a view. This will make a huge difference in storage when you have a base table with many views. For example:

{code}
-- Declare that the base table will have no more than 10 columns
CREATE IMMUTABLE TABLE base (k1 VARCHAR, prefix CHAR(3), v1 DATE,
    CONSTRAINT pk PRIMARY KEY (k1, prefix))
    MULTI_TENANT = true,
    MAX_COLUMNS = 10;
CREATE VIEW v1(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR)
    AS SELECT * FROM base WHERE prefix = 'A00';
CREATE VIEW v2(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR)
    AS SELECT * FROM base WHERE prefix = 'A10';
...
{code}

As the number of views grows, the difference between the base table column encoding (column #1) and the starting column number of the view will increase, since the starting offset is determined by an incrementing value on the base table. This bloats the storage, as we need to store null values for the column encodings between the base table column and the starting column of the view.
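
To get a feel for the growth, here is a rough Python sketch of the arithmetic. It is not Phoenix code. It assumes zero-based encodings handed out by a single shared counter, that every view adds the same number of non-PK columns, and that nothing else consumes encodings. The function names are made up for illustration.

```python
def first_view_encoding(base_columns: int, cols_per_view: int, view_index: int) -> int:
    """Encoding of the first column of the view at view_index (0-based).

    Assumption: the base table holds encodings 0..base_columns-1 and each
    earlier view has consumed cols_per_view encodings from the shared counter.
    """
    return base_columns + view_index * cols_per_view


def null_padding(base_columns: int, cols_per_view: int, view_index: int) -> int:
    """Null placeholders stored between the base columns and the view's columns."""
    return first_view_encoding(base_columns, cols_per_view, view_index) - base_columns


def slots_per_row(base_columns: int, cols_per_view: int, view_index: int) -> int:
    """Total array slots needed to reach the last column of the view."""
    return first_view_encoding(base_columns, cols_per_view, view_index) + cols_per_view


if __name__ == "__main__":
    # Shape of the example above: one non-PK base column (v1), two per view (v2, v3).
    for index in (0, 9, 999):
        print(index, null_padding(1, 2, index), slots_per_row(1, 2, index))
```

Under these assumptions the 1000th view (index 999) needs 2001 slots per row, of which 1998 are null placeholders and only 3 hold data.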

Instead, we'll pass the MAX_COLUMNS value through for queries. Any column encoding less than this value is known to be at the start. For anything greater, we'll start the search from <column encoding> - <minimum view column encoding>.
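
The lookup rule can be sketched roughly as follows in Python. This is only an illustration, not the planned implementation. The name locate_column and its parameters are hypothetical, encodings are assumed to be zero-based, and an encoding equal to MAX_COLUMNS is assumed to belong to a view, which is a boundary case the rule above leaves open.

```python
from typing import Tuple


def locate_column(encoding: int, max_columns: int, min_view_encoding: int) -> Tuple[str, int]:
    """Return (region, offset) for a column encoding.

    region is "base" when the column sits at the start of the array,
    otherwise "view", with the offset taken relative to the view's
    first (minimum) column encoding.
    """
    if encoding < 0:
        raise ValueError("encoding must be non-negative")
    if encoding < max_columns:
        # Base table column: stored at the start, at its own encoding.
        return ("base", encoding)
    if encoding < min_view_encoding:
        # Belongs to some other view; it cannot appear in this view's rows.
        raise ValueError("encoding precedes this view's first column")
    # View column: <column encoding> - <minimum view column encoding>.
    return ("view", encoding - min_view_encoding)


if __name__ == "__main__":
    print(locate_column(0, 10, 2010))     # base table column
    print(locate_column(2011, 10, 2010))  # second column of the view
```

How the view region is placed relative to the base region in the stored array is deliberately left out, since that is a layout decision the sketch does not try to guess.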

The downside of this approach is that if you run out of columns in the base table, you're stuck. A more flexible, but more difficult, approach is outlined in PHOENIX-4596.

  was:
By declaring the max number of columns on a base table, we can optimize the storage for SINGLE_CELL_ARRAY_WITH_OFFSETS by not storing null values for the columns preceding the initial column of a view. This will make a huge difference in storage when you have a base table with many views. For example:

{code}
-- Declare that the base table will have no more than 10 columns
CREATE IMMUTABLE TABLE base (k1 VARCHAR, prefix CHAR(3), v1 DATE,
    CONSTRAINT pk PRIMARY KEY (k1, prefix))
    MULTI_TENANT = true,
    MAX_COLUMNS = 10;
CREATE VIEW v1(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR)
    AS SELECT * FROM base WHERE prefix = 'A00';
CREATE VIEW v2(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR)
    AS SELECT * FROM base WHERE prefix = 'A10';
...
{code}

As the number of views grows, the difference between the base table column encoding (column #1) and the starting column number of the view will increase, since the starting offset is determined by an incrementing value on the base table. This bloats the storage, as we need to store null values for the column encodings between the base table column and the starting column of the view.

Instead, we'll pass the MAX_COLUMNS value through for queries. Any column encoding less than this value is known to be at the start. For anything greater, we'll start the search from <column encoding> - <minimum view column encoding>.


> Declare maximum columns to ensure storage is dense when table has many views
> ----------------------------------------------------------------------------
>
>                 Key: PHOENIX-4550
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4550
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Priority: Major


--
This message was sent by Atlassian JIRA
(v7.6.3#76005)