phoenix-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Thomas D'Silva (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-4550) Allow declaration of max columns on base physical table
Date Tue, 23 Jan 2018 19:25:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16336280#comment-16336280
] 

Thomas D'Silva commented on PHOENIX-4550:
-----------------------------------------

[~jamestaylor]

Another way to deal with the sparsness of view data is to use a mapping from column qualifier
to array index. We could build this map when we create the PTable of the view. This would
allow us to have a compact array. For example if the col qualifiers of the view cols are:

base table cols : 1-2
view cols : 100-102, 500

We could have a map from col qualifier to array index
1->1, 2->2, 100->3,101->4,102->5, 500->6

We would have to have to handle columns that are deleted by flagging them (we already do this
by using the LinkType.EXCLUDED_COLUMN for splittable system catalog in PHOENIX-3534), so that
we build the map in the PTable correctly. We can actually drop the view column metadata when
we rewrite the single cell without the deleted column value during compaction. 



> Allow declaration of max columns on base physical table
> -------------------------------------------------------
>
>                 Key: PHOENIX-4550
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4550
>             Project: Phoenix
>          Issue Type: Improvement
>            Reporter: James Taylor
>            Priority: Major
>
> By declaring the max number of columns on a base table, we can optimize the storage for
SINGLE_CELL_ARRAY_WITH_OFFSETS by not storing null values for the columns preceding the initial
column of a view. This will make a huge difference in storage when you have a base table with
many views. For example:
> {code}
> -- Declare that the base table will have no more than 10 columns
> CREATE IMMUTABLE TABLE base (k1 VARCHAR, prefix CHAR(3) v1 DATE,
>     CONSTRAINT pk PRIMARY KEY (k1, prefix))
>     MULTI_TENANT = true,
>     MAX_COLUMNS = 10;
> CREATE VIEW v1(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR)
>     AS SELECT * FROM base WHERE prefix = 'A00';
> CREATE VIEW v2(k2 VARCHAR PRIMARY KEY, v2 VARCHAR, v3 VARCHAR);
>     AS SELECT * FROM base WHERE prefix = 'A10';
> ...
> {code}
> As the number of views grow, the difference between the base table column encoding (column
#1) and the starting column number of the view (since the starting offset is determined by
an incrementing value on the base table) will increase. This bloats the storage as we need
to store null values for column encodings between the base table column and the starting column
of the view.
> Instead, we'll pass through the MAX_COLUMNS value for queries and anything column encoding
less than this we know it'll be at the start. Anything greater and we'll start the search
from <column encoding> - <minimum view column encoding>.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Mime
View raw message