phoenix-dev mailing list archives

From "Mujtaba Chohan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-3559) More disk space used with encoded column scheme with data in sparse columns
Date Thu, 05 Jan 2017 20:01:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15802372#comment-15802372 ]

Mujtaba Chohan commented on PHOENIX-3559:
-----------------------------------------

Sure [~jamestaylor], agreed. I see that this is not optimized for sparse columns, but one of
our internal use cases is based on a schema driven by customers, so encoded columns could
potentially be used in this way. At the least, it's good to know the limits and the break-even point.

I also tested with slightly longer column names (column_1 ... column_5000), and the comparative
data sizes were the same, which might be due to the FAST_DIFF encoding that we have on by default.

Thanks [~ankit@apache.org] for those data points.

> More disk space used with encoded column scheme with data in sparse columns
> ---------------------------------------------------------------------------
>
>                 Key: PHOENIX-3559
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3559
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>             Fix For: 4.10.0
>
>
> Schema with 5K columns
> {noformat}
> create table (k1 integer, k2 integer, c1 varchar ... c5000 varchar CONSTRAINT PK PRIMARY KEY (K1, K2))
> VERSIONS=1, MULTI_TENANT=true, IMMUTABLE_ROWS=true
> {noformat}
> In this schema, only 100 random columns per row are filled with random 15-character values. The rest are null.
> Data size is *6X* larger with the encoded column scheme compared to non-encoded: 12GB/1M rows encoded vs ~2GB/1M rows non-encoded.
> When compressed with GZ, the size with the encoded column scheme is still 35% higher.
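
To put the reported sizes in per-row terms, here is a rough back-of-the-envelope sketch. It uses only the figures quoted in the description above (12GB and ~2GB per 1M rows, 100 filled columns per row). It assumes decimal units (1 GB = 10**9 bytes) and treats the approximate 2GB figure as exact, so the results are ballpark numbers, not measurements. The function names are made up for illustration.

```python
# Back-of-the-envelope arithmetic on the sizes quoted in the issue description.
# Assumption: decimal units (1 GB = 10**9 bytes); "~2GB" is treated as exactly 2.

ROWS = 1_000_000          # rows in the test table
FILLED_PER_ROW = 100      # non-null columns per row, per the description


def bytes_per_row(total_gb, rows=ROWS):
    """Average on-disk bytes per row for a table of the given total size."""
    return total_gb * 10**9 / rows


def bytes_per_filled_column(total_gb, rows=ROWS, filled=FILLED_PER_ROW):
    """Average on-disk bytes attributed to each non-null column value."""
    return bytes_per_row(total_gb, rows) / filled


encoded_gb = 12
non_encoded_gb = 2

ratio = encoded_gb / non_encoded_gb

print("encoded:     %.0f bytes/row, %.0f bytes/filled column"
      % (bytes_per_row(encoded_gb), bytes_per_filled_column(encoded_gb)))
print("non-encoded: %.0f bytes/row, %.0f bytes/filled column"
      % (bytes_per_row(non_encoded_gb), bytes_per_filled_column(non_encoded_gb)))
print("ratio: %.1fx" % ratio)
```

Under these assumptions, the encoded table averages about 12 KB per row against about 2 KB for the non-encoded one, even though both hold the same 100 values of 15 characters per row.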



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
