phoenix-dev mailing list archives

From "Ankit Singhal (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-3582) No significant space saving with immutable encoded column with large number of dense columns
Date Tue, 10 Jan 2017 10:09:58 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-3582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814561#comment-15814561 ]

Ankit Singhal commented on PHOENIX-3582:
----------------------------------------

For test #1,
it is possible that when the varchar length is less than 4 characters, storing the offset
is costlier than storing the actual value.
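
A rough sketch of that arithmetic, assuming one fixed-width offset per column and a 4-byte
offset width (the width is only inferred from the "less than 4 characters" remark above, not
taken from the actual encoding; the function names are made up for illustration):

```python
def encoded_cell_bytes(value_lens, offset_bytes=4):
    """Size of one packed cell: all values concatenated plus one offset per column.

    offset_bytes=4 is an assumption for illustration, not the real encoding's width.
    """
    return sum(value_lens) + offset_bytes * len(value_lens)


def offset_share(value_len, offset_bytes=4):
    """Fraction of a column's stored bytes that is offset rather than value."""
    return offset_bytes / (value_len + offset_bytes)


# 5000 columns of 15-byte values, as in the test schemas.
print(encoded_cell_bytes([15] * 5000))  # 95000 bytes, 20000 of them offsets

# Compare a 3-byte value with a 15-byte value under the assumed offset width.
print(offset_share(3), offset_share(15))
```

Under these assumptions the offset outweighs the value itself only when the value is shorter
than the offset.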

For test #2,
it could be because the space the offset takes in our encoding is roughly equivalent to the
key/value/prefix lengths (and the 0 timestamp diff) stored by FastDiff encoding. Also, the
offset requires a more expensive data type than storing just the length.

[~mujtabachohan], can you share the absolute numbers as well for both test #1 and test #2, if
you have them handy? It is also worth trying additional compression such as Snappy or GZ to
observe the effect.
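
To get a feel for how much the column naming alone matters, here is a small sketch that
compresses the two kinds of column names with zlib. This is general-purpose DEFLATE from the
Python standard library, used only as a stand-in; it is not FastDiff, Snappy, or the HBase
codec path, and the helper name and fixed seed are arbitrary choices for illustration:

```python
import random
import string
import zlib


def compression_ratio(names):
    """Compressed size divided by raw size for a newline-joined list of names."""
    raw = "\n".join(names).encode("ascii")
    return len(zlib.compress(raw)) / len(raw)


rng = random.Random(0)  # fixed seed so the run is repeatable
alphabet = string.ascii_letters + string.digits

# Test #1 style names: column_1 ... column_5000
sequential_names = [f"column_{i}" for i in range(1, 5001)]
# Test #2 style names: 10-character random alphanumeric strings
random_names = ["".join(rng.choices(alphabet, k=10)) for _ in range(5000)]

sequential_ratio = compression_ratio(sequential_names)
random_ratio = compression_ratio(random_names)
print(f"sequential names: {sequential_ratio:.2f}")
print(f"random names:     {random_ratio:.2f}")
```

The sequential names should compress far better than the random ones, which matches the
direction of the results reported in the issue, though the actual numbers on HBase will differ.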


> No significant space saving with immutable encoded column with large number of dense columns
> --------------------------------------------------------------------------------------------
>
>                 Key: PHOENIX-3582
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-3582
>             Project: Phoenix
>          Issue Type: Sub-task
>            Reporter: Mujtaba Chohan
>            Assignee: Samarth Jain
>
> Tested with 2 schemas, both with 5K varchar columns. In test #1 columns were named
> column_1 ... column_5000, whereas in test #2 column names were 10-byte random alphanumeric
> strings. Each column is filled with 15 random bytes and all columns have values.
> For test #1, immutable encoded columns use ~4X *more* space than non-encoded columns.
> Fast Diff encoding really shines when column names are highly compressible
> (column_1 ... column_5000).
> For test #2, the worst case where column names are not compressible since they are random
> 10-byte alphanumeric strings, immutable encoded columns use 25% less space.
> Data generation class is attached to https://issues.apache.org/jira/browse/PHOENIX-3560.




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
