carbondata-issues mailing list archives

From "sandeep purohit (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CARBONDATA-658) Compression is not working for BigInt and Int datatype
Date Mon, 23 Jan 2017 12:33:26 GMT

    [ https://issues.apache.org/jira/browse/CARBONDATA-658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15834391#comment-15834391 ]

sandeep purohit commented on CARBONDATA-658:
--------------------------------------------

The compression of the data depends on the difference between the min and max values of the
column. In both of the CSVs above that difference is 99999, so DATA_INT is selected as the
datatype for compression in both cases. Try sample1.csv for SmallBigInt once; it should then
select DATA_SHORT as the datatype for compression. [~ravi.pesala] [~manishgupta88] Can you
please verify this?
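
To illustrate the point about the min/max difference, here is a minimal Python sketch of
range-based type selection. It is not the CarbonData implementation: the function name
pick_compression_type, the DATA_BYTE and DATA_LONG labels, the signed thresholds, and the
sample ranges are all assumptions made for illustration. Only DATA_INT, DATA_SHORT, and the
difference of 99999 come from the comment above.

```python
# Hypothetical sketch: pick the narrowest signed integer type that can hold
# (max - min) for a column. Labels and thresholds are illustrative assumptions,
# not taken from the CarbonData source.

SIGNED_LIMITS = [
    ("DATA_BYTE", 2**7 - 1),    # assumed label
    ("DATA_SHORT", 2**15 - 1),  # label named in the comment above
    ("DATA_INT", 2**31 - 1),    # label named in the comment above
]


def pick_compression_type(min_value: int, max_value: int) -> str:
    """Return the label of the narrowest type that fits the column's value range."""
    if max_value < min_value:
        raise ValueError("max_value must be >= min_value")
    delta = max_value - min_value
    for label, limit in SIGNED_LIMITS:
        if delta <= limit:
            return label
    return "DATA_LONG"  # assumed fallback when no narrower type fits


# Assumed sample ranges: both span 99999, so both map to the same type,
# regardless of how large the individual values are.
small_values = pick_compression_type(1, 100000)
large_values = pick_compression_type(2**63 - 100000, 2**63 - 1)
print(small_values, large_values)
```

Under these assumptions the magnitude of the values does not matter, only their spread, which
would be consistent with the near-identical file sizes reported for the small and large CSVs.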


> Compression is not working for BigInt and Int datatype
> ------------------------------------------------------
>
>                 Key: CARBONDATA-658
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-658
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-load
>    Affects Versions: 1.0.0-incubating
>         Environment: spark 1.6, 2.0
>            Reporter: Geetika Gupta
>         Attachments: 100000_LargeBigInt.csv, 100000_LargeInt.csv, 100000_SmallBigInt.csv,
100000_SmallInt.csv, sample1.csv
>
>
> I tried to load data into a table with a BigInt column. First I loaded small bigint
values into the table and noted the carbondata file size, then I loaded max bigint values
into the table and noted the carbondata file size again.
> For large bigint values the carbondata file size was 684.25 KB and for small bigint values
it was 684.26 KB, so I could not tell whether compression was performed.
> I tried the same scenario with the int datatype as well. For large int values the carbondata
file size was 684.24 KB and for small int values it was 684.26 KB.
> Below are the queries:
> For BigInt table:
> Create table test(a BigInt, b String) stored by 'carbondata';
> LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_LargeBigInt.csv' into table
test OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','FILEHEADER'='b,a');
> LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_SmallBigInt.csv' into table
test OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','FILEHEADER'='b,a');
> For Int table:
> Create table test(a Int, b String) stored by 'carbondata';
> LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_LargeInt.csv' into table test
OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','FILEHEADER'='b,a');
> LOAD DATA INPATH 'hdfs://localhost:54311/testFiles/100000_SmallInt.csv' into table test
OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','FILEHEADER'='b,a');



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
