carbondata-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Prabhat Kashyap (JIRA)" <j...@apache.org>
Subject [jira] [Assigned] (CARBONDATA-1387) Incorrect partition creation while inserting data from another table
Date Mon, 21 Aug 2017 12:08:00 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Prabhat Kashyap reassigned CARBONDATA-1387:
-------------------------------------------

    Assignee: Prabhat Kashyap

> Incorrect partition creation while inserting data from another table
> --------------------------------------------------------------------
>
>                 Key: CARBONDATA-1387
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1387
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-query
>    Affects Versions: 1.2.0
>         Environment: spark 2.1
>            Reporter: Vandana Yadav
>            Assignee: Prabhat Kashyap
>
> Incorrect partition creation while inserting data from another table.
> Description: While inserting data from another table no of rows in each partition remain
same although no of rows in partitioned table get increased.(no of rows in each partition
should also increase as we are inserting new data into the partitioned table.
> Steps to reproduce:
> 1) Create Partitioned table:
> CREATE TABLE uniqdata_part (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string,DOJ
timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) PARTITIONED
BY (DOB Timestamp) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ('PARTITION_TYPE'='RANGE','RANGE_INFO'='1971-01-01
01:00:03, 1972-01-01 01:00:03, 1974-01-01 01:00:03',"TABLE_BLOCKSIZE"= "256 MB")
> 2) Load data into the partitioned table:
> LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into table uniqdata_part
OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')
> 3) Create another table:
> CREATE TABLE uniqdata_1 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB
timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10),
DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1
int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")
> 4) Load data into this table:
> LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into table uniqdata_1
OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')
> 5)Execute Queries:
> a) show partitions query:
> show partitions uniqdata_part
> Output:
> 0, dob = DEFAULT                                     
>  1, dob < 1971-01-01 01:00:03                         
>  2, 1971-01-01 01:00:03 <= dob < 1972-01-01 01:00:03  
>  3, 1972-01-01 01:00:03 <= dob < 1974-01-01 01:00:03
> b) Query for row count in the partitioned table:
> select count(*) from uniqdata_part
> Output:
>  count(1)  |
> +-----------+--+
> | 2013 
> c)query for row count in partition 0: 
> select count(*) from uniqdata_part where dob >= '1974-01-01 01:00:03'
> Output:
>  count(1)  |
> +-----------+--+
> | 539 
> d) query for row count in partition 1 :
> select count(*) from uniqdata_part where dob < '1971-01-01 01:00:03'
> Output:
>  count(1)  |
> +-----------+--+
> | 366   
> e) query for row count in partition 3:
> select count(*) from uniqdata_part where dob >= '1971-01-01 01:00:03' and dob <
'1972-01-01 01:00:03'
> Output:
>  count(1)  |
> +-----------+--+
> | 365 
> f) query for row count in partition 4:
> select count(*) from uniqdata_part where dob >= '1972-01-01 01:00:03' and dob <
'1974-01-01 01:00:03'
> Output:
> count(1)  |
> +-----------+--+
> | 731
> g) Insert data in partitioned table through the normal table:
> insert into uniqdata_part select * from uniqdata_1;
> h) Query for row count in the partitioned table after insertion operation:
> select count(*) from uniqdata_part
> Output:
>  count(1)  |
> +-----------+--+
> | 4026  
> i) Repeat queries from (c) to (f) for row count in the each partition.
> 6) Actual Result: it shows the same row count for each partition but the partitioned
table has more rows in it.
> 7)Expected Result: No of rows in each partition should increase as no of rows increases
in partitioned table



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message