carbondata-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Prabhat Kashyap (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CARBONDATA-1387) Incorrect partition creation while inserting data from another table
Date Wed, 30 Aug 2017 07:15:00 GMT

    [ https://issues.apache.org/jira/browse/CARBONDATA-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16146793#comment-16146793
] 

Prabhat Kashyap commented on CARBONDATA-1387:
---------------------------------------------

Hello Vandana,

This is not the bug of partition but the problem is data is being inserted into one table
from another. If you insert data from another table then schema of both table should match
otherwise there are chances of getting bad records, that is what is happening here. The data
goes into the bad record (null) and the select query is showing the different result as expected.


*Possible solution:*
# Throw an exception while inserting data and schema mismatch.
# Continue with the bad record inserted into the table when schema mismatched.

> Incorrect partition creation while inserting data from another table
> --------------------------------------------------------------------
>
>                 Key: CARBONDATA-1387
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1387
>             Project: CarbonData
>          Issue Type: Bug
>          Components: data-query
>    Affects Versions: 1.2.0
>         Environment: spark 2.1
>            Reporter: Vandana Yadav
>            Assignee: Prabhat Kashyap
>
> Incorrect partition creation while inserting data from another table.
> Description: While inserting data from another table no of rows in each partition remain
same although no of rows in partitioned table get increased.(no of rows in each partition
should also increase as we are inserting new data into the partitioned table.
> Steps to reproduce:
> 1) Create Partitioned table:
> CREATE TABLE uniqdata_part (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string,DOJ
timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2
decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) PARTITIONED
BY (DOB Timestamp) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ('PARTITION_TYPE'='RANGE','RANGE_INFO'='1971-01-01
01:00:03, 1972-01-01 01:00:03, 1974-01-01 01:00:03',"TABLE_BLOCKSIZE"= "256 MB")
> 2) Load data into the partitioned table:
> LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into table uniqdata_part
OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')
> 3) Create another table:
> CREATE TABLE uniqdata_1 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB
timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10),
DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1
int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES ("TABLE_BLOCKSIZE"= "256 MB")
> 4) Load data into this table:
> LOAD DATA INPATH 'hdfs://localhost:54310/uniqdata/2000_UniqData.csv' into table uniqdata_1
OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1')
> 5)Execute Queries:
> a) show partitions query:
> show partitions uniqdata_part
> Output:
> 0, dob = DEFAULT                                     
>  1, dob < 1971-01-01 01:00:03                         
>  2, 1971-01-01 01:00:03 <= dob < 1972-01-01 01:00:03  
>  3, 1972-01-01 01:00:03 <= dob < 1974-01-01 01:00:03
> b) Query for row count in the partitioned table:
> select count(*) from uniqdata_part
> Output:
>  count(1)  |
> +-----------+--+
> | 2013 
> c)query for row count in partition 0: 
> select count(*) from uniqdata_part where dob >= '1974-01-01 01:00:03'
> Output:
>  count(1)  |
> +-----------+--+
> | 539 
> d) query for row count in partition 1 :
> select count(*) from uniqdata_part where dob < '1971-01-01 01:00:03'
> Output:
>  count(1)  |
> +-----------+--+
> | 366   
> e) query for row count in partition 3:
> select count(*) from uniqdata_part where dob >= '1971-01-01 01:00:03' and dob <
'1972-01-01 01:00:03'
> Output:
>  count(1)  |
> +-----------+--+
> | 365 
> f) query for row count in partition 4:
> select count(*) from uniqdata_part where dob >= '1972-01-01 01:00:03' and dob <
'1974-01-01 01:00:03'
> Output:
> count(1)  |
> +-----------+--+
> | 731
> g) Insert data in partitioned table through the normal table:
> insert into uniqdata_part select * from uniqdata_1;
> h) Query for row count in the partitioned table after insertion operation:
> select count(*) from uniqdata_part
> Output:
>  count(1)  |
> +-----------+--+
> | 4026  
> i) Repeat queries from (c) to (f) for row count in the each partition.
> 6) Actual Result: it shows the same row count for each partition but the partitioned
table has more rows in it.
> 7)Expected Result: No of rows in each partition should increase as no of rows increases
in partitioned table



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message