Date: Thu, 1 Feb 2018 07:37:00 +0000 (UTC)
From: "Sangeeta Gulia (JIRA)"
To: issues@carbondata.apache.org
Reply-To: dev@carbondata.apache.org
Subject: [jira] [Created] (CARBONDATA-2112) Data getting garbled after datamap creation when table is created with GLOBAL SORT

Sangeeta Gulia created CARBONDATA-2112:
------------------------------------------

             Summary: Data getting garbled after datamap creation when table is created with GLOBAL SORT
                 Key: CARBONDATA-2112
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2112
             Project: CarbonData
          Issue Type: Bug
          Components: data-query
         Environment: spark-2.1
            Reporter: Sangeeta Gulia
         Attachments: 2000_UniqData.csv

Data is getting garbled after datamap creation when the table is created with BATCH_SORT/GLOBAL_SORT.
Steps to reproduce:

spark.sql("drop table if exists uniqdata_batchsort_compact3")

spark.sql("CREATE TABLE uniqdata_batchsort_compact3 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'carbondata' TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT')").show()

spark.sql("LOAD DATA INPATH '/home/sangeeta/Desktop/2000_UniqData.csv' into table " +
  "uniqdata_batchsort_compact3 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='\"'," +
  "'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION," +
  "DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2," +
  "Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','batch_sort_size_inmb'='1')")

spark.sql("LOAD DATA INPATH '/home/sangeeta/Desktop/2000_UniqData.csv' into table " +
  "uniqdata_batchsort_compact3 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='\"'," +
  "'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION," +
  "DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2," +
  "Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','batch_sort_size_inmb'='1')")

spark.sql("LOAD DATA INPATH '/home/sangeeta/Desktop/2000_UniqData.csv' into table " +
  "uniqdata_batchsort_compact3 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='\"'," +
  "'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION," +
  "DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2," +
  "Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','batch_sort_size_inmb'='1')")

spark.sql("select cust_id, avg(cust_id) from uniqdata_batchsort_compact3 group by cust_id ").show(50)

+-------+------------+
|cust_id|avg(cust_id)|
+-------+------------+
|   9376|      9376.0|
|   9427|      9427.0|
|   9465|      9465.0|
|   9852|      9852.0|
|   9900|      9900.0|
|  10206|     10206.0|
|  10362|     10362.0|
|  10623|     10623.0|
|  10817|     10817.0|
|   9182|      9182.0|
|   9564|      9564.0|
|   9879|      9879.0|
|  10081|     10081.0|
|  10121|     10121.0|
|  10230|     10230.0|
|  10462|     10462.0|
|  10703|     10703.0|
|  10914|     10914.0|
|   9162|      9162.0|
|   9383|      9383.0|
|   9454|      9454.0|
|   9517|      9517.0|
|   9558|      9558.0|
|  10708|     10708.0|
|  10798|     10798.0|
|  10862|     10862.0|
|   9071|      9071.0|
|   9169|      9169.0|
|   9946|      9946.0|
|  10468|     10468.0|
|  10745|     10745.0|
|  10768|     10768.0|
|   9153|      9153.0|
|   9206|      9206.0|
|   9403|      9403.0|
|   9597|      9597.0|
|   9647|      9647.0|
|   9775|      9775.0|
|  10032|     10032.0|
|  10395|     10395.0|
|  10527|     10527.0|
|  10567|     10567.0|
|  10632|     10632.0|
|  10788|     10788.0|
|  10815|     10815.0|
|  10840|     10840.0|
|   9181|      9181.0|
|   9344|      9344.0|
|   9575|      9575.0|
|   9675|      9675.0|
+-------+------------+
only showing top 50 rows

Note: here cust_id is returned correctly.

spark.sql("create datamap uniqdata_agg on table uniqdata_batchsort_compact3 using " +
  "'preaggregate' as select avg(cust_id) from uniqdata_batchsort_compact3 group by cust_id")

spark.sql("select cust_id, avg(cust_id) from uniqdata_batchsort_compact3 group by cust_id ").show(50)

+-------+------------+
|cust_id|avg(cust_id)|
+-------+------------+
|  27651|      9217.0|
|  31944|     10648.0|
|  32667|     10889.0|
|  28242|      9414.0|
|  29841|      9947.0|
|  28728|      9576.0|
|  27255|      9085.0|
|  32571|     10857.0|
|  30276|     10092.0|
|  27276|      9092.0|
|  31503|     10501.0|
|  27687|      9229.0|
|  27183|      9061.0|
|  29334|      9778.0|
|  29913|      9971.0|
|  28683|      9561.0|
|  31545|     10515.0|
|  30405|     10135.0|
|  27693|      9231.0|
|  29649|      9883.0|
|  30537|     10179.0|
|  32709|     10903.0|
|  29586|      9862.0|
|  32895|     10965.0|
|  32415|     10805.0|
|  31644|     10548.0|
|  30030|     10010.0|
|  31713|     10571.0|
|  28083|      9361.0|
|  27813|      9271.0|
|  27171|      9057.0|
|  27189|      9063.0|
|  30444|     10148.0|
|  28623|      9541.0|
|  28566|      9522.0|
|  32655|     10885.0|
|  31164|     10388.0|
|  30321|     10107.0|
|  31452|     10484.0|
|  29829|      9943.0|
|  27468|      9156.0|
|  31212|     10404.0|
|  32154|     10718.0|
|  27531|      9177.0|
|  27654|      9218.0|
|  27105|      9035.0|
|  31113|     10371.0|
|  28479|      9493.0|
|  29094|      9698.0|
|  31551|     10517.0|
+-------+------------+
only showing top 50 rows

Note: after the datamap is created, cust_id is returned incorrectly. Each value comes back as three times its original value (matching the number of loads), while avg(cust_id) is still correct.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
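One possible reading of the tripled values (a hypothesis for discussion, not confirmed CarbonData behavior): if the pre-aggregate datamap stores sum(cust_id) and a row count per group, and the rewritten query projects the stored sum as cust_id instead of the group key, then after three identical loads each cust_id would appear as 3x its real value while avg = sum / count stays correct. A minimal Python model of that arithmetic, using a few ids from the output above:

```python
# Hypothetical model (not CarbonData code) of the observed symptom:
# with n identical loads, each cust_id group holds n rows, so a rewrite
# that returns the group's SUM as "cust_id" yields n * cust_id, while
# avg = sum / count is unaffected.
from collections import Counter

n_loads = 3
cust_ids = [9035, 10371, 9493]      # sample ids from the report

rows = cust_ids * n_loads           # every id appears once per load
counts = Counter(rows)

for cid in sorted(cust_ids):
    group_sum = cid * counts[cid]           # what the query shows as cust_id
    group_avg = group_sum / counts[cid]     # avg(cust_id) is still correct
    print(f"cust_id={cid} returned={group_sum} avg={group_avg}")
```

This reproduces the pairs seen in the second table, e.g. 9035 returned as 27105 with avg 9035.0, consistent with the reporter's observation that the factor equals the number of loads.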