Date: Thu, 1 Feb 2018 07:37:00 +0000 (UTC)
From: "Sangeeta Gulia (JIRA)"
To: issues@carbondata.apache.org
Reply-To: dev@carbondata.apache.org
Subject: [jira] [Created] (CARBONDATA-2112) Data getting garbled after datamap creation when table is created with GLOBAL SORT

Sangeeta Gulia created CARBONDATA-2112:
------------------------------------------

             Summary: Data getting garbled after datamap creation when table is created with GLOBAL SORT
                 Key: CARBONDATA-2112
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-2112
             Project: CarbonData
          Issue Type: Bug
          Components: data-query
         Environment: spark-2.1
            Reporter: Sangeeta Gulia
         Attachments: 2000_UniqData.csv

Data is getting garbled after datamap creation when the table is created with BATCH_SORT/GLOBAL_SORT.
Steps to reproduce:

spark.sql("drop table if exists uniqdata_batchsort_compact3")

spark.sql("CREATE TABLE uniqdata_batchsort_compact3 (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 int) STORED BY 'carbondata' TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT')").show()

spark.sql("LOAD DATA INPATH '/home/sangeeta/Desktop/2000_UniqData.csv' into table " +
  "uniqdata_batchsort_compact3 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='\"'," +
  "'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION," +
  "DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2," +
  "Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','batch_sort_size_inmb'='1')")

spark.sql("LOAD DATA INPATH '/home/sangeeta/Desktop/2000_UniqData.csv' into table " +
  "uniqdata_batchsort_compact3 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='\"'," +
  "'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION," +
  "DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2," +
  "Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','batch_sort_size_inmb'='1')")

spark.sql("LOAD DATA INPATH '/home/sangeeta/Desktop/2000_UniqData.csv' into table " +
  "uniqdata_batchsort_compact3 OPTIONS('DELIMITER'=',' , 'QUOTECHAR'='\"'," +
  "'BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION," +
  "DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2," +
  "Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1','batch_sort_size_inmb'='1')")

spark.sql("select cust_id, avg(cust_id) from uniqdata_batchsort_compact3 group by cust_id ").show(50)

+-------+------------+
|cust_id|avg(cust_id)|
+-------+------------+
|   9376|      9376.0|
|   9427|      9427.0|
|   9465|      9465.0|
|   9852|      9852.0|
|   9900|      9900.0|
|  10206|     10206.0|
|  10362|     10362.0|
|  10623|     10623.0|
|  10817|     10817.0|
|   9182|      9182.0|
|   9564|      9564.0|
|   9879|      9879.0|
|  10081|     10081.0|
|  10121|     10121.0|
|  10230|     10230.0|
|  10462|     10462.0|
|  10703|     10703.0|
|  10914|     10914.0|
|   9162|      9162.0|
|   9383|      9383.0|
|   9454|      9454.0|
|   9517|      9517.0|
|   9558|      9558.0|
|  10708|     10708.0|
|  10798|     10798.0|
|  10862|     10862.0|
|   9071|      9071.0|
|   9169|      9169.0|
|   9946|      9946.0|
|  10468|     10468.0|
|  10745|     10745.0|
|  10768|     10768.0|
|   9153|      9153.0|
|   9206|      9206.0|
|   9403|      9403.0|
|   9597|      9597.0|
|   9647|      9647.0|
|   9775|      9775.0|
|  10032|     10032.0|
|  10395|     10395.0|
|  10527|     10527.0|
|  10567|     10567.0|
|  10632|     10632.0|
|  10788|     10788.0|
|  10815|     10815.0|
|  10840|     10840.0|
|   9181|      9181.0|
|   9344|      9344.0|
|   9575|      9575.0|
|   9675|      9675.0|
+-------+------------+
only showing top 50 rows

Note: here cust_id is returned correctly.

spark.sql("create datamap uniqdata_agg on table uniqdata_batchsort_compact3 using " +
  "'preaggregate' as select avg(cust_id) from uniqdata_batchsort_compact3 group by cust_id")

spark.sql("select cust_id, avg(cust_id) from uniqdata_batchsort_compact3 group by cust_id ").show(50)

+-------+------------+
|cust_id|avg(cust_id)|
+-------+------------+
|  27651|      9217.0|
|  31944|     10648.0|
|  32667|     10889.0|
|  28242|      9414.0|
|  29841|      9947.0|
|  28728|      9576.0|
|  27255|      9085.0|
|  32571|     10857.0|
|  30276|     10092.0|
|  27276|      9092.0|
|  31503|     10501.0|
|  27687|      9229.0|
|  27183|      9061.0|
|  29334|      9778.0|
|  29913|      9971.0|
|  28683|      9561.0|
|  31545|     10515.0|
|  30405|     10135.0|
|  27693|      9231.0|
|  29649|      9883.0|
|  30537|     10179.0|
|  32709|     10903.0|
|  29586|      9862.0|
|  32895|     10965.0|
|  32415|     10805.0|
|  31644|     10548.0|
|  30030|     10010.0|
|  31713|     10571.0|
|  28083|      9361.0|
|  27813|      9271.0|
|  27171|      9057.0|
|  27189|      9063.0|
|  30444|     10148.0|
|  28623|      9541.0|
|  28566|      9522.0|
|  32655|     10885.0|
|  31164|     10388.0|
|  30321|     10107.0|
|  31452|     10484.0|
|  29829|      9943.0|
|  27468|      9156.0|
|  31212|     10404.0|
|  32154|     10718.0|
|  27531|      9177.0|
|  27654|      9218.0|
|  27105|      9035.0|
|  31113|     10371.0|
|  28479|      9493.0|
|  29094|      9698.0|
|  31551|     10517.0|
+-------+------------+
only showing top 50 rows

Note: after the datamap is created, cust_id is returned incorrectly. Each value comes back as three times its original value (matching the number of loads), while avg(cust_id) is still correct.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
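One possible reading of the tripled values (a hypothesis for discussion, not confirmed CarbonData behavior): if the pre-aggregate datamap stores sum(cust_id) and a row count per group, and the rewritten query projects the stored sum as cust_id instead of the group key, then after three identical loads each cust_id would appear as 3x its real value while avg = sum / count stays correct. A minimal Python model of that arithmetic, using a few ids from the output above:

```python
# Hypothetical model (not CarbonData code) of the observed symptom:
# with n identical loads, each cust_id group holds n rows, so a rewrite
# that returns the group's SUM as "cust_id" yields n * cust_id, while
# avg = sum / count is unaffected.
from collections import Counter

n_loads = 3
cust_ids = [9035, 10371, 9493]      # sample ids from the report

rows = cust_ids * n_loads           # every id appears once per load
counts = Counter(rows)

for cid in sorted(cust_ids):
    group_sum = cid * counts[cid]           # what the query shows as cust_id
    group_avg = group_sum / counts[cid]     # avg(cust_id) is still correct
    print(f"cust_id={cid} returned={group_sum} avg={group_avg}")
```

This reproduces the pairs seen in the second table, e.g. 9035 returned as 27105 with avg 9035.0, consistent with the reporter's observation that the factor equals the number of loads.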