carbondata-issues mailing list archives

From "Jacky Li (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (CARBONDATA-786) Data mismatch if the data is loaded across blocklet groups
Date Thu, 16 Mar 2017 15:15:41 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li resolved CARBONDATA-786.
---------------------------------
       Resolution: Fixed
         Assignee: Ravindra Pesala
    Fix Version/s: 1.1.0-incubating

> Data mismatch if the data is loaded across blocklet groups
> ----------------------------------------------------------
>
>                 Key: CARBONDATA-786
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-786
>             Project: CarbonData
>          Issue Type: Bug
>            Reporter: Ravindra Pesala
>            Assignee: Ravindra Pesala
>             Fix For: 1.1.0-incubating
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> A data mismatch occurs if the data is loaded across blocklet groups and a filter is applied on the second column onwards.
> The following test case reproduces the issue:
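> {code}
> // Imports the snippet needs; sqlContext and sql(...) are assumed to come from the enclosing test suite.
> import org.apache.spark.sql.{Row, SaveMode}
> import org.apache.spark.sql.types._
> import org.apache.carbondata.core.util.CarbonProperties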
> CarbonProperties.getInstance()
>       .addProperty("carbon.blockletgroup.size.in.mb", "16")
>       .addProperty("carbon.enable.vector.reader", "true")
>       .addProperty("enable.unsafe.sort", "true")
>     val rdd = sqlContext.sparkContext
>       .parallelize(1 to 1200000, 4)
>       .map { x =>
>         ("city" + x % 8, "country" + x % 1103, "planet" + x % 10007, x.toString,
>           (x % 16).toShort, x / 2, (x << 1).toLong, x.toDouble / 13, x.toDouble / 11)
>       }.map { x =>
>       Row(x._1, x._2, x._3, x._4, x._5, x._6, x._7, x._8, x._9)
>     }
>     val schema = StructType(
>       Seq(
>         StructField("city", StringType, nullable = false),
>         StructField("country", StringType, nullable = false),
>         StructField("planet", StringType, nullable = false),
>         StructField("id", StringType, nullable = false),
>         StructField("m1", ShortType, nullable = false),
>         StructField("m2", IntegerType, nullable = false),
>         StructField("m3", LongType, nullable = false),
>         StructField("m4", DoubleType, nullable = false),
>         StructField("m5", DoubleType, nullable = false)
>       )
>     )
>     val input = sqlContext.createDataFrame(rdd, schema)
>     sql(s"drop table if exists testBigData")
>     input.write
>       .format("carbondata")
>       .option("tableName", "testBigData")
>       .option("tempCSV", "false")
>       .option("single_pass", "true")
>       .option("dictionary_exclude", "id") // id is high cardinality column
>       .mode(SaveMode.Overwrite)
>       .save()
> sql(s"select city, sum(m1) from testBigData " +
>           s"where country='country12' group by city order by city").show()
> {code}
> The above code is supposed to return the following data, but it does not:
> {code}
> +-----+-------+
> | city|sum(m1)|
> +-----+-------+
> |city0|    544|
> |city1|    680|
> |city2|    816|
> |city3|    952|
> |city4|   1088|
> |city5|   1224|
> |city6|   1360|
> |city7|   1496|
> +-----+-------+
> {code}
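
The expected figures above can be cross-checked without Spark or CarbonData. The sketch below is not part of the original report: it is plain Scala that replays only the three generated fields the query touches (city, country, m1) and applies the same filter and aggregation in memory. The names `ExpectedSums` and `expected` are hypothetical, chosen for illustration.

```scala
// Reference computation in plain Scala; no Spark or CarbonData involved.
object ExpectedSums {
  // Replays the generator from the test case for rows 1..n, keeps the rows
  // where country == "country12", and sums m1 per city, ordered by city.
  def expected(n: Int = 1200000): Seq[(String, Long)] =
    (1 to n)
      .filter(x => "country" + x % 1103 == "country12")
      .map(x => ("city" + x % 8, (x % 16).toLong)) // (city, m1)
      .groupBy(_._1)
      .map { case (city, rows) => (city, rows.map(_._2).sum) }
      .toSeq
      .sortBy(_._1)

  def main(args: Array[String]): Unit =
    expected().foreach { case (city, sum) => println(s"$city\t$sum") }
}
```

The numbers also follow by hand: 1088 rows match the filter, and because 1103 % 16 = 15, each residue of x modulo 16 occurs exactly 68 times among them. City k collects the residues k and k + 8, so its sum is 68 * (2k + 8) = 544 + 136k, which gives 544 for city0 up to 1496 for city7.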



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
