carbondata-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From ravipes...@apache.org
Subject [24/25] carbondata git commit: [CARBONDATA-2098] Optimize pre-aggregate documentation
Date Sat, 03 Mar 2018 12:44:11 GMT
[CARBONDATA-2098] Optimize pre-aggregate documentation

optimize pre-aggregate documentation
move to separate file
add more examples

This closes #2022


Project: http://git-wip-us.apache.org/repos/asf/carbondata/repo
Commit: http://git-wip-us.apache.org/repos/asf/carbondata/commit/ec893412
Tree: http://git-wip-us.apache.org/repos/asf/carbondata/tree/ec893412
Diff: http://git-wip-us.apache.org/repos/asf/carbondata/diff/ec893412

Branch: refs/heads/branch-1.3
Commit: ec893412bc40b72a642a0173f5f5b3fdcbba0877
Parents: a816e0c
Author: sraghunandan <carbondatacontributions@gmail.com>
Authored: Fri Mar 2 17:02:39 2018 +0530
Committer: ravipesala <ravi.pesala@gmail.com>
Committed: Sat Mar 3 18:11:51 2018 +0530

----------------------------------------------------------------------
 docs/data-management-on-carbondata.md           | 242 --------------
 docs/preaggregate-guide.md                      | 313 +++++++++++++++++++
 .../examples/PreAggregateTableExample.scala     |  50 ++-
 3 files changed, 362 insertions(+), 243 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/carbondata/blob/ec893412/docs/data-management-on-carbondata.md
----------------------------------------------------------------------
diff --git a/docs/data-management-on-carbondata.md b/docs/data-management-on-carbondata.md
index ea80d41..2aa4a49 100644
--- a/docs/data-management-on-carbondata.md
+++ b/docs/data-management-on-carbondata.md
@@ -26,7 +26,6 @@ This tutorial is going to introduce all commands and data operations on
CarbonDa
 * [UPDATE AND DELETE](#update-and-delete)
 * [COMPACTION](#compaction)
 * [PARTITION](#partition)
-* [PRE-AGGREGATE TABLES](#pre-aggregate-tables)
 * [BUCKETING](#bucketing)
 * [SEGMENT MANAGEMENT](#segment-management)
 
@@ -859,247 +858,6 @@ This tutorial is going to introduce all commands and data operations
on CarbonDa
   * The partitioned column can be excluded from SORT_COLUMNS, this will let other columns
to do the efficient sorting.
   * When writing SQL on a partition table, try to use filters on the partition column.
 
-
-## PRE-AGGREGATE TABLES
-  CarbonData supports pre aggregating of data so that OLAP kind of queries can fetch data

-  much faster. Aggregate tables are created as datamaps so that the handling is as efficient
as 
-  other indexing support. Users can create as many aggregate tables they require as datamaps
to 
-  improve their query performance, provided the storage requirements and loading speeds are

-  acceptable.
-  
-  For main table called **sales** which is defined as 
-  
-  ```
-  CREATE TABLE sales (
-            order_time timestamp,
-            user_id STRING,
-            sex STRING,
-            country STRING,
-            quantity INT,
-            price BIGINT)
-  STORED BY 'carbondata'
-  ```
-  
-  user can create pre-aggregate tables using the DDL
-  
-  ```
-  CREATE DATAMAP agg_sales
-  ON TABLE sales
-  USING "preaggregate"
-  AS
-  SELECT country, sex, sum(quantity), avg(price)
-  FROM sales
-  GROUP BY country, sex
-  ```
-  
-<b><p align="left">Functions supported in pre-aggregate tables</p></b>
-
-| Function | Rollup supported |
-|-----------|----------------|
-| SUM | Yes |
-| AVG | Yes |
-| MAX | Yes |
-| MIN | Yes |
-| COUNT | Yes |
-
-
-##### How pre-aggregate tables are selected
-For the main table **sales** and pre-aggregate table **agg_sales** created above, queries
of the 
-kind
-```
-SELECT country, sex, sum(quantity), avg(price) from sales GROUP BY country, sex
-
-SELECT sex, sum(quantity) from sales GROUP BY sex
-
-SELECT sum(price), country from sales GROUP BY country
-``` 
-
-will be transformed by Query Planner to fetch data from pre-aggregate table **agg_sales**
-
-But queries of kind
-```
-SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY user_id, country,
sex
-
-SELECT sex, avg(quantity) from sales GROUP BY sex
-
-SELECT country, max(price) from sales GROUP BY country
-```
-
-will fetch the data from the main table **sales**
-
-##### Loading data to pre-aggregate tables
-For existing table with loaded data, data load to pre-aggregate table will be triggered by
the 
-CREATE DATAMAP statement when user creates the pre-aggregate table.
-For incremental loads after aggregates tables are created, loading data to main table triggers

-the load to pre-aggregate tables once main table loading is complete. These loads are automic

-meaning that data on main table and aggregate tables are only visible to the user after all
tables 
-are loaded
-
-##### Querying data from pre-aggregate tables
-Pre-aggregate tables cannot be queries directly. Queries are to be made on main table. Internally

-carbondata will check associated pre-aggregate tables with the main table, and if the 
-pre-aggregate tables satisfy the query condition, the plan is transformed automatically to
use 
-pre-aggregate table to fetch the data.
-
-##### Compacting pre-aggregate tables
-Compaction command (ALTER TABLE COMPACT) need to be run separately on each pre-aggregate
table.
-Running Compaction command on main table will **not automatically** compact the pre-aggregate

-tables. Compaction is an optional operation for pre-aggregate table. If compaction is performed
on
-main table but not performed on pre-aggregate table, all queries still can benefit from 
-pre-aggregate tables. To further improve performance on pre-aggregate tables, compaction
can be 
-triggered on pre-aggregate tables directly, it will merge the segments inside pre-aggregate
table. 
-
-##### Update/Delete Operations on pre-aggregate tables
-This functionality is not supported.
-
-  NOTE (<b>RESTRICTION</b>):
-  Update/Delete operations are <b>not supported</b> on main table which has pre-aggregate
tables 
-  created on it. All the pre-aggregate tables <b>will have to be dropped</b>
before update/delete 
-  operations can be performed on the main table. Pre-aggregate tables can be rebuilt manually

-  after update/delete operations are completed
- 
-##### Delete Segment Operations on pre-aggregate tables
-This functionality is not supported.
-
-  NOTE (<b>RESTRICTION</b>):
-  Delete Segment operations are <b>not supported</b> on main table which has
pre-aggregate tables 
-  created on it. All the pre-aggregate tables <b>will have to be dropped</b>
before delete segment 
-  operations can be performed on the main table. Pre-aggregate tables can be rebuilt manually

-  after delete segment operations are completed
-  
-##### Alter Table Operations on pre-aggregate tables
-This functionality is not supported.
-
-  NOTE (<b>RESTRICTION</b>):
-  Adding new column in new table does not have any affect on pre-aggregate tables. However
if 
-  dropping or renaming a column has impact in pre-aggregate table, such operations will be

-  rejected and error will be thrown. All the pre-aggregate tables <b>will have to be
dropped</b> 
-  before alter operations can be performed on the main table. Pre-aggregate tables can be
rebuilt 
-  manually after alter table operations are completed
-  
-### Supporting timeseries data (Alpha feature in 1.3.0)
-CarbonData has built-in understanding of time hierarchy and levels: year, month, day, hour,
minute, second.
-Timeseries pre-aggregate tables can be created for the hierarchy and CarbonData can do automatic

-roll-up for the queries on these hierarchies.
-
-  ```
-  CREATE DATAMAP agg_year
-  ON TABLE sales
-  USING "timeseries"
-  DMPROPERTIES (
-  'event_time'='order_time',
-  'year_granualrity'='1',
-  ) AS
-  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-   avg(price) FROM sales GROUP BY order_time, country, sex
-    
-  CREATE DATAMAP agg_month
-  ON TABLE sales
-  USING "timeseries"
-  DMPROPERTIES (
-  'event_time'='order_time',
-  'month_granualrity'='1',
-  ) AS
-  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-   avg(price) FROM sales GROUP BY order_time, country, sex
-    
-  CREATE DATAMAP agg_day
-  ON TABLE sales
-  USING "timeseries"
-  DMPROPERTIES (
-  'event_time'='order_time',
-  'day_granualrity'='1',
-  ) AS
-  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-   avg(price) FROM sales GROUP BY order_time, country, sex
-        
-  CREATE DATAMAP agg_sales_hour
-  ON TABLE sales
-  USING "timeseries"
-  DMPROPERTIES (
-  'event_time'='order_time',
-  'hour_granualrity'='1',
-  ) AS
-  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-   avg(price) FROM sales GROUP BY order_time, country, sex
-  
-  CREATE DATAMAP agg_minute
-  ON TABLE sales
-  USING "timeseries"
-  DMPROPERTIES (
-  'event_time'='order_time',
-  'minute_granualrity'='1',
-  ) AS
-  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-   avg(price) FROM sales GROUP BY order_time, country, sex
-    
-  CREATE DATAMAP agg_minute
-  ON TABLE sales
-  USING "timeseries"
-  DMPROPERTIES (
-  'event_time'='order_time',
-  'minute_granualrity'='1',
-  ) AS
-  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-   avg(price) FROM sales GROUP BY order_time, country, sex
-  ```
-  For Querying data and automatically roll-up to the desired aggregation level, CarbonData
supports 
-  UDF as
-  ```
-  timeseries(timeseries column name, 'aggregation level')
-  ```
-  Examples
-  ```
-  SELECT 
-        timeseries(order_time, 'hour'), 
-        sum(quantity) 
-  FROM sales 
-  GROUP BY timeseries(order_time, 'hour')
-  ```
-  
-  It is **not necessary** to create pre-aggregate tables for each granularity unless required
for 
-  query. CarbonData can roll-up the data and fetch it.
-   
-  For Example: For main table **sales**, If timeseries pre-aggregate tables were created
as  
-  
-  ```
-  CREATE DATAMAP agg_day
-    ON TABLE sales
-    USING "timeseries"
-    DMPROPERTIES (
-    'event_time'='order_time',
-    'day_granualrity'='1',
-    ) AS
-    SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-     avg(price) FROM sales GROUP BY order_time, country, sex
-          
-    CREATE DATAMAP agg_sales_hour
-    ON TABLE sales
-    USING "timeseries"
-    DMPROPERTIES (
-    'event_time'='order_time',
-    'hour_granualrity'='1',
-    ) AS
-    SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
-     avg(price) FROM sales GROUP BY order_time, country, sex
-  ```
-  
-  Queries like below will be rolled-up and fetched from pre-aggregate tables
-  ```
-  Select timeseries(order_time, 'month'), sum(quantity) from sales group by timeseries(order_time,
-    'month')
-    
-  Select timeseries(order_time, 'year'), sum(quantity) from sales group by timeseries(order_time,
-    'year')
-  ```
-  
-  NOTE (<b>RESTRICTION</b>):
-  * Only 1 is supported for granularity value of timeseries pre-aggregate table. Other granularity
value are not supported.
-  * Only one granularity can be defined on creating one timeseries pre-aggregate table. Other
granularity are created separately.
-  * Pre-aggregate tables for the desired levels needs to be created one after the other
-  * Pre-aggregate tables are created for each level needs to be dropped separately 
-    
-
 ## BUCKETING
 
   Bucketing feature can be used to distribute/organize the table/partition data into multiple
files such

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ec893412/docs/preaggregate-guide.md
----------------------------------------------------------------------
diff --git a/docs/preaggregate-guide.md b/docs/preaggregate-guide.md
new file mode 100644
index 0000000..411433a
--- /dev/null
+++ b/docs/preaggregate-guide.md
@@ -0,0 +1,313 @@
+# CarbonData Pre-aggregate tables
+  
+## Quick example
+Download and unzip spark-2.2.0-bin-hadoop2.7.tgz, and export $SPARK_HOME
+
+Package carbon jar, and copy assembly/target/scala-2.11/carbondata_2.11-x.x.x-SNAPSHOT-shade-hadoop2.7.2.jar
to $SPARK_HOME/jars
+```shell
+mvn clean package -DskipTests -Pspark-2.2
+```
+
+Start spark-shell in new terminal, type :paste, then copy and run the following code.
+```scala
+ import java.io.File
+ import org.apache.spark.sql.{CarbonEnv, SparkSession}
+ import org.apache.spark.sql.CarbonSession._
+ import org.apache.spark.sql.streaming.{ProcessingTime, StreamingQuery}
+ import org.apache.carbondata.core.util.path.CarbonStorePath
+ 
+ val warehouse = new File("./warehouse").getCanonicalPath
+ val metastore = new File("./metastore").getCanonicalPath
+ 
+ val spark = SparkSession
+   .builder()
+   .master("local")
+   .appName("preAggregateExample")
+   .config("spark.sql.warehouse.dir", warehouse)
+   .getOrCreateCarbonSession(warehouse, metastore)
+
+ spark.sparkContext.setLogLevel("ERROR")
+
+ // drop table if exists previously
+ spark.sql(s"DROP TABLE IF EXISTS sales")
+ // Create target carbon table and populate with initial data
+ spark.sql(
+   s"""
+      | CREATE TABLE sales (
+      | user_id string,
+      | country string,
+      | quantity int,
+      | price bigint)
+      | STORED BY 'carbondata'""".stripMargin)
+      
+ spark.sql(
+   s"""
+      | CREATE DATAMAP agg_sales
+      | ON TABLE sales
+      | USING "preaggregate"
+      | AS
+      | SELECT country, sum(quantity), avg(price)
+      | FROM sales
+      | GROUP BY country""".stripMargin)
+      
+ import spark.implicits._
+ import org.apache.spark.sql.SaveMode
+ import scala.util.Random
+ 
+ val r = new Random()
+ val df = spark.sparkContext.parallelize(1 to 10)
+   .map(x => ("ID." + r.nextInt(100000), "country" + x % 8, x % 50, x % 60))
+   .toDF("user_id", "country", "quantity", "price")
+
+ // Create table with pre-aggregate table
+ df.write.format("carbondata")
+   .option("tableName", "sales")
+   .option("compress", "true")
+   .mode(SaveMode.Append).save()
+      
+ spark.sql(
+      s"""
+    |SELECT country, sum(quantity), avg(price)
+    | from sales GROUP BY country""".stripMargin).show
+
+ spark.stop
+```
+
+##PRE-AGGREGATE TABLES  
+  Carbondata supports pre aggregating of data so that OLAP kind of queries can fetch data

+  much faster.Aggregate tables are created as datamaps so that the handling is as efficient
as 
+  other indexing support.Users can create as many aggregate tables they require as datamaps
to 
+  improve their query performance,provided the storage requirements and loading speeds are

+  acceptable.
+  
+  For main table called **sales** which is defined as 
+  
+  ```
+  CREATE TABLE sales (
+  order_time timestamp,
+  user_id string,
+  sex string,
+  country string,
+  quantity int,
+  price bigint)
+  STORED BY 'carbondata'
+  ```
+  
+  user can create pre-aggregate tables using the DDL
+  
+  ```
+  CREATE DATAMAP agg_sales
+  ON TABLE sales
+  USING "preaggregate"
+  AS
+  SELECT country, sex, sum(quantity), avg(price)
+  FROM sales
+  GROUP BY country, sex
+  ```
+  
+
+  
+<b><p align="left">Functions supported in pre-aggregate tables</p></b>
+
+| Function | Rollup supported |
+|-----------|----------------|
+| SUM | Yes |
+| AVG | Yes |
+| MAX | Yes |
+| MIN | Yes |
+| COUNT | Yes |
+
+
+##### How pre-aggregate tables are selected
+For the main table **sales** and pre-aggregate table **agg_sales** created above, queries
of the 
+kind
+```
+SELECT country, sex, sum(quantity), avg(price) from sales GROUP BY country, sex
+
+SELECT sex, sum(quantity) from sales GROUP BY sex
+
+SELECT sum(price), country from sales GROUP BY country
+``` 
+
+will be transformed by Query Planner to fetch data from pre-aggregate table **agg_sales**
+
+But queries of kind
+```
+SELECT user_id, country, sex, sum(quantity), avg(price) from sales GROUP BY user_id, country,
sex
+
+SELECT sex, avg(quantity) from sales GROUP BY sex
+
+SELECT country, max(price) from sales GROUP BY country
+```
+
+will fetch the data from the main table **sales**
+
+##### Loading data to pre-aggregate tables
+For existing table with loaded data, data load to pre-aggregate table will be triggered by
the 
+CREATE DATAMAP statement when user creates the pre-aggregate table.
+For incremental loads after aggregates tables are created, loading data to main table triggers

+the load to pre-aggregate tables once main table loading is complete. These loads are automic

+meaning that data on main table and aggregate tables are only visible to the user after all
tables 
+are loaded
+
+##### Querying data from pre-aggregate tables
+Pre-aggregate tables cannot be queries directly. Queries are to be made on main table. Internally

+carbondata will check associated pre-aggregate tables with the main table, and if the 
+pre-aggregate tables satisfy the query condition, the plan is transformed automatically to
use 
+pre-aggregate table to fetch the data.
+
+##### Compacting pre-aggregate tables
+Compaction command (ALTER TABLE COMPACT) need to be run separately on each pre-aggregate
table.
+Running Compaction command on main table will **not automatically** compact the pre-aggregate

+tables.Compaction is an optional operation for pre-aggregate table. If compaction is performed
on
+main table but not performed on pre-aggregate table, all queries still can benefit from 
+pre-aggregate tables. To further improve performance on pre-aggregate tables, compaction
can be 
+triggered on pre-aggregate tables directly, it will merge the segments inside pre-aggregate
table. 
+
+##### Update/Delete Operations on pre-aggregate tables
+This functionality is not supported.
+
+  NOTE (<b>RESTRICTION</b>):
+  Update/Delete operations are <b>not supported</b> on main table which has pre-aggregate
tables 
+  created on it. All the pre-aggregate tables <b>will have to be dropped</b>
before update/delete 
+  operations can be performed on the main table. Pre-aggregate tables can be rebuilt manually

+  after update/delete operations are completed
+ 
+##### Delete Segment Operations on pre-aggregate tables
+This functionality is not supported.
+
+  NOTE (<b>RESTRICTION</b>):
+  Delete Segment operations are <b>not supported</b> on main table which has
pre-aggregate tables 
+  created on it. All the pre-aggregate tables <b>will have to be dropped</b>
before update/delete 
+  operations can be performed on the main table. Pre-aggregate tables can be rebuilt manually

+  after delete segment operations are completed
+  
+##### Alter Table Operations on pre-aggregate tables
+This functionality is not supported.
+
+  NOTE (<b>RESTRICTION</b>):
+  Adding new column in new table does not have any affect on pre-aggregate tables. However
if 
+  dropping or renaming a column has impact in pre-aggregate table, such operations will be

+  rejected and error will be thrown. All the pre-aggregate tables <b>will have to be
dropped</b> 
+  before Alter Operations can be performed on the main table. Pre-aggregate tables can be
rebuilt 
+  manually after Alter Table operations are completed
+  
+### Supporting timeseries data (Alpha feature in 1.3.0)
+Carbondata has built-in understanding of time hierarchy and levels: year, month, day, hour,
minute.
+Multiple pre-aggregate tables can be created for the hierarchy and Carbondata can do automatic

+roll-up for the queries on these hierarchies.
+
+  ```
+  CREATE DATAMAP agg_year
+  ON TABLE sales
+  USING "timeseries"
+  DMPROPERTIES (
+  'event_time'='order_time',
+  'year_granualrity'='1',
+  ) AS
+  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
+   avg(price) FROM sales GROUP BY order_time, country, sex
+    
+  CREATE DATAMAP agg_month
+  ON TABLE sales
+  USING "timeseries"
+  DMPROPERTIES (
+  'event_time'='order_time',
+  'month_granualrity'='1',
+  ) AS
+  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
+   avg(price) FROM sales GROUP BY order_time, country, sex
+    
+  CREATE DATAMAP agg_day
+  ON TABLE sales
+  USING "timeseries"
+  DMPROPERTIES (
+  'event_time'='order_time',
+  'day_granualrity'='1',
+  ) AS
+  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
+   avg(price) FROM sales GROUP BY order_time, country, sex
+        
+  CREATE DATAMAP agg_sales_hour
+  ON TABLE sales
+  USING "timeseries"
+  DMPROPERTIES (
+  'event_time'='order_time',
+  'hour_granualrity'='1',
+  ) AS
+  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
+   avg(price) FROM sales GROUP BY order_time, country, sex
+  
+  CREATE DATAMAP agg_minute
+  ON TABLE sales
+  USING "timeseries"
+  DMPROPERTIES (
+  'event_time'='order_time',
+  'minute_granualrity'='1',
+  ) AS
+  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
+   avg(price) FROM sales GROUP BY order_time, country, sex
+    
+  CREATE DATAMAP agg_minute
+  ON TABLE sales
+  USING "timeseries"
+  DMPROPERTIES (
+  'event_time'='order_time',
+  'minute_granualrity'='1',
+  ) AS
+  SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
+   avg(price) FROM sales GROUP BY order_time, country, sex
+  ```
+  
+  For Querying data and automatically roll-up to the desired aggregation level,Carbondata
supports 
+  UDF as
+  ```
+  timeseries(timeseries column name, 'aggregation level')
+  ```
+  ```
+  Select timeseries(order_time, 'hour'), sum(quantity) from sales group by timeseries(order_time,
+  'hour')
+  ```
+  
+  It is **not necessary** to create pre-aggregate tables for each granularity unless required
for 
+  query. Carbondata can roll-up the data and fetch it.
+   
+  For Example: For main table **sales** , If pre-aggregate tables were created as  
+  
+  ```
+  CREATE DATAMAP agg_day
+    ON TABLE sales
+    USING "timeseries"
+    DMPROPERTIES (
+    'event_time'='order_time',
+    'day_granualrity'='1',
+    ) AS
+    SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
+     avg(price) FROM sales GROUP BY order_time, country, sex
+          
+    CREATE DATAMAP agg_sales_hour
+    ON TABLE sales
+    USING "timeseries"
+    DMPROPERTIES (
+    'event_time'='order_time',
+    'hour_granualrity'='1',
+    ) AS
+    SELECT order_time, country, sex, sum(quantity), max(quantity), count(user_id), sum(price),
+     avg(price) FROM sales GROUP BY order_time, country, sex
+  ```
+  
+  Queries like below will be rolled-up and fetched from pre-aggregate tables
+  ```
+  Select timeseries(order_time, 'month'), sum(quantity) from sales group by timeseries(order_time,
+    'month')
+    
+  Select timeseries(order_time, 'year'), sum(quantity) from sales group by timeseries(order_time,
+    'year')
+  ```
+  
+  NOTE (<b>RESTRICTION</b>):
+  * Only value of 1 is supported for hierarchy levels. Other hierarchy levels are not supported.

+  Other hierarchy levels are not supported
+  * pre-aggregate tables for the desired levels needs to be created one after the other
+  * pre-aggregate tables created for each level needs to be dropped separately 
+    

http://git-wip-us.apache.org/repos/asf/carbondata/blob/ec893412/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
----------------------------------------------------------------------
diff --git a/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
b/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
index d27eefb..ace3dcc 100644
--- a/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
+++ b/examples/spark2/src/main/scala/org/apache/carbondata/examples/PreAggregateTableExample.scala
@@ -38,6 +38,7 @@ object PreAggregateTableExample {
 
     // 1. simple usage for Pre-aggregate tables creation and query
     spark.sql("DROP TABLE IF EXISTS mainTable")
+    spark.sql("DROP TABLE IF EXISTS mainTable_other")
     spark.sql("""
                 | CREATE TABLE mainTable
                 | (id Int,
@@ -47,10 +48,23 @@ object PreAggregateTableExample {
                 | STORED BY 'org.apache.carbondata.format'
               """.stripMargin)
 
+    spark.sql("""
+                | CREATE TABLE mainTable_other
+                | (id Int,
+                | name String,
+                | city String,
+                | age Int)
+                | STORED BY 'org.apache.carbondata.format'
+              """.stripMargin)
+
     spark.sql(s"""
        LOAD DATA LOCAL INPATH '$testData' into table mainTable
        """)
 
+    spark.sql(s"""
+       LOAD DATA LOCAL INPATH '$testData' into table mainTable_other
+       """)
+
     spark.sql(
       s"""create datamap preagg_sum on table mainTable using 'preaggregate' as
          | select id,sum(age) from mainTable group by id"""
@@ -59,14 +73,17 @@ object PreAggregateTableExample {
       s"""create datamap preagg_avg on table mainTable using 'preaggregate' as
          | select id,avg(age) from mainTable group by id"""
         .stripMargin)
+
     spark.sql(
-      s"""create datamap preagg_count on table mainTable using 'preaggregate' as
+      s"""create datamap preagg_count_age on table mainTable using 'preaggregate' as
          | select id,count(age) from mainTable group by id"""
         .stripMargin)
+
     spark.sql(
       s"""create datamap preagg_min on table mainTable using 'preaggregate' as
          | select id,min(age) from mainTable group by id"""
         .stripMargin)
+
     spark.sql(
       s"""create datamap preagg_max on table mainTable using 'preaggregate' as
          | select id,max(age) from mainTable group by id"""
@@ -74,10 +91,40 @@ object PreAggregateTableExample {
 
     spark.sql(
       s"""
+         | create datamap preagg_case on table mainTable using 'preaggregate' as
+         | select name,sum(case when age=35 then id else 0 end) from mainTable group by name
+         | """.stripMargin)
+
+    spark.sql(
+      s"""create datamap preagg_count on table maintable using 'preaggregate' as
+         | select name, count(*) from maintable group by name""".stripMargin)
+
+    spark.sql(
+      s"""
          | SELECT id,max(age)
          | FROM mainTable group by id
       """.stripMargin).show()
 
+    spark.sql(
+      s"""
+         | select name, count(*) from
+         | mainTable group by name
+      """.stripMargin).show()
+
+    spark.sql(
+      s"""
+         | select name as NewName,
+         | sum(case when age=35 then id else 0 end) as sum
+         | from mainTable group by name order by name
+      """.stripMargin).show()
+
+    spark.sql(
+      s"""
+         | select t1.name,t1.city from mainTable_other t1 join
+         | (select name as newnewname,sum(age) as sum
+         | from mainTable group by name )t2 on t1.name=t2.newnewname
+      """.stripMargin).show()
+
     // 2.compare the performance : with pre-aggregate VS main table
 
     // build test data, if set the data is larger than 100M, it will take 10+ mins.
@@ -160,6 +207,7 @@ object PreAggregateTableExample {
     // scalastyle:on
 
     spark.sql("DROP TABLE IF EXISTS mainTable")
+    spark.sql("DROP TABLE IF EXISTS mainTable_other")
     spark.sql("DROP TABLE IF EXISTS personTable")
     spark.sql("DROP TABLE IF EXISTS personTableWithoutAgg")
 


Mime
View raw message