carbondata-issues mailing list archives

From "chenerlu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CARBONDATA-1438) Unify the sort column and sort scope in create table command
Date Thu, 31 Aug 2017 07:56:00 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

chenerlu updated CARBONDATA-1438:
---------------------------------
    Description: 
1	Requirement
Currently, users can specify sort columns in the table properties when creating a table, and
they can specify the sort scope in the load options when loading data.
To make the feature easier to use, all sort-related parameters should be specified in the create
table command.
Once the sort scope is specified in the create table command, it will be used when loading data,
even if users have also specified a sort scope in the load options.

2	Detailed design
2.1	Task-01
Requirement: Create table should support specifying the sort scope.
Implementation: Use the table properties (Map<String, String>). The sort scope is specified in
the table properties as a key/value pair, and the existing interface is then called to write
this key/value pair into the metastore.
Global Sort, Local Sort and No Sort will be supported. The sort scope can be specified in the SQL
command:

CREATE TABLE tableWithGlobalSort (
shortField SHORT,
intField INT,
bigintField LONG,
doubleField DOUBLE,
stringField STRING,
timestampField TIMESTAMP,
decimalField DECIMAL(18,2),
dateField DATE,
charField CHAR(5)
)
STORED BY 'carbondata'
TBLPROPERTIES('SORT_COLUMNS'='stringField', 'SORT_SCOPE'='GLOBAL_SORT')
 
Tip: If the sort scope is Global Sort, users should specify GLOBAL_SORT_PARTITIONS. If it is not
specified, the number of map tasks is used. GLOBAL_SORT_PARTITIONS must be an integer in the
range [1, Integer.MAX_VALUE], and it is only used when the sort scope is Global Sort.
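The partition rule in the tip above can be sketched as follows. `GlobalSortPartitions` and `resolve` are hypothetical names, not CarbonData's API; the sketch assumes the options arrive as a `Map<String, String>` and that the caller already knows the number of map tasks. The key is matched case-insensitively, in line with the tip that keys and values are case-insensitive.

```java
import java.util.Map;

// Hypothetical helper (not CarbonData's API): resolves the partition count for a Global Sort load.
final class GlobalSortPartitions {
    private GlobalSortPartitions() {}

    // Returns GLOBAL_SORT_PARTITIONS if present, otherwise the number of map tasks.
    // Rejects anything that is not an integer in [1, Integer.MAX_VALUE].
    static int resolve(Map<String, String> options, int mapTaskCount) {
        String raw = null;
        for (Map.Entry<String, String> e : options.entrySet()) {
            if (e.getKey().trim().equalsIgnoreCase("GLOBAL_SORT_PARTITIONS")) {
                raw = e.getValue();
                break;
            }
        }
        if (raw == null || raw.trim().isEmpty()) {
            return mapTaskCount; // not specified: fall back to the number of map tasks
        }
        int value;
        try {
            // parseInt also rejects values above Integer.MAX_VALUE
            value = Integer.parseInt(raw.trim());
        } catch (NumberFormatException ex) {
            throw new IllegalArgumentException("GLOBAL_SORT_PARTITIONS must be an integer: " + raw, ex);
        }
        if (value < 1) {
            throw new IllegalArgumentException("GLOBAL_SORT_PARTITIONS must be in [1, Integer.MAX_VALUE]: " + raw);
        }
        return value;
    }
}
```

Relying on `Integer.parseInt` for the upper bound keeps the range check to a single comparison.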


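Sort scope (SORT_SCOPE value)   Behavior
Global Sort (GLOBAL_SORT)       Uses the orderBy operator in Spark; data is ordered at the segment level.
Local Sort (LOCAL_SORT)         Data is ordered per node; a carbondata file is ordered if it is written by one task.
No Sort (NO_SORT)               Data is not sorted.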

Tip: Property keys and values are case-insensitive.
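A minimal sketch of reading the sort scope from the table properties with case-insensitive keys and values. The `SortScope` enum and `SortScopeProperty` class are hypothetical names chosen for illustration; the enum lists only the three scopes covered by this proposal.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical enum holding the three sort scopes covered by this proposal.
enum SortScope { GLOBAL_SORT, LOCAL_SORT, NO_SORT }

// Hypothetical helper (not CarbonData's API).
final class SortScopeProperty {
    private SortScopeProperty() {}

    // Reads SORT_SCOPE from the table properties, matching key and value case-insensitively.
    // Returns Optional.empty() when CREATE TABLE did not specify a sort scope.
    static Optional<SortScope> fromTableProperties(Map<String, String> tableProperties) {
        for (Map.Entry<String, String> e : tableProperties.entrySet()) {
            if (!e.getKey().trim().equalsIgnoreCase("SORT_SCOPE")) {
                continue;
            }
            String value = e.getValue() == null ? "" : e.getValue().trim().toUpperCase(Locale.ROOT);
            try {
                return Optional.of(SortScope.valueOf(value));
            } catch (IllegalArgumentException ex) {
                throw new IllegalArgumentException("Unsupported SORT_SCOPE: " + e.getValue(), ex);
            }
        }
        return Optional.empty();
    }
}
```

Normalizing with `Locale.ROOT` avoids locale-dependent upper-casing of the value.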
2.2	Task-02
Requirement:
Data loading will support Local Sort, No Sort and Global Sort. The sort scope specified in the
load options is ignored, and the sort scope specified in the create table command is used instead.

Currently, users can specify the sort scope and the global sort partitions in the load options.
After this change, the sort scope specified in the load options will be ignored, and the sort
scope will be read from the table properties.

Current logic: the sort scope comes from the load options
No.  Prerequisite                                        Sort scope
1    isSortTable is true && sort scope is Global Sort    Global Sort (checked first)
2    isSortTable is false                                No Sort
3    isSortTable is true                                 Local Sort
Tip: isSortTable is true if the table has sort columns or contains dimensions (excluding complex
types), such as string columns.

For example:
CREATE TABLE xxx1 (col1 STRING, col2 INT) STORED BY 'carbondata'  -- sort table
CREATE TABLE xx1 (col1 INT, col2 INT) STORED BY 'carbondata'  -- not a sort table
CREATE TABLE xx (col1 INT, col2 STRING) STORED BY 'carbondata' TBLPROPERTIES ('SORT_COLUMNS'='col1')  -- sort table
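A rough sketch of the isSortTable check, exercised on the three example tables. The `Column` and `SortTableCheck` classes are hypothetical, and for simplicity the sketch assumes that only STRING columns count as dimensions; the real dimension classification may cover other types as well.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical column model for illustration; only the type name matters here.
final class Column {
    final String name;
    final String type; // e.g. "STRING", "INT", "ARRAY<STRING>"

    Column(String name, String type) {
        this.name = name;
        this.type = type;
    }
}

// Hypothetical helper (not CarbonData's API).
final class SortTableCheck {
    private SortTableCheck() {}

    // Simplifying assumption: only STRING is treated as a (non-complex) dimension type.
    // Complex types such as ARRAY<STRING> do not match and are therefore excluded.
    private static boolean isNonComplexDimension(Column c) {
        return c.type.trim().toUpperCase(Locale.ROOT).equals("STRING");
    }

    // isSortTable: the table has sort columns, or it contains at least one non-complex dimension.
    static boolean isSortTable(List<Column> columns, List<String> sortColumns) {
        if (!sortColumns.isEmpty()) {
            return true;
        }
        for (Column c : columns) {
            if (isNonComplexDimension(c)) {
                return true;
            }
        }
        return false;
    }
}
```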

New logic: the sort scope comes from the create table command
No.  Prerequisite                                         Code branch
1    isSortTable is true && sort scope is Global Sort     Global Sort (checked first)
2    isSortTable is false || sort scope is No Sort        No Sort
3    isSortTable is true && sort scope is Local Sort      Local Sort
4    isSortTable is true && no sort scope is specified    Local Sort (keeps the current logic)
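The "New logic" table can be read as a short decision function, sketched below. `SortScope` and `LoadSortScopeResolver` are hypothetical names; an empty `Optional` stands for "no sort scope specified in CREATE TABLE".

```java
import java.util.Optional;

// Hypothetical enum holding the three sort scopes covered by this proposal.
enum SortScope { GLOBAL_SORT, LOCAL_SORT, NO_SORT }

// Hypothetical helper (not CarbonData's API) mirroring the rows of the "New logic" table.
final class LoadSortScopeResolver {
    private LoadSortScopeResolver() {}

    static SortScope resolve(boolean isSortTable, Optional<SortScope> tableSortScope) {
        SortScope specified = tableSortScope.orElse(null);
        // Row 1, checked first: sort table with Global Sort.
        if (isSortTable && specified == SortScope.GLOBAL_SORT) {
            return SortScope.GLOBAL_SORT;
        }
        // Row 2: not a sort table, or No Sort requested explicitly.
        if (!isSortTable || specified == SortScope.NO_SORT) {
            return SortScope.NO_SORT;
        }
        // Rows 3 and 4: sort table with Local Sort, or with no sort scope specified.
        return SortScope.LOCAL_SORT;
    }
}
```

Checking row 1 first matters: a non-sort table that requests Global Sort falls through to row 2 and is loaded without sorting.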

3	Acceptance standard
No.  Acceptance standard
1    Users can specify the sort scope (Global Sort, Local Sort or No Sort) in the SQL command when
     creating a carbon table.
2    Data loading ignores the sort scope specified in the load options and uses the sort scope
     specified in the create table command. If the user still specifies a sort scope in the load
     options, a warning is given, informing the user that the sort scope specified in the create
     table command will be used.
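Acceptance standard 2 can be sketched as below: the table-level SORT_SCOPE always wins, and a SORT_SCOPE found in the load options only produces a warning. `LoadOptionsSortScope` and `effectiveSortScope` are hypothetical names; the warning sink is passed in as a `Consumer<String>` so the behavior is easy to test.

```java
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical helper (not CarbonData's API) for acceptance standard 2.
final class LoadOptionsSortScope {
    private LoadOptionsSortScope() {}

    // Returns the SORT_SCOPE from the table properties (null if CREATE TABLE did not set one).
    // A SORT_SCOPE in the load options is ignored and only reported through `warn`.
    static String effectiveSortScope(Map<String, String> tableProperties,
                                     Map<String, String> loadOptions,
                                     Consumer<String> warn) {
        String fromTable = getIgnoreCase(tableProperties, "SORT_SCOPE");
        String fromLoad = getIgnoreCase(loadOptions, "SORT_SCOPE");
        if (fromLoad != null) {
            warn.accept("SORT_SCOPE '" + fromLoad + "' in load options is ignored; the sort scope "
                    + "specified in CREATE TABLE is used instead ("
                    + (fromTable == null ? "not specified" : fromTable) + ").");
        }
        return fromTable;
    }

    // Case-insensitive key lookup, since property keys are case-insensitive.
    private static String getIgnoreCase(Map<String, String> map, String key) {
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (e.getKey().trim().equalsIgnoreCase(key)) {
                return e.getValue();
            }
        }
        return null;
    }
}
```

In a loader, `warn` could be a method reference such as `java.util.logging.Logger.getLogger("load")::warning`; when the result is null, the "New logic" table (row 4) decides the default.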

4	Feature restrictions
NA
5	Dependencies
NA
6	Technical risk
NA



     Issue Type: Improvement  (was: Bug)

> Unify the sort column and sort scope in create table command
> ------------------------------------------------------------
>
>                 Key: CARBONDATA-1438
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-1438
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: chenerlu
>



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
