Date: Wed, 20 Sep 2017 08:48:00 +0000 (UTC)
From: "Venkata Ramana G (JIRA)" 
To: issues@carbondata.apache.org
Reply-To: dev@carbondata.apache.org
Subject: [jira] [Updated] (CARBONDATA-45) Support MAP type

[ https://issues.apache.org/jira/browse/CARBONDATA-45?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Venkata Ramana G updated CARBONDATA-45:
---------------------------------------
    Description: 
{code:sql}
>>CREATE TABLE table1 (
    deviceInformationId int,
    channelsId string,
    props map<int,string>)
  STORED BY 'org.apache.carbondata.format'
>>insert into table1 select 10,'channel1', map(1,'user1',101,'root')
{code}

Format of the data to be read from CSV, with '$' as the level 1 delimiter and map keys terminated by '#':

{code:sql}
>>load data local inpath '/tmp/data.csv' into table1 options ('COMPLEX_DELIMITER_LEVEL_1'='$', 'COMPLEX_DELIMITER_LEVEL_2'=':', 'COMPLEX_DELIMITER_FOR_KEY'='#')
20,channel2,2#user2$100#usercommon
30,channel3,3#user3$100#usercommon
40,channel4,4#user3$100#usercommon
>>select channelId, props[100] from table1 where deviceInformationId > 10;
20, usercommon
30, usercommon
40, usercommon
>>select channelId, props from table1 where props[2] == 'user2';
20, {2:'user2', 100:'usercommon'}
{code}

The following cases need to be handled:
||Sub feature||Pending activity||Remarks||
|Basic Maptype support|Develop|Create table DDL, load map data from CSV, select * from maptable|
|Maptype lookup in projection and filter|Develop|Projections and filters need execution at Spark|
|NULL values, UDFs, Describe support|Develop| |
|Compaction support|Test + fix|As compaction works at the byte level, no changes are required; test cases need to be added|
|Insert into table|Develop|Source table data containing Map data needs conversion from the Spark datatype to string, as Carbon takes a string as the input row|
|Support DDL for Map fields in Dictionary Include and Dictionary Exclude|Develop|CarbonDictionaryDecoder also needs to handle the same|
|Support multilevel Map|Develop|Currently the DDL is validated to allow only 2 levels; remove this restriction|
|Support Map value as a measure|Develop|Currently array and struct support only dimensions; this needs to change|
|Support Alter table to add and remove a Map column|Develop|Implement the DDL; requires default-value handling|
|Projection of Map lookup pushed down to Carbon|Develop|An optimization for when a large number of values are present in the Map|
|Filter on Map lookup pushed down to Carbon|Develop|An optimization for when a large number of values are present in the Map|
|Update Map values|Develop|Update a map value|

h4. Design suggestion:
A Map can internally be stored as Array<Struct<key,value>>, so that conversion to the Map data type is required only when handing data to Spark. The schema will have a new column of map type, similar to Array.
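For illustration, the complex-delimiter encoding shown above ('$' between map entries, '#' between key and value) can be decoded with a small sketch. `parse_map_field` is a hypothetical helper for explanation only, not part of CarbonData's code:

```python
def parse_map_field(raw, entry_delim='$', key_delim='#'):
    """Decode a CSV map field such as '2#user2$100#usercommon' into a dict,
    mirroring COMPLEX_DELIMITER_LEVEL_1 ('$') and COMPLEX_DELIMITER_FOR_KEY
    ('#') from the load options above."""
    result = {}
    for entry in raw.split(entry_delim):
        key, value = entry.split(key_delim, 1)
        result[int(key)] = value
    return result

# Third field of the sample row "20,channel2,2#user2$100#usercommon":
print(parse_map_field('2#user2$100#usercommon'))
# {2: 'user2', 100: 'usercommon'}
```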
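The design suggestion, storing a Map as an array of key/value structs and converting back to a map only when returning rows to Spark, can be sketched as follows. This is an illustrative data model only, not CarbonData's actual implementation:

```python
def map_to_struct_array(m):
    # Store a map internally as Array<Struct<key,value>>: one
    # {key, value} struct per entry, sorted by key so the on-disk
    # layout is deterministic.
    return [{'key': k, 'value': v} for k, v in sorted(m.items())]

def struct_array_to_map(arr):
    # Convert the internal array form back to a map when handing
    # rows to Spark.
    return {e['key']: e['value'] for e in arr}

props = {2: 'user2', 100: 'usercommon'}
print(map_to_struct_array(props))
# [{'key': 2, 'value': 'user2'}, {'key': 100, 'value': 'usercommon'}]
```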
> Support MAP type
> ----------------
>
>                 Key: CARBONDATA-45
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-45
>             Project: CarbonData
>          Issue Type: New Feature
>          Components: core, sql
>            Reporter: cen yuhai
>            Assignee: Venkata Ramana G
>             Fix For: 1.3.0

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)