Mailing-List: contact issues-help@carbondata.incubator.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@carbondata.incubator.apache.org
Date: Sat, 8 Apr 2017 21:28:41 +0000 (UTC)
From: "Sanoj MG (JIRA)" <jira@apache.org>
To: issues@carbondata.incubator.apache.org
Message-ID: <JIRA.13062695.1491686797000.242104.1491686921527@Atlassian.JIRA>
In-Reply-To: <JIRA.13062695.1491686797000@Atlassian.JIRA>
References: <JIRA.13062695.1491686797000@Atlassian.JIRA> <JIRA.13062695.1491686797069@jira-lw-us.apache.org>
Subject: [jira] [Commented] (CARBONDATA-888) Dictionary include / exclude
 option in dataframe writer
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Sat, 08 Apr 2017 21:28:46 -0000


    [ https://issues.apache.org/jira/browse/CARBONDATA-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961952#comment-15961952 ] 

Sanoj MG commented on CARBONDATA-888:
-------------------------------------

Can this be assigned to me? I have already made the code changes and would like to create a PR.

> Dictionary include / exclude option in dataframe writer
> -------------------------------------------------------
>
>                 Key: CARBONDATA-888
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-888
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: spark-integration
>    Affects Versions: 1.2.0-incubating
>         Environment: HDP 2.5, Spark 1.6
>            Reporter: Sanoj MG
>            Priority: Minor
>             Fix For: 1.2.0-incubating
>
>
> While creating a CarbonData table from a dataframe, it is currently not possible to specify which columns should be included in or excluded from the dictionary. An option is required to specify this, as shown below:
> df.write.format("carbondata")
>   .option("tableName", "test")
>   .option("compress","true")
>   .option("dictionary_include","intcol1,intcol2")
>   .option("dictionary_exclude","stringcol1,stringcol2")
>   .mode(SaveMode.Overwrite)
>   .save()
> We have a lot of integer columns that are dimensions. dataframe.save is used to quickly create tables instead of writing DDL, and this feature would be useful when running POCs.
>  
>  
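To illustrate the request, here is a minimal sketch of how the two proposed writer options might be turned into a TBLPROPERTIES clause for the generated CREATE TABLE statement. It assumes the options map directly onto DICTIONARY_INCLUDE and DICTIONARY_EXCLUDE table properties; the object DictionaryOptions and its methods are hypothetical names used only for illustration and are not taken from the CarbonData code base or from the pending change.

```scala
import java.util.Locale

// Hypothetical helper, not part of CarbonData: converts the proposed
// dataframe writer options into a TBLPROPERTIES clause.
object DictionaryOptions {
  // Option keys as named in the proposal above.
  private val supportedKeys = Seq("dictionary_include", "dictionary_exclude")

  /** Splits a comma-separated column list, trimming blanks and dropping empty entries. */
  def parseColumns(value: String): Seq[String] =
    value.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  /** Builds "TBLPROPERTIES (...)", or an empty string when neither option is set. */
  def tblProperties(options: Map[String, String]): String = {
    val props = supportedKeys.flatMap { key =>
      options.get(key)
        .map(parseColumns)
        .filter(_.nonEmpty)
        .map(cols => s"'${key.toUpperCase(Locale.ROOT)}'='${cols.mkString(",")}'")
        .toList
    }
    if (props.isEmpty) "" else props.mkString("TBLPROPERTIES (", ", ", ")")
  }
}
```

With the options from the example in the description, tblProperties would yield a clause listing both properties, while a writer that sets neither option would get an empty string and the generated DDL would stay unchanged.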


--
This message was sent by Atlassian JIRA
(v6.3.15#6346)