Date: Sat, 8 Apr 2017 21:28:41 +0000 (UTC)
From: "Sanoj MG (JIRA)"
To: issues@carbondata.incubator.apache.org
Reply-To: dev@carbondata.incubator.apache.org
Subject: [jira] [Commented] (CARBONDATA-888) Dictionary include / exclude option in dataframe writer

    [ https://issues.apache.org/jira/browse/CARBONDATA-888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15961952#comment-15961952 ]

Sanoj MG commented on CARBONDATA-888:
-------------------------------------

Can this be assigned to me? I have already made the code changes and would like to create a PR.

> Dictionary include / exclude option in dataframe writer
> -------------------------------------------------------
>
>                 Key: CARBONDATA-888
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-888
>             Project: CarbonData
>          Issue Type: Improvement
>          Components: spark-integration
>    Affects Versions: 1.2.0-incubating
>         Environment: HDP 2.5, Spark 1.6
>            Reporter: Sanoj MG
>            Priority: Minor
>             Fix For: 1.2.0-incubating
>
>
> While creating a CarbonData table from a dataframe, it is currently not possible to specify columns that need to be included in or excluded from the dictionary.
> An option is required to specify them, as below:
> df.write.format("carbondata")
>   .option("tableName", "test")
>   .option("compress","true")
>   .option("dictionary_include","incol1,intcol2")
>   .option("dictionary_exclude","stringcol1,stringcol2")
>   .mode(SaveMode.Overwrite)
>   .save()
> We have a lot of integer columns that are dimensions. dataframe.save is used to quickly create tables instead of writing DDLs, and it would be nice to have this feature for running POCs.
>
>

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
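
One way the two proposed writer options could be handled is to translate them into the DICTIONARY_INCLUDE and DICTIONARY_EXCLUDE table properties that CarbonData's CREATE TABLE DDL accepts. The sketch below only illustrates that idea. `DictionaryOptions`, `parseColumns` and `toTableProperties` are hypothetical names, and nothing here is taken from the actual code change mentioned in the comment.

```scala
// Hypothetical helper, not part of CarbonData: shows one possible mapping
// from the proposed dataframe writer options to TBLPROPERTIES entries.
object DictionaryOptions {

  // Option keys as proposed in the issue description.
  val IncludeKey = "dictionary_include"
  val ExcludeKey = "dictionary_exclude"

  // Split a comma-separated option value into trimmed, non-empty column names.
  def parseColumns(value: String): Seq[String] =
    value.split(",").map(_.trim).filter(_.nonEmpty).toSeq

  // Build one TBLPROPERTIES entry per option that is present and non-empty.
  // Throws IllegalArgumentException if a column appears in both options.
  def toTableProperties(options: Map[String, String]): Seq[String] = {
    val include = options.get(IncludeKey).map(parseColumns).getOrElse(Seq.empty)
    val exclude = options.get(ExcludeKey).map(parseColumns).getOrElse(Seq.empty)

    val overlap = include.intersect(exclude)
    val overlapText = overlap.mkString(", ")
    require(overlap.isEmpty, s"Columns listed in both options: $overlapText")

    // Render a single 'NAME'='col1,col2' entry, or nothing for an empty list.
    def entry(name: String, cols: Seq[String]): Option[String] =
      if (cols.isEmpty) None
      else {
        val joined = cols.mkString(",")
        Some(s"'$name'='$joined'")
      }

    Seq(
      entry("DICTIONARY_INCLUDE", include),
      entry("DICTIONARY_EXCLUDE", exclude)
    ).flatten
  }
}
```

Under this assumption, the writer would pass its option map to `toTableProperties` and join the resulting entries into the TBLPROPERTIES clause of the CREATE TABLE statement it generates. Options that are absent simply contribute no entry.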