From: xuchuanyin
To: issues@carbondata.apache.org
Reply-To: issues@carbondata.apache.org
Mailing-List: contact issues-help@carbondata.apache.org; run by ezmlm
Subject: [GitHub] carbondata pull request #1198: [CARBONDATA-1281] Support multiple temp dirs ...
Content-Type: text/plain
Message-Id: <20170726140555.64B32E00C5@git1-us-west.apache.org>
Date: Wed, 26 Jul 2017 14:05:54 +0000 (UTC)
archived-at: Wed, 26 Jul 2017 14:05:58 -0000

GitHub user xuchuanyin reopened a pull request:

    https://github.com/apache/carbondata/pull/1198

    [CARBONDATA-1281] Support multiple temp dirs for writing files while loading data

    # Modifications
    This feature mainly focuses on avoiding disk hot-spots during a single massive data load. Changes are made in two parts:

    1. Randomly choose a YARN local folder each time a sort temp file is written in the sort process;
    2. Randomly choose a YARN local folder each time a carbondata file is written in the write process.

    # Usage
    To enable this feature, set `carbon.use.multi.temp.dir=true` and `carbon.use.local.dir=true`.

    # Performance
    In my case, this feature improves the loading performance from 35M/s/node to 70+M/s/node.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata new_feature_mtd4l

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1198.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1198

----
commit 46da65a1a0579c62a7f4196ae622f83dd5197e3a
Author: xuchuanyin
Date:   2017-07-25T11:17:53Z

    Support multiple temp dirs for writing files while loading data

    randomly choose a dir to write sort temp files

    randomly choose a dir to write carbondata files

    Fix errors in spelling

    optimize default value for using multiple temp dir

    update document for multiple temp dirs feature

    update property name

    (cherry picked from commit 71ab293ef8d2ff24a122bb074b7b95bca8c1b77e)

commit 6e35dec70196a12aaac24a69c795d3597f946386
Author: xuchuanyin
Date:   2017-07-25T11:20:32Z

    Add tests for multiple temp dirs during data loading

    Fix bugs in tests

    remove header in test data

    remove useless comment

    remove added useless testdata

    update data source for tests

    (cherry picked from commit ee355b78c0d703d5bc2d2767837c32b6cc422361)

commit 3e633070c3f793867c03ba350048994ced0e5527
Author: xuchuanyin
Date:   2017-07-25T12:28:17Z

    resolve review comments

    + update documents
    + update parameter name
    + optimize code to avoid duplicate lines

commit 9f746178600d7c16267bd0276b8a492f69871802
Author: xuchuanyin
Date:   2017-07-25T12:42:35Z

    fix checkstyle error

----

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
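
The random directory selection described under "Modifications" can be illustrated with a minimal Python sketch. This is not the code from the pull request: the function names `choose_temp_dir` and `write_temp_file`, the directory layout, and the file names are all hypothetical. The sketch only assumes that a list of candidate local directories (for example, one per disk) is available and that one of them is picked uniformly at random for each file written.

```python
import os
import random
import tempfile


def choose_temp_dir(local_dirs, rng=random):
    """Pick one directory uniformly at random from the candidates (hypothetical helper)."""
    if not local_dirs:
        raise ValueError("at least one local directory is required")
    return rng.choice(local_dirs)


def write_temp_file(local_dirs, name, payload, rng=random):
    """Write payload to a file under a randomly chosen directory and return its path."""
    target_dir = choose_temp_dir(local_dirs, rng)
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, name)
    with open(path, "wb") as handle:
        handle.write(payload)
    return path


if __name__ == "__main__":
    # Stand-ins for local dirs on separate disks (assumed layout, not from the PR).
    root = tempfile.mkdtemp()
    dirs = [os.path.join(root, "disk%d" % i) for i in range(3)]
    for i in range(6):
        # Each write makes an independent random choice, spreading files across dirs.
        print(write_temp_file(dirs, "sorttemp_%d.tmp" % i, b"rows"))
```

Because each write makes an independent choice, files tend to spread across the available directories over many writes, which is the hot-spot avoidance the description refers to. Passing a seeded `random.Random` instance as `rng` makes the choices reproducible in tests.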