carbondata-issues mailing list archives

From xuchuanyin <...@git.apache.org>
Subject [GitHub] carbondata pull request #1195: [CARBONDATA-1281] Support multiple temp dirs ...
Date Tue, 25 Jul 2017 12:37:23 GMT
GitHub user xuchuanyin opened a pull request:

    https://github.com/apache/carbondata/pull/1195

    [CARBONDATA-1281] Support multiple temp dirs for writing files while loading data

    # Modifications
    This feature mainly aims to avoid disk hot spots during a single massive data load. Changes are made in two parts:
    
    1. Randomly choose a YARN local folder each time a sort temp file is written in the sort process;
    
    2. Randomly choose a YARN local folder each time a carbondata file is written in the write process.
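
The random selection described in both parts can be sketched as follows. This is only an illustration under stated assumptions, not the code in this pull request: the class `TempDirChooser`, the method `chooseRandomDir`, and the example paths are all hypothetical, and how the list of YARN local folders is obtained is left out.

```java
import java.util.Random;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical helper for illustration; not taken from the CarbonData code base. */
public final class TempDirChooser {
  private TempDirChooser() {}

  /** Picks one directory uniformly at random from the candidate local dirs. */
  public static String chooseRandomDir(String[] localDirs, Random random) {
    if (localDirs == null || localDirs.length == 0) {
      throw new IllegalArgumentException("at least one local dir is required");
    }
    return localDirs[random.nextInt(localDirs.length)];
  }

  public static void main(String[] args) {
    // Assumed example paths; a real job would use the configured YARN local dirs.
    String[] localDirs = {"/data1/yarn/local", "/data2/yarn/local", "/data3/yarn/local"};
    for (int i = 0; i < 5; i++) {
      // A fresh choice per file spreads writes across the disks.
      String dir = chooseRandomDir(localDirs, ThreadLocalRandom.current());
      System.out.println("sort temp file " + i + " -> " + dir);
    }
  }
}
```

Choosing a folder anew for every file, rather than once per task, is what spreads the I/O of a single large load over all available disks.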
    
    # Usage
    
    To enable this feature, set both `carbon.use.multi.temp.dir=true` and `carbon.use.local.dir=true`.
    
    # Performance
    In my case, this feature improves the loading performance from 35M/s/node to 70+M/s/node.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xuchuanyin/carbondata feature_mtd4l

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/carbondata/pull/1195.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1195
    
----
commit 0d9910896a6c6a696a53f2c905ef23d1870c9b90
Author: xuchuanyin <xuchuanyin@hust.edu.cn>
Date:   2017-07-25T11:17:53Z

    Support multiple temp dirs for writing files while loading data
    
    randomly choose a dir to write sort temp files
    
    randomly choose a dir to write carbondata files
    
    Fix errors in spelling
    
    optimize default value for using multiple temp dir
    
    update document for multiple temp dirs feature
    
    update property name
    
    (cherry picked from commit 71ab293ef8d2ff24a122bb074b7b95bca8c1b77e)

commit 8000041266cb188e8876ae07d61f271993d33459
Author: xuchuanyin <xuchuanyin@hust.edu.cn>
Date:   2017-07-25T11:20:32Z

    Add tests for multiple temp dirs during data loading
    
    Fix bugs in tests
    
    remove header in test data
    
    remove useless comment
    
    remove added useless testdata
    
    update data source for tests
    
    (cherry picked from commit ee355b78c0d703d5bc2d2767837c32b6cc422361)

commit 92637c6035358b3cc354966d2dc29e1003f387dd
Author: xuchuanyin <xuchuanyin@hust.edu.cn>
Date:   2017-07-25T12:28:17Z

    resolve review comments
    
    + update documents
    + update parameter name
    + optimize code to avoid duplicate lines

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
