carbondata-issues mailing list archives

From jackylk <...@git.apache.org>
Subject [GitHub] carbondata pull request #1808: [CARBONDATA-2023][DataLoad] Add size base blo...
Date Fri, 09 Feb 2018 01:06:59 GMT
Github user jackylk commented on a diff in the pull request:

    https://github.com/apache/carbondata/pull/1808#discussion_r167115897
  
    --- Diff: docs/useful-tips-on-carbondata.md ---
    @@ -169,5 +169,6 @@
       | carbon.use.local.dir | spark/carbonlib/carbon.properties | Data loading | Whether
use YARN local directories for multi-table load disk load balance | If this is set it to true
CarbonData will use YARN local directories for multi-table load disk load balance, that will
improve the data load performance. |
       | carbon.use.multiple.temp.dir | spark/carbonlib/carbon.properties | Data loading |
Whether to use multiple YARN local directories during table data loading for disk load balance
| After enabling 'carbon.use.local.dir', if this is set to true, CarbonData will use all YARN
local directories during data load for disk load balance, that will improve the data load
performance. Please enable this property when you encounter disk hotspot problem during data
loading. |
       | carbon.sort.temp.compressor | spark/carbonlib/carbon.properties | Data loading |
Specify the name of compressor to compress the intermediate sort temporary files during sort
procedure in data loading. | The optional values are 'SNAPPY','GZIP','BZIP2','LZ4' and empty.
By default, empty means that Carbondata will not compress the sort temp files. This parameter
will be useful if you encounter disk bottleneck. |
    +  | carbon.load.skewed.data.optimization | spark/carbonlib/carbon.properties | Data loading
| Whether to enable size based block allocation strategy for data loading. | Carbondata will
use number based block allocation strategy by default and it will make sure that all the executors
process the same number of blocks. If this value is set to true, Carbondata will make sure
that all the executors process the same size of data -- It's useful if the size of your input
data files varies widely, say 1MB~1GB. |
    --- End diff --
    
    Change `Carbondata will use number based block allocation strategy by default` to
    `When loading, carbondata will use file size based block allocation strategy for task
distribution`.
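
    For context, the difference between the two strategies named in the diff can be
    illustrated with a small sketch. This is not CarbonData's implementation: the function
    names, the greedy largest-first heuristic, and the sample block sizes are all
    assumptions chosen for illustration. It only shows why equal block counts can leave
    executors with very different amounts of data when file sizes vary widely.

```python
import heapq


def assign_by_count(block_sizes, num_executors):
    """Number based (hypothetical): round-robin, so block counts are even."""
    loads = [[] for _ in range(num_executors)]
    for i, size in enumerate(block_sizes):
        loads[i % num_executors].append(size)
    return loads


def assign_by_size(block_sizes, num_executors):
    """Size based (hypothetical): give the next-largest block to the
    executor that currently holds the least data."""
    # Min-heap of (total size assigned so far, executor index).
    heap = [(0, idx) for idx in range(num_executors)]
    heapq.heapify(heap)
    loads = [[] for _ in range(num_executors)]
    for size in sorted(block_sizes, reverse=True):
        total, idx = heapq.heappop(heap)
        loads[idx].append(size)
        heapq.heappush(heap, (total + size, idx))
    return loads


# Illustrative input only: block sizes in MB, ranging from 1 MB to 1 GB.
sizes_mb = [1024, 1, 1, 1, 512, 512]

by_count = assign_by_count(sizes_mb, 2)
by_size = assign_by_size(sizes_mb, 2)

print("number based totals (MB):", [sum(b) for b in by_count])
print("size based totals (MB):  ", [sum(b) for b in by_size])
```

    With this sample input, both executors get three blocks under either strategy, but
    only the size based assignment gives them nearly the same amount of data.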


---