carbondata-issues mailing list archives

From sounakr <>
Subject [GitHub] incubator-carbondata pull request #604: [CARBONDATA-691] After Compaction re...
Date Fri, 17 Feb 2017 15:12:49 GMT
GitHub user sounakr opened a pull request:

    [CARBONDATA-691] After Compaction records count are mismatched.

    **Problem** : After compaction, the record count does not match the actual count.
    **Analysis** : The partitioning logic used for compaction was wrong. The getPartition method
of CarbonScanRDD.scala is supposed to build a list of all the blocks of all the segments that need
to be merged and then partition that list by taskNo, so that each partitioned list is given
to one executor. Currently, after partitioning, the complete list of blocks is sent
to every executor for merging. Because each executor merges all the blocks of all the segments,
running multiple executors duplicates the data.
    **Fix** : Fix the getPartition method logic so that each executor receives only its own list of blocks.
                 Fix Horizontal Partitioning, which was merged with IUD.
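
The difference between the two behaviours described in the analysis can be sketched in a few lines of Scala. This is only an illustration, not the code in this pull request: `Block`, `CompactionPartitioning`, `partitionBuggy` and `partitionByTaskNo` are hypothetical names, and the real getPartition logic works on CarbonData's own block and split types.

```scala
// Hypothetical stand-in for one block of a segment; not CarbonData's real class.
final case class Block(segmentId: String, taskNo: String, path: String)

object CompactionPartitioning {
  // Shape of the bug: one partition per distinct taskNo, but every partition
  // carries the complete block list, so each executor merges everything.
  def partitionBuggy(blocks: Seq[Block]): Seq[Seq[Block]] =
    blocks.map(_.taskNo).distinct.map(_ => blocks)

  // Intended shape: group the blocks of all segments by taskNo and hand each
  // partition only the blocks that share its taskNo.
  def partitionByTaskNo(blocks: Seq[Block]): Seq[Seq[Block]] =
    blocks.groupBy(_.taskNo).toSeq.sortBy(_._1).map(_._2)
}
```

With two segments that each contain blocks for two task numbers, the first function hands out every block twice, while the second hands out each block exactly once, which is the property the record count depends on.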

You can merge this pull request into a Git repository by running:

    $ git pull master

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #604
commit 4e5ea804a5ab36d79efdb4df425e729245e990ee
Author: sounakr <>
Date:   2017-02-17T14:42:39Z

    Compaction Partitioning changes


