carbondata-issues mailing list archives

From ravipesala <...@git.apache.org>
Subject [GitHub] incubator-carbondata pull request #369: [CARBONDATA-470][WIP]Add unsafe offh...
Date Tue, 29 Nov 2016 19:34:14 GMT
GitHub user ravipesala opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/369

    [CARBONDATA-470][WIP]Add unsafe offheap and on-heap sort in carbodata loading

    In the current CarbonData system, loading performance is not satisfactory because the data
must be sorted at the executor level during data loading. CarbonData collects a batch of rows,
sorts the batch in memory, and writes it to a temporary file; it then merge-sorts those
temporary files to finish the sort. This causes two major problems: disk I/O and GC pressure.
Even though the batches are written to files, CarbonData still suffers heavy GC overhead
because each batch is sorted in memory before it is written out.
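
    For illustration, the batch-sort-then-merge flow described above can be sketched as
follows. This is a minimal sketch, not the CarbonData code: the class and method names are
hypothetical, rows are reduced to plain int keys, and each sorted in-memory array stands in
for one temporary file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch; names are not taken from CarbonData.
final class BatchSortMergeSketch {

    // Sort each batch on its own. Each sorted array stands in for one temporary file.
    // Assumes batchSize > 0.
    static List<int[]> sortBatches(int[] input, int batchSize) {
        List<int[]> runs = new ArrayList<>();
        for (int start = 0; start < input.length; start += batchSize) {
            int end = Math.min(start + batchSize, input.length);
            int[] run = Arrays.copyOfRange(input, start, end);
            Arrays.sort(run);
            runs.add(run);
        }
        return runs;
    }

    // K-way merge of the sorted runs using a min-heap.
    // Heap entries are {value, runIndex, positionInRun}.
    static int[] mergeRuns(List<int[]> runs) {
        int total = 0;
        for (int[] run : runs) {
            total += run.length;
        }
        PriorityQueue<int[]> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int i = 0; i < runs.size(); i++) {
            if (runs.get(i).length > 0) {
                heap.add(new int[] {runs.get(i)[0], i, 0});
            }
        }
        int[] out = new int[total];
        int n = 0;
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            out[n++] = head[0];
            int[] run = runs.get(head[1]);
            int next = head[2] + 1;
            if (next < run.length) {
                heap.add(new int[] {run[next], head[1], next});
            }
        }
        return out;
    }
}
```

    In this sketch every batch is copied and sorted as ordinary heap arrays, which is the
kind of short-lived allocation that contributes to the GC pressure described above.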
    To solve these problems, we can introduce Unsafe storage and Unsafe sort.
    Unsafe storage: The user can configure a memory limit on the amount of data kept in memory.
All of that data is kept in a contiguous memory region, either off-heap or on-heap, using
Unsafe. Once the configured limit is exceeded, the remaining data is spilled to disk.
    Unsafe sort: The data stored in memory using Unsafe can be sorted with Unsafe sort.
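
    The idea can be sketched roughly as below. This is an illustration under stated
assumptions, not the code in this pull request: java.nio.ByteBuffer is used as a stand-in
for Unsafe, every name is hypothetical, each row is reduced to one 8-byte key, and an
in-memory list stands in for the rows that would be spilled to disk.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; ByteBuffer is swapped in for Unsafe, names are not from CarbonData.
final class OffHeapSortSketch {
    private static final int ROW_BYTES = Long.BYTES; // one 8-byte sort key per row

    private final ByteBuffer page;                        // one contiguous memory region
    private final List<Long> spilled = new ArrayList<>(); // stand-in for rows spilled to disk
    private int rowCount;

    // memoryLimitBytes plays the role of the user-configured memory limit.
    OffHeapSortSketch(int memoryLimitBytes, boolean offHeap) {
        page = offHeap
                ? ByteBuffer.allocateDirect(memoryLimitBytes) // off-heap
                : ByteBuffer.allocate(memoryLimitBytes);      // on-heap
    }

    // Keep the row in the page while it fits; otherwise treat it as spilled.
    void add(long key) {
        if ((rowCount + 1) * ROW_BYTES <= page.capacity()) {
            page.putLong(rowCount * ROW_BYTES, key);
            rowCount++;
        } else {
            spilled.add(key);
        }
    }

    int spilledCount() {
        return spilled.size();
    }

    // Sort row offsets by the key each offset points to; the rows themselves do not move.
    // Boxed Integer offsets keep the sketch short; a primitive pointer array would
    // avoid creating these objects.
    long[] sortedInMemoryKeys() {
        Integer[] offsets = new Integer[rowCount];
        for (int i = 0; i < rowCount; i++) {
            offsets[i] = i * ROW_BYTES;
        }
        Arrays.sort(offsets, (a, b) -> Long.compare(page.getLong(a), page.getLong(b)));
        long[] out = new long[rowCount];
        for (int i = 0; i < rowCount; i++) {
            out[i] = page.getLong(offsets[i]);
        }
        return out;
    }
}
```

    With a 24-byte limit, for example, three keys stay in the page and a fourth is counted
as spilled. Sorting offsets instead of row objects is one common way to keep row data out
of the garbage collector's view; whether this pull request does it this way is not stated
here.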

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ravipesala/incubator-carbondata unsafesortnew

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/369.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #369
    
----
commit d223681c799d373beb30748166d3f181ed86981a
Author: ravipesala <ravi.pesala@gmail.com>
Date:   2016-11-27T12:09:36Z

    Optimize data loading

commit f21dc18a304efe171ebb64a2e7135534b4dd09fd
Author: ravipesala <ravi.pesala@gmail.com>
Date:   2016-11-28T11:42:01Z

    Unsafe Sort

commit b0b93560776944f51e6aa6fe5d4a0ed326f21834
Author: ravipesala <ravi.pesala@gmail.com>
Date:   2016-11-28T11:58:05Z

    disabled memory merge

commit a4ab3abc07396a42526bcfe8cd5e9b1714df56c9
Author: ravipesala <ravi.pesala@gmail.com>
Date:   2016-11-28T12:00:58Z

    disabled memory merge

commit 95eee6288d938efdf60923e277dbae13b2645021
Author: ravipesala <ravi.pesala@gmail.com>
Date:   2016-11-29T03:45:08Z

    refactored

commit 771d00d18fadf421f8dec9ad266185cae06af402
Author: ravipesala <ravi.pesala@gmail.com>
Date:   2016-11-29T19:31:06Z

    Fixed merging issues.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
