carbondata-issues mailing list archives

From "Jacky Li (JIRA)" <>
Subject [jira] [Resolved] (CARBONDATA-470) Add unsafe offheap and on-heap sort in carbondata loading
Date Tue, 13 Dec 2016 11:17:58 GMT


Jacky Li resolved CARBONDATA-470.
       Resolution: Fixed
         Assignee: Ravindra Pesala
    Fix Version/s: 1.0.0-incubating

> Add unsafe offheap and on-heap sort in carbondata loading
> ---------------------------------------------------------
>                 Key: CARBONDATA-470
>                 URL:
>             Project: CarbonData
>          Issue Type: Improvement
>            Reporter: Ravindra Pesala
>            Assignee: Ravindra Pesala
>             Fix For: 1.0.0-incubating
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
> In the current CarbonData system, data loading performance is poor because the data must
> be sorted at the executor level during loading. CarbonData collects a batch of data, sorts
> it, and dumps it to a temporary file; finally, it merge-sorts those temporary files to finish
> sorting. This raises two major issues: disk I/O and GC overhead. Even though the data is
> dumped to files, CarbonData still suffers heavily from GC because each batch is sorted
> in memory before being dumped to the temporary files.
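The batch-sort-then-merge flow described above is essentially an external merge sort. The Java sketch below illustrates it under simplifying assumptions: values are plain ints, each sorted run is kept in an in-memory array where CarbonData would write a temporary file, and the class and method names (BatchSortMergeSketch, sortInBatches, mergeRuns) are hypothetical, not CarbonData's own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class BatchSortMergeSketch {

    // Splits the input into fixed-size batches, sorts each batch in memory,
    // then k-way merges the sorted runs. Hypothetical simplification: runs are
    // int arrays here, whereas the loader would dump each run to a temp file.
    static int[] sortInBatches(int[] input, int batchSize) {
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize must be positive");
        List<int[]> runs = new ArrayList<>();
        for (int start = 0; start < input.length; start += batchSize) {
            int end = Math.min(start + batchSize, input.length);
            int[] run = Arrays.copyOfRange(input, start, end);
            Arrays.sort(run); // in-memory sort of one batch (the step the issue links to GC pressure)
            runs.add(run);    // stand-in for "dump to a temporary file"
        }
        return mergeRuns(runs, input.length);
    }

    // K-way merge: the min-heap holds one cursor {runIndex, position} per run,
    // ordered by the value the cursor currently points at.
    static int[] mergeRuns(List<int[]> runs, int total) {
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> Integer.compare(runs.get(a[0])[a[1]], runs.get(b[0])[b[1]]));
        for (int r = 0; r < runs.size(); r++) {
            if (runs.get(r).length > 0) heap.add(new int[] {r, 0});
        }
        int[] out = new int[total];
        int k = 0;
        while (!heap.isEmpty()) {
            int[] cursor = heap.poll();
            int[] run = runs.get(cursor[0]);
            out[k++] = run[cursor[1]];
            if (cursor[1] + 1 < run.length) heap.add(new int[] {cursor[0], cursor[1] + 1});
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {42, 7, 19, 3, 88, 1, 56, 23};
        System.out.println(Arrays.toString(sortInBatches(data, 3)));
        // prints [1, 3, 7, 19, 23, 42, 56, 88]
    }
}
```

The heap of run cursors keeps the merge at O(n log k) for k runs; in the real loader each cursor would read from a temporary file, which is where the disk I/O cost mentioned above would arise.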
> To solve these problems, we can introduce Unsafe Storage and Unsafe Sort.
> Unsafe Storage: the user can configure a memory limit on the amount of data kept in memory.
> All the data can then be kept in contiguous memory, either off-heap or on-heap, using Unsafe.
> Once the configured limit is exceeded, the remaining data is spilled to disk.
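A rough illustration of the Unsafe Storage idea follows. It is not CarbonData's implementation: instead of sun.misc.Unsafe it uses java.nio.ByteBuffer, where allocateDirect gives off-heap memory and allocate gives a single on-heap array, and it assumes hypothetical fixed-width rows of long fields. Rows are appended to one contiguous buffer sized by the memory limit; once a row no longer fits, it is written to a temporary spill file instead. The class name ContiguousRowStoreSketch is hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ContiguousRowStoreSketch implements AutoCloseable {
    private final ByteBuffer memory;  // one contiguous block, sized by the memory limit
    private Path spillFile;           // created lazily, only once the limit is exceeded
    private FileChannel spill;
    private long spilledBytes;

    ContiguousRowStoreSketch(int memoryLimitBytes, boolean offHeap) {
        // allocateDirect -> off-heap memory; allocate -> a single on-heap byte[]
        memory = offHeap ? ByteBuffer.allocateDirect(memoryLimitBytes)
                         : ByteBuffer.allocate(memoryLimitBytes);
    }

    // Appends one row of fixed-width long fields (a hypothetical row format).
    void addRow(long[] fields) throws IOException {
        int rowBytes = fields.length * Long.BYTES;
        if (memory.remaining() >= rowBytes) {
            for (long f : fields) memory.putLong(f); // fits: stays in contiguous memory
            return;
        }
        // Limit exceeded: this row goes to the spill file on disk instead.
        if (spill == null) {
            spillFile = Files.createTempFile("spill-", ".bin");
            spill = FileChannel.open(spillFile, StandardOpenOption.WRITE);
        }
        ByteBuffer row = ByteBuffer.allocate(rowBytes);
        for (long f : fields) row.putLong(f);
        row.flip();
        while (row.hasRemaining()) spilledBytes += spill.write(row);
    }

    int bytesInMemory() { return memory.position(); }
    long bytesSpilled() { return spilledBytes; }

    // Closes and deletes the spill file (cleanup for this demo only).
    @Override
    public void close() throws IOException {
        if (spill != null) {
            spill.close();
            Files.deleteIfExists(spillFile);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical 64-byte limit: room for four rows of two longs (16 bytes each).
        try (ContiguousRowStoreSketch store = new ContiguousRowStoreSketch(64, true)) {
            for (long i = 0; i < 6; i++) store.addRow(new long[] {i, i * 10});
            System.out.println("in memory: " + store.bytesInMemory()
                + " bytes, spilled: " + store.bytesSpilled() + " bytes");
            // prints "in memory: 64 bytes, spilled: 32 bytes"
        }
    }
}
```

Because the rows live in one buffer rather than as individual objects, the garbage collector tracks a single allocation regardless of row count, which is the property the proposal relies on to reduce GC pressure.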
> Unsafe Sort: the data stored in memory using Unsafe can be sorted using Unsafe.
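The Unsafe Sort idea can be sketched in the same style: rows stay in their contiguous buffer and only an int[] of row offsets is reordered, with keys read straight from memory during comparisons. This is again a hypothetical illustration using ByteBuffer in place of Unsafe, with an assumed row layout of [sortKey, payload] and a plain Lomuto quicksort chosen for brevity; it is not CarbonData's actual sorter, and the names OffsetSortSketch and sortedOffsets are made up for this example.

```java
import java.nio.ByteBuffer;

public class OffsetSortSketch {
    // Hypothetical row layout: two longs, [sortKey, payload], stored back-to-back.
    static final int ROW_BYTES = 2 * Long.BYTES;

    // Returns the row offsets ordered by sortKey; the rows themselves never move.
    static int[] sortedOffsets(ByteBuffer page, int rowCount) {
        int[] offsets = new int[rowCount];
        for (int i = 0; i < rowCount; i++) offsets[i] = i * ROW_BYTES;
        quickSort(page, offsets, 0, rowCount - 1);
        return offsets;
    }

    // Lomuto-partition quicksort over offsets; keys are read with absolute getLong,
    // so no per-row objects are created during the sort.
    private static void quickSort(ByteBuffer page, int[] offs, int lo, int hi) {
        if (lo >= hi) return;
        long pivot = page.getLong(offs[hi]);
        int i = lo;
        for (int j = lo; j < hi; j++) {
            if (page.getLong(offs[j]) < pivot) {
                swap(offs, i, j);
                i++;
            }
        }
        swap(offs, i, hi); // pivot lands in its final position
        quickSort(page, offs, lo, i - 1);
        quickSort(page, offs, i + 1, hi);
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i];
        a[i] = a[j];
        a[j] = t;
    }

    public static void main(String[] args) {
        long[][] rows = {{30, 300}, {10, 100}, {20, 200}, {5, 50}};
        ByteBuffer page = ByteBuffer.allocateDirect(rows.length * ROW_BYTES); // off-heap page
        for (long[] r : rows) {
            page.putLong(r[0]);
            page.putLong(r[1]);
        }
        StringBuilder sb = new StringBuilder();
        for (int off : sortedOffsets(page, rows.length)) {
            sb.append(page.getLong(off)).append(':').append(page.getLong(off + Long.BYTES)).append(' ');
        }
        System.out.println(sb.toString().trim()); // prints "5:50 10:100 20:200 30:300"
    }
}
```

Sorting offsets instead of rows keeps the data itself in place; the trade-off is that this simple quicksort can recurse linearly deep on already-sorted input, so a production sorter would pick pivots more carefully.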

This message was sent by Atlassian JIRA
