carbondata-issues mailing list archives

From "Jacky Li (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CARBONDATA-318) Implement an ExternalSorter that makes maximum usage of memory while sorting
Date Sun, 16 Oct 2016 01:19:20 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li updated CARBONDATA-318:
--------------------------------
    Description: 
External Sorter should sort in memory until it reaches the configured size, then spill to disk.
It should provide the following interface:
1. insertRow/insertRowBatch: takes an Iterator as input and inserts the rows from the iterator into the sorter.

2. getIterator: returns an iterator over the sorted rows; the sorted rows may come from memory or from spill files.
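The interface above can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not CarbonData's implementation: the generic row type `T`, the `Comparator`, and the `MergingIterator` helper are all hypothetical. It shows one way `getIterator` could return sorted rows regardless of where they live, by running a k-way merge (a `PriorityQueue` holding one head row per sorted run) over the in-memory run and one iterator per spill file.

```java
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.PriorityQueue;

// Hypothetical shape of the sorter API named in the issue; the generic row type T is an assumption.
interface ExternalSorter<T> {
    // Consumes rows from the iterator; the sorter may spill sorted runs to disk along the way.
    void insertRowBatch(Iterator<T> rows);

    // Returns every inserted row in sorted order, whether it sits in memory or in a spill file.
    Iterator<T> getIterator();
}

// k-way merge over runs that are each already sorted (e.g. the in-memory run plus one
// iterator per spill file). Only the current head row of each run is held in the heap.
final class MergingIterator<T> implements Iterator<T> {
    private static final class Head<E> {
        final E row;           // smallest unread row of this run
        final Iterator<E> run; // source of this run's next row
        Head(E row, Iterator<E> run) { this.row = row; this.run = run; }
    }

    private final PriorityQueue<Head<T>> heap;

    MergingIterator(List<Iterator<T>> sortedRuns, Comparator<? super T> cmp) {
        this.heap = new PriorityQueue<>((a, b) -> cmp.compare(a.row, b.row));
        for (Iterator<T> run : sortedRuns) {
            if (run.hasNext()) {
                heap.add(new Head<>(run.next(), run));
            }
        }
    }

    @Override
    public boolean hasNext() {
        return !heap.isEmpty();
    }

    @Override
    public T next() {
        Head<T> smallest = heap.poll();
        if (smallest == null) {
            throw new NoSuchElementException();
        }
        // Refill from the run just consumed so every non-empty run keeps one row in the heap.
        if (smallest.run.hasNext()) {
            heap.add(new Head<>(smallest.run.next(), smallest.run));
        }
        return smallest.row;
    }
}
```

Because each run is already sorted, the heap never holds more than one row per run, so memory used by the final merge grows with the number of spill files rather than with the number of rows.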


  was:
External Sorter should sort in memory until it reaches the configured size, then spill to disk.
It should provide the following interface:
1. insertRow/insertRowBatch: takes an Iterator as input and inserts the rows from the iterator into the sorter. Some considerations:
    1) The sorter decides when to spill to disk based on the total inserted size. (The JDK does not provide an API for object size; another JIRA issue is needed to improve on this.)
    2) Use a TreeMap as the sorter's in-memory data structure, since it keeps rows sorted as they are inserted.

2. getIterator: returns an iterator over the sorted rows; the sorted rows may come from memory or from spill files.
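The two considerations listed under insertRow/insertRowBatch (spill on total inserted size, TreeMap as the in-memory structure) could be combined as in this hypothetical sketch. `TreeMapSortBuffer`, the caller-supplied size estimator (standing in for the missing object-size API) and the `spillSink` callback (standing in for writing a sorted run to a file) are illustrative names, not part of the issue.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeMap;
import java.util.function.Consumer;
import java.util.function.ToLongFunction;

// In-memory stage of the sorter: the TreeMap keeps rows ordered as they arrive, and a spill is
// triggered once the *estimated* inserted size reaches the configured threshold.
final class TreeMapSortBuffer<T> {
    // TreeMap keys are unique, so rows that compare equal are grouped in a list instead of
    // overwriting one another.
    private final TreeMap<T, List<T>> rows;
    private final ToLongFunction<T> sizeEstimator;   // hypothetical: stands in for a missing object-size API
    private final long spillThresholdBytes;
    private final Consumer<List<T>> spillSink;       // hypothetical: would write one sorted run to a file
    private long estimatedBytes = 0;

    TreeMapSortBuffer(Comparator<? super T> cmp, ToLongFunction<T> sizeEstimator,
                      long spillThresholdBytes, Consumer<List<T>> spillSink) {
        this.rows = new TreeMap<>(cmp);
        this.sizeEstimator = sizeEstimator;
        this.spillThresholdBytes = spillThresholdBytes;
        this.spillSink = spillSink;
    }

    void insertRow(T row) {
        rows.computeIfAbsent(row, k -> new ArrayList<>()).add(row);
        estimatedBytes += sizeEstimator.applyAsLong(row);
        if (estimatedBytes >= spillThresholdBytes) {
            spill();
        }
    }

    // Returns the buffered rows in sorted order and empties the buffer.
    List<T> drainSorted() {
        List<T> run = new ArrayList<>();
        for (List<T> group : rows.values()) {
            run.addAll(group);
        }
        rows.clear();
        estimatedBytes = 0;
        return run;
    }

    // Hands the current sorted run to the sink (in a real sorter: a spill file).
    void spill() {
        if (!rows.isEmpty()) {
            spillSink.accept(drainSorted());
        }
    }
}
```

A plain `TreeMap<T, T>` would silently drop rows whose sort keys compare equal, since `TreeMap` keys are unique; the sketch therefore keeps a list of rows per key. Each insertion costs O(log n), which is what lets the buffer stay sorted as data arrives instead of requiring a separate sort before every spill.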

External Sorter depends on FileWriterFactory to get a FileWriter for spilling data into files.
FileWriterFactory should be provided by configuration. Multiple implementations are possible,
such as writing into one folder or into multiple folders.
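The FileWriterFactory paragraph above mentions two possible layouts, one folder or several. A hypothetical sketch of both as implementations of one interface follows. Everything here is an assumption for illustration: the interface returns a `Path` for each new spill file rather than a writer (so the merge phase can reopen it), and the comma-separated directory setting stands in for whatever configuration CarbonData actually uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical factory contract: returns the location of the next spill file to write.
interface FileWriterFactory {
    Path nextSpillFile() throws IOException;
}

// One folder: every spill file goes into the same directory.
final class SingleFolderFactory implements FileWriterFactory {
    private final Path dir;
    private final AtomicLong counter = new AtomicLong();

    SingleFolderFactory(Path dir) { this.dir = dir; }

    @Override
    public Path nextSpillFile() throws IOException {
        Files.createDirectories(dir);
        return dir.resolve("spill-" + counter.getAndIncrement() + ".tmp");
    }
}

// Multiple folders: spill files are assigned round-robin, e.g. to spread I/O over several disks.
final class RoundRobinFolderFactory implements FileWriterFactory {
    private final List<Path> dirs;
    private final AtomicLong counter = new AtomicLong();

    RoundRobinFolderFactory(List<Path> dirs) {
        if (dirs.isEmpty()) {
            throw new IllegalArgumentException("at least one spill folder is required");
        }
        this.dirs = new ArrayList<>(dirs);
    }

    @Override
    public Path nextSpillFile() throws IOException {
        long n = counter.getAndIncrement();
        Path dir = dirs.get((int) (n % dirs.size()));
        Files.createDirectories(dir);
        return dir.resolve("spill-" + n + ".tmp");
    }
}

// Chooses an implementation from configuration. The comma-separated directory list is an
// assumed format, not a documented CarbonData property.
final class FileWriterFactories {
    static FileWriterFactory fromConfig(String spillDirs) {
        List<Path> dirs = new ArrayList<>();
        for (String s : spillDirs.split(",")) {
            if (!s.trim().isEmpty()) {
                dirs.add(Paths.get(s.trim()));
            }
        }
        return dirs.size() == 1 ? new SingleFolderFactory(dirs.get(0)) : new RoundRobinFolderFactory(dirs);
    }
}
```

A sorter built on this sketch would open each spill file with `Files.newBufferedWriter(factory.nextSpillFile())`; placing the folders on different disks is one way to spread spill I/O.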


> Implement an ExternalSorter that makes maximum usage of memory while sorting
> ----------------------------------------------------------------------------
>
>                 Key: CARBONDATA-318
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-318
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: Jacky Li
>             Fix For: 0.2.0-incubating
>
>
> External Sorter should sort in memory until it reaches the configured size, then spill to disk.
> It should provide the following interface:
> 1. insertRow/insertRowBatch: takes an Iterator as input and inserts the rows from the iterator into the sorter.
> 2. getIterator: returns an iterator over the sorted rows; the sorted rows may come from memory or from spill files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
