carbondata-issues mailing list archives

From "Jacky Li (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CARBONDATA-318) Implement an ExternalSorter that makes maximum usage of memory while sorting
Date Sat, 15 Oct 2016 06:15:20 GMT

     [ https://issues.apache.org/jira/browse/CARBONDATA-318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jacky Li updated CARBONDATA-318:
--------------------------------
    Description: 
External Sorter should sort in memory until it reaches the configured size, then spill to disk.
It should provide the following interface:
1. insertRow/insertRowBatch: takes an Iterator as input and inserts the rows from the iterator
into the sorter. Some considerations:
    1) The sorter decides when to spill to disk based on the total inserted size. (The JDK does
not provide an API for object size; another JIRA issue is needed to improve on this.)
    2) Use TreeMap as the sorter's in-memory data structure, since it keeps rows sorted as they
are inserted.

2. getIterator: returns an iterator over the sorted rows; the sorted rows may come from memory
or from files.
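
The interface above could be sketched roughly as follows. This is only an illustration under several assumptions that are not part of the issue: rows are plain Strings without line breaks, the row size is estimated as two bytes per character (since the JDK offers no object-size API), spill files are ordinary temp files, and the class name ExternalSorterSketch is hypothetical. Because TreeMap keys are unique, the sketch stores a count per row so that duplicate rows are not lost.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

/** Hypothetical sketch: sorts String rows and spills sorted runs to temp files. */
final class ExternalSorterSketch {
    private final long maxBytesInMemory;                              // configured spill threshold
    private final TreeMap<String, Integer> buffer = new TreeMap<>();  // row -> duplicate count
    private final List<Path> spillFiles = new ArrayList<>();
    private long bytesInMemory = 0;

    ExternalSorterSketch(long maxBytesInMemory) {
        this.maxBytesInMemory = maxBytesInMemory;
    }

    /** Inserts every row from the iterator, spilling whenever the estimate hits the limit. */
    void insertRowBatch(Iterator<String> rows) {
        while (rows.hasNext()) {
            String row = rows.next();
            buffer.merge(row, 1, Integer::sum);
            bytesInMemory += 2L * row.length();   // crude size estimate, an assumption
            if (bytesInMemory >= maxBytesInMemory) {
                spill();
            }
        }
    }

    /** Writes the in-memory rows, already in sorted order, to a new temp file. */
    private void spill() {
        try {
            Path file = Files.createTempFile("sorter-spill-", ".txt");
            file.toFile().deleteOnExit();
            try (BufferedWriter out = Files.newBufferedWriter(file)) {
                for (Map.Entry<String, Integer> e : buffer.entrySet()) {
                    for (int i = 0; i < e.getValue(); i++) {
                        out.write(e.getKey());
                        out.newLine();
                    }
                }
            }
            spillFiles.add(file);
            buffer.clear();
            bytesInMemory = 0;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns sorted rows by k-way merging the in-memory run with all spilled runs. */
    Iterator<String> getIterator() {
        List<Iterator<String>> runs = new ArrayList<>();
        List<String> inMemory = new ArrayList<>();
        buffer.forEach((row, count) -> {
            for (int i = 0; i < count; i++) inMemory.add(row);
        });
        runs.add(inMemory.iterator());
        for (Path file : spillFiles) {
            try {
                // Files.lines reads lazily; this sketch does not close the stream explicitly.
                runs.add(Files.lines(file).iterator());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // Min-heap keyed on the current head row of each run.
        Comparator<Map.Entry<String, Iterator<String>>> byRow = Map.Entry.comparingByKey();
        PriorityQueue<Map.Entry<String, Iterator<String>>> heap = new PriorityQueue<>(byRow);
        for (Iterator<String> run : runs) {
            if (run.hasNext()) heap.add(new SimpleEntry<>(run.next(), run));
        }
        return new Iterator<String>() {
            @Override public boolean hasNext() { return !heap.isEmpty(); }
            @Override public String next() {
                Map.Entry<String, Iterator<String>> head = heap.remove();
                Iterator<String> run = head.getValue();
                if (run.hasNext()) heap.add(new SimpleEntry<>(run.next(), run));
                return head.getKey();
            }
        };
    }
}
```

A caller would construct the sorter with a byte limit, pass one or more iterators to insertRowBatch, and then drain getIterator. A real implementation would presumably work on CarbonData row objects with a configurable comparator and a better size estimate.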

External Sorter depends on FileWriterFactory to get a FileWriter for spilling data into files.
FileWriterFactory should be provided by configuration. Multiple implementations are possible,
such as writing into one folder or into multiple folders.
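
One way the configurable factory could look is sketched below. The issue names FileWriterFactory but does not define its methods, so the createWriter method, the two implementation classes, the FileWriterFactories helper, and the property key sorter.spill.folders are all hypothetical and chosen only for illustration. The multi-folder variant here simply rotates through the folders in round-robin order.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical interface: hands out a writer for each new spill file. */
interface FileWriterFactory {
    BufferedWriter createWriter() throws IOException;
}

/** Writes every spill file into one folder. */
final class SingleFolderWriterFactory implements FileWriterFactory {
    private final Path folder;

    SingleFolderWriterFactory(Path folder) {
        this.folder = folder;
    }

    @Override
    public BufferedWriter createWriter() throws IOException {
        Files.createDirectories(folder);
        return Files.newBufferedWriter(Files.createTempFile(folder, "spill-", ".tmp"));
    }
}

/** Spreads spill files over several folders in round-robin order. */
final class MultiFolderWriterFactory implements FileWriterFactory {
    private final List<Path> folders;
    private final AtomicLong next = new AtomicLong();

    MultiFolderWriterFactory(List<Path> folders) {
        this.folders = new ArrayList<>(folders);
    }

    @Override
    public BufferedWriter createWriter() throws IOException {
        Path folder = folders.get((int) (next.getAndIncrement() % folders.size()));
        Files.createDirectories(folder);
        return Files.newBufferedWriter(Files.createTempFile(folder, "spill-", ".tmp"));
    }
}

/** Picks an implementation from configuration (the property key is an assumption). */
final class FileWriterFactories {
    static final String SPILL_FOLDERS_KEY = "sorter.spill.folders";

    static FileWriterFactory fromConfig(Properties conf) {
        String value = conf.getProperty(SPILL_FOLDERS_KEY, System.getProperty("java.io.tmpdir"));
        List<Path> folders = new ArrayList<>();
        for (String part : value.split(",")) {
            if (!part.trim().isEmpty()) {
                folders.add(Paths.get(part.trim()));
            }
        }
        if (folders.isEmpty()) {
            throw new IllegalArgumentException("no spill folder configured");
        }
        if (folders.size() == 1) {
            return new SingleFolderWriterFactory(folders.get(0));
        }
        return new MultiFolderWriterFactory(folders);
    }
}
```

With this shape the sorter only sees the FileWriterFactory interface, so the folder layout can change through configuration without touching the sorting code.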

  was:
External Sorter should sort in memory until it reaches the configured size, then spill to disk.
It should provide the following interface:
1. insertRow/insertRowBatch: takes an Iterator as input and inserts the rows from the iterator
into the sorter. The sorter decides when to spill to disk based on the total inserted size.
(The JDK does not provide an API for object size; another JIRA issue is needed to improve on this.)
2. getIterator: returns an iterator over the sorted rows; the sorted rows may come from memory
or from files.

External Sorter depends on FileWriterFactory to get a FileWriter for spilling data into files.
FileWriterFactory should be provided by configuration. Multiple implementations are possible,
such as writing into one folder or into multiple folders.


> Implement an ExternalSorter that makes maximum usage of memory while sorting
> ----------------------------------------------------------------------------
>
>                 Key: CARBONDATA-318
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-318
>             Project: CarbonData
>          Issue Type: Sub-task
>            Reporter: Jacky Li
>             Fix For: 0.2.0-incubating
>
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
