tajo-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hyunsik Choi (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (TAJO-36) Improve ExternalSortExec with N-merge sort and final pass omission
Date Wed, 04 Sep 2013 07:17:51 GMT

     [ https://issues.apache.org/jira/browse/TAJO-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Hyunsik Choi updated TAJO-36:

    Priority: Critical  (was: Major)
> Improve ExternalSortExec with N-merge sort and final pass omission
> ------------------------------------------------------------------
>                 Key: TAJO-36
>                 URL: https://issues.apache.org/jira/browse/TAJO-36
>             Project: Tajo
>          Issue Type: Improvement
>          Components: physical operator
>            Reporter: Hyunsik Choi
>            Priority: Critical
>              Labels: gsoc, gsoc2013, mentor
> Background:
> The current ExternalSortExec just uses the binary external merge sort algorithm (http://en.wikipedia.org/wiki/External_sorting#External_merge_sort).
In other words, for each pass, ExternalSortExec just merges two files into one sorted file.
> Proposal:
> The goal of this proposal is to improve ExternalSortExec with the following improvements:
> * N-merge sort - we can merge N files though more memory at each pass. It will reduce
the number of passes. Consequently, it will reduces considerable I/O overheads.
> * the final pass omission - a physical operator is pipelined by the parent operator.
The final pass of the merge sort must also be invoked by the parent physical operator. So,
we can omit the final pass of the merge sort.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message