tajo-dev mailing list archives

From "Hyunsik Choi (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (TAJO-9) Change the default intermediate data file format for hash repartitioning
Date Thu, 19 Dec 2013 15:32:18 GMT

     [ https://issues.apache.org/jira/browse/TAJO-9?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyunsik Choi resolved TAJO-9.
-----------------------------

       Resolution: Duplicate
    Fix Version/s: 0.8-incubating

> Change the default intermediate data file format for hash repartitioning
> ------------------------------------------------------------------------
>
>                 Key: TAJO-9
>                 URL: https://issues.apache.org/jira/browse/TAJO-9
>             Project: Tajo
>          Issue Type: Improvement
>          Components: data shuffle
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.8-incubating
>
>
> For easy debugging, hash repartitioning has used CSV as the default intermediate
> data format. The CSV file format may cause parsing overhead, and it may produce relatively
> large intermediate data that must be transmitted over the network. We need to change it
> to RawFile or another efficient file format.
> Digging into the PartitionedStoredExec class is a good starting point for this issue.
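The trade-off described above can be illustrated with a small, self-contained Java sketch. It is hypothetical and is not Tajo's PartitionedStoredExec or RawFile implementation: the class name ShuffleFormatSketch, the three-column row (id, score, count), and the use of Math.floorMod over Long.hashCode to pick a partition are all assumptions made for illustration. The sketch routes a row to a hash partition, then encodes it once as a CSV line and once as fixed-width binary with DataOutputStream. The CSV reader must split and parse text, while the binary reader reads fields directly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch only: NOT Tajo's PartitionedStoredExec or RawFile code.
public class ShuffleFormatSketch {

    // Assign a row to one of numPartitions buckets by hashing its key.
    // floorMod keeps the result non-negative even for negative hash codes.
    static int partitionFor(long key, int numPartitions) {
        return Math.floorMod(Long.hashCode(key), numPartitions);
    }

    // Text encoding: every value becomes characters plus delimiters.
    static byte[] encodeCsv(long id, double score, int count) {
        String line = id + "," + score + "," + count + "\n";
        return line.getBytes(StandardCharsets.UTF_8);
    }

    // Reading CSV back requires decoding, splitting, and parsing each field.
    static long decodeCsvId(byte[] row) {
        String line = new String(row, StandardCharsets.UTF_8).trim();
        String[] fields = line.split(",");
        return Long.parseLong(fields[0]);
    }

    // Binary encoding: fixed-width fields, no delimiters, no text parsing on read.
    static byte[] encodeBinary(long id, double score, int count) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(id);      // 8 bytes
            out.writeDouble(score); // 8 bytes
            out.writeInt(count);    // 4 bytes
        }
        return bytes.toByteArray();
    }

    // The binary reader pulls the first field straight out of the byte stream.
    static long decodeBinaryId(byte[] row) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(row))) {
            return in.readLong();
        }
    }

    public static void main(String[] args) throws IOException {
        long id = 1234567890123L;
        double score = 0.123456789;
        int count = 42;

        byte[] csv = encodeCsv(id, score, count);
        byte[] bin = encodeBinary(id, score, count);

        System.out.println("partition    = " + partitionFor(id, 8));
        System.out.println("csv bytes    = " + csv.length);
        System.out.println("binary bytes = " + bin.length);
        System.out.println("ids match    = " + (decodeCsvId(csv) == decodeBinaryId(bin)));
    }
}
```

The binary row in this sketch is always 20 bytes. Text is not always larger, though: a column holding small integers such as 1 takes fewer bytes as text than as an 8-byte long, so the size saving depends on the data, while avoiding per-field text parsing helps regardless.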



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)
