tajo-dev mailing list archives

From "Jihoon Son (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TAJO-292) Too many intermediate partition files
Date Tue, 03 Dec 2013 11:15:35 GMT

    [ https://issues.apache.org/jira/browse/TAJO-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13837586#comment-13837586 ]

Jihoon Son commented on TAJO-292:
---------------------------------

When the intermediate data is sufficiently large, the number of tasks appears to
equal the number of worker slots.
In my opinion, since the number of tasks stays fixed regardless of the size of the intermediate
data, each task processes more data as the intermediate data grows, so the cost of a single
task failure (the work that must be redone) grows with it.
How about limiting the maximum task size?
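
To illustrate the idea, here is a rough sketch, not taken from the Tajo code base. The class name, the method name, and the 256 MB cap are hypothetical, and keeping the worker-slot count as a lower bound is only an assumption about how such a limit might be combined with the current behavior.

```java
public class TaskCountSketch {

    /**
     * Hypothetical helper (not a Tajo API): returns a task count such that
     * no task handles more than maxTaskBytes of intermediate data, while
     * never going below the number of worker slots.
     */
    public static int numTasks(long intermediateBytes, int workerSlots, long maxTaskBytes) {
        if (intermediateBytes < 0 || workerSlots <= 0 || maxTaskBytes <= 0) {
            throw new IllegalArgumentException("sizes and slot count must be positive");
        }
        // Ceiling division without risking overflow on very large inputs.
        long bySize = intermediateBytes / maxTaskBytes
                + (intermediateBytes % maxTaskBytes == 0 ? 0 : 1);
        // Assumption: the slot count remains the floor, the size cap adds tasks on top.
        long tasks = Math.max((long) workerSlots, bySize);
        return (int) Math.min(tasks, (long) Integer.MAX_VALUE);
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        int slots = 10;            // assumed cluster capacity
        long cap = 256L * mb;      // assumed maximum task size

        // Small intermediate data: the slot count dominates.
        System.out.println(numTasks(1024L * mb, slots, cap));
        // Large intermediate data: the size cap adds more, smaller tasks.
        System.out.println(numTasks(100L * 1024L * mb, slots, cap));
    }
}
```

With a cap like this, the amount of work lost to one failed task is bounded by the cap instead of growing with the total intermediate data size.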

> Too many intermediate partition files
> -------------------------------------
>
>                 Key: TAJO-292
>                 URL: https://issues.apache.org/jira/browse/TAJO-292
>             Project: Tajo
>          Issue Type: Bug
>          Components: repartitioning
>    Affects Versions: 0.2-incubating
>            Reporter: Hyunsik Choi
>            Assignee: Jinho Kim
>            Priority: Critical
>             Fix For: 0.8-incubating
>
>         Attachments: TAJO-292.patch
>
>
> Unlike before, the number of partitions is currently determined by the volume
> size and the number of distinct keys. This can cause unnecessary overhead. We need to improve
> the partition number determiner so that it also considers the number of cluster nodes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
