tajo-dev mailing list archives

From "Hyunsik Choi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TAJO-292) Too many intermediate partition files
Date Wed, 04 Dec 2013 15:19:35 GMT

    [ https://issues.apache.org/jira/browse/TAJO-292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838973#comment-13838973 ]

Hyunsik Choi commented on TAJO-292:

To me, this looks like a good workaround for this problem. Here are my comments on your patch.

 * It would be better to rename tajo.worker.start.cleanup to tajo.worker.tmpdir.cleanup-at-startup,
because the config applies to tajo.worker.tmpdir. That name looks more consistent.
 * The code below should be inserted at the end of WorkerManagerService::cleanup(). In addition,
cleanup's return type needs to be BoolProto.
 ** The async RPC internally keeps a callback sequence id in a concurrent map until the response is returned,
so done.run must be called once.
 * For the same reason, line 184 in QueryMaster should be changed to
tajoWorkerProtocolService.cleanup(null, queryId.getProto(), NullCallback.get());
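
The callback point above can be illustrated with a minimal sketch, assuming the async RPC client tracks outstanding callbacks roughly as described. The class PendingCallbackSketch and its methods are hypothetical stand-ins for illustration only; they are not Tajo's actual RPC classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical stand-in for an async RPC client; not Tajo's real implementation.
public class PendingCallbackSketch {
    private final AtomicInteger nextSeqId = new AtomicInteger();
    // Sequence id -> callback still waiting for a response.
    private final Map<Integer, Consumer<Boolean>> pending = new ConcurrentHashMap<>();

    // Client side: the callback is registered before the request is sent.
    public int call(Consumer<Boolean> callback) {
        int seqId = nextSeqId.getAndIncrement();
        pending.put(seqId, callback);
        return seqId;
    }

    // Assumed to run when the server side calls done.run(...) and a response arrives.
    public void onResponse(int seqId, boolean result) {
        Consumer<Boolean> callback = pending.remove(seqId);
        if (callback != null) {
            callback.accept(result);
        }
    }

    public int pendingCount() {
        return pending.size();
    }

    public static void main(String[] args) {
        PendingCallbackSketch client = new PendingCallbackSketch();
        int answered = client.call(ok -> System.out.println("cleanup acknowledged: " + ok));
        int unanswered = client.call(ok -> { });

        // Only the first request gets a response, as if the server called done.run once.
        client.onResponse(answered, true);

        // The second request never received a response, so its entry stays in the map.
        System.out.println("pending entries: " + client.pendingCount());
    }
}
```

In this sketch, a handler that never responds leaves its entry in the map indefinitely, which is the kind of leak the done.run requirement is meant to avoid.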

> Too many intermediate partition files
> -------------------------------------
>                 Key: TAJO-292
>                 URL: https://issues.apache.org/jira/browse/TAJO-292
>             Project: Tajo
>          Issue Type: Bug
>          Components: repartitioning
>    Affects Versions: 0.2-incubating
>            Reporter: Hyunsik Choi
>            Assignee: Jinho Kim
>            Priority: Critical
>             Fix For: 0.8-incubating
>         Attachments: TAJO-292.patch
> Unlike before, the number of partitions is currently determined by the volume
size and the number of distinct keys. This can cause unnecessary overhead. We need to improve
the partition number determiner so that it considers the number of cluster nodes.

This message was sent by Atlassian JIRA
