tajo-dev mailing list archives

From "Hyunsik Choi (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TAJO-385) Refactoring TaskScheduler to assign multiple fragments
Date Tue, 17 Dec 2013 01:53:07 GMT

    [ https://issues.apache.org/jira/browse/TAJO-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13849990#comment-13849990 ]

Hyunsik Choi commented on TAJO-385:

This idea is great and will overcome many limits of queries whose input data consists of a
number of HDFS blocks. Actually, I love this idea.

In addition, there are some challenges and issues which are somewhat trivial but must be resolved.
 * As I mentioned, this idea requires an alternative to exactly indicate the execution block's
 * The task volume size should be adjustable during one execution block.
  ** Otherwise, the final task execution wave may take longer because fewer workers are assigned
than are available.
 * It would be great to enable a query master to choose a task scheduler before executing
a certain execution block.
  ** For table writing, a particular task scheduler can be chosen, and the previous task scheduler
can still be useful for compressed, non-splittable, and larger-than-block files.
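
A minimal sketch of the adjustable task volume point above, in Java. Everything here is hypothetical and is not taken from the attached patches: the class name TaskVolumePolicy, the fixed upper bound, and the assumption that the scheduler knows how many fragments remain and how many workers are currently available. The idea is only that the number of fragments per task shrinks near the end of an execution block, so the last wave can still occupy every available worker.

```java
/** Hypothetical helper: picks how many fragments to bundle into the next task. */
final class TaskVolumePolicy {
    private final int maxFragmentsPerTask;

    TaskVolumePolicy(int maxFragmentsPerTask) {
        if (maxFragmentsPerTask < 1) {
            throw new IllegalArgumentException("maxFragmentsPerTask must be >= 1");
        }
        this.maxFragmentsPerTask = maxFragmentsPerTask;
    }

    /**
     * Returns the number of fragments for the next task.
     * While plenty of fragments remain, the configured maximum is used.
     * Once the remaining fragments could no longer keep every available
     * worker busy at that size, the task size drops to an even share.
     */
    int nextTaskSize(int remainingFragments, int availableWorkers) {
        if (remainingFragments <= 0) {
            return 0;
        }
        int workers = Math.max(1, availableWorkers);
        // Even share of the remaining fragments per worker, rounded up.
        int evenShare = (remainingFragments + workers - 1) / workers;
        return Math.min(maxFragmentsPerTask, evenShare);
    }
}
```

With a maximum of 4 fragments per task and 10 available workers, this policy keeps handing out 4 fragments while 100 remain, and falls back to 2 and then 1 as the remaining count approaches the number of workers.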

> Refactoring TaskScheduler to assign multiple fragments
> ------------------------------------------------------
>                 Key: TAJO-385
>                 URL: https://issues.apache.org/jira/browse/TAJO-385
>             Project: Tajo
>          Issue Type: Improvement
>          Components: query master
>    Affects Versions: 0.8-incubating
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>         Attachments: TAJO-385.patch, TAJO-385_2.patch, TAJO-385_3.patch
> In the current implementation, each task processes only one fragment.
> However, processing multiple fragments in a task can improve query processing performance,
depending on the storage layout and the user queries.
> In this issue, TaskScheduler is refactored to enable assigning multiple fragments to
each task.
> The following should be included.
> * Schedule Fragments instead of QueryUnits in TaskScheduler
> ** The QueryUnit creation is postponed until TaskScheduler receives task requests from
> ** When TaskScheduler receives task requests from workers, it dynamically creates a
QueryUnit and assigns one or more fragments.
> ** The fragment scheduling should take disk load balancing into account.
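
The scheduling steps quoted above can be sketched roughly as follows, in Java. This is an illustration under stated assumptions, not the code in the attached patches: Frag and LazyFragmentScheduler are hypothetical names standing in for Tajo's fragment and scheduler classes, the returned list stands in for a QueryUnit created on demand, and "disk load balancing" is read here as spreading one task's fragments round-robin across the disks of the requesting host. The fallback to a single remote fragment is also an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a fragment: a piece of input stored on one host and disk. */
final class Frag {
    final String path;
    final String host;
    final int diskId;

    Frag(String path, String host, int diskId) {
        this.path = path;
        this.host = host;
        this.diskId = diskId;
    }
}

/** Hypothetical scheduler that holds fragments and builds a task only when a worker asks. */
final class LazyFragmentScheduler {
    // host -> disk id -> fragments still waiting on that disk (sorted maps keep the order deterministic)
    private final Map<String, Map<Integer, Deque<Frag>>> pending = new TreeMap<>();

    void add(Frag f) {
        pending.computeIfAbsent(f.host, h -> new TreeMap<>())
               .computeIfAbsent(f.diskId, d -> new ArrayDeque<>())
               .addLast(f);
    }

    /** Called on a worker's task request; the returned fragments form one task. */
    List<Frag> onTaskRequest(String workerHost, int maxFragments) {
        List<Frag> task = new ArrayList<>();
        // Prefer fragments stored on the requesting host.
        takeRoundRobin(pending.get(workerHost), maxFragments, task);
        if (task.isEmpty()) {
            // No local work left: hand out a single remote fragment (assumed policy).
            for (Map<Integer, Deque<Frag>> disks : pending.values()) {
                takeRoundRobin(disks, 1, task);
                if (!task.isEmpty()) {
                    break;
                }
            }
        }
        return task;
    }

    /** Takes one fragment per disk per pass, so a task's reads are spread across disks. */
    private static void takeRoundRobin(Map<Integer, Deque<Frag>> disks, int limit, List<Frag> out) {
        if (disks == null) {
            return;
        }
        boolean tookAny = true;
        while (out.size() < limit && tookAny) {
            tookAny = false;
            for (Deque<Frag> queue : disks.values()) {
                if (out.size() >= limit) {
                    break;
                }
                Frag f = queue.pollFirst();
                if (f != null) {
                    out.add(f);
                    tookAny = true;
                }
            }
        }
    }
}
```

In this sketch no task exists until a worker calls onTaskRequest, which mirrors postponing QueryUnit creation; an empty list signals that nothing is left to schedule.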

This message was sent by Atlassian JIRA
