Date: Mon, 9 Dec 2013 12:54:06 +0000 (UTC)
From: "Jihoon Son (JIRA)"
To: dev@tajo.incubator.apache.org
Subject: [jira] [Updated] (TAJO-385) Refactoring TaskScheduler to assign multiple fragments

[ https://issues.apache.org/jira/browse/TAJO-385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jihoon Son updated TAJO-385:
----------------------------

    Attachment: TAJO-385.patch

Thanks, everyone, for the comments.
I attached a patch for this issue.

As described above, TaskScheduler now schedules fragments instead of QueryUnits, and QueryUnits are created dynamically when TaskScheduler receives a TaskRequest from a worker. When a QueryUnit is created, multiple fragments can be assigned to it.
As Keuntae said, the number of fragments assigned to a QueryUnit is an important factor for performance, but its optimal value is hard to determine because it depends on the data and the user's query. So, I added a configuration property that lets users specify the task size in bytes as follows.

{noformat}
<property>
  <name>tajo.task.size.default</name>
  <value>task_size_in_byte</value>
</property>
{noformat}

I also refactored TaskScheduler to make the scheduling algorithm pluggable.
I tested the patch by running TPC-H queries on my in-house cluster. After applying this patch, the number of remote tasks decreased slightly.

> Refactoring TaskScheduler to assign multiple fragments
> ------------------------------------------------------
>
>                 Key: TAJO-385
>                 URL: https://issues.apache.org/jira/browse/TAJO-385
>             Project: Tajo
>          Issue Type: Improvement
>          Components: master
>    Affects Versions: 0.8-incubating
>            Reporter: Jihoon Son
>            Assignee: Jihoon Son
>         Attachments: TAJO-385.patch
>
>
> In the current implementation, each task processes only one fragment.
> However, processing multiple fragments in a single task can improve query processing performance, depending on the storage layout and the user queries.
> In this issue, TaskScheduler is refactored so that multiple fragments can be assigned to each task.
> The following should be included:
> * Schedule Fragments instead of QueryUnits in TaskScheduler
> ** QueryUnit creation is postponed until TaskScheduler receives task requests from workers.
> ** When TaskScheduler receives a task request from a worker, it dynamically creates a QueryUnit and assigns one or more fragments to it.
> ** Fragment scheduling should take disk load balancing into account.
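The scheduling flow described in this issue (TaskScheduler queues fragments, builds a QueryUnit only when a worker asks for work, fills it up to the size configured by tajo.task.size.default, and prefers local fragments while spreading reads across disks) can be illustrated with a small, self-contained Java sketch. It is not the code in TAJO-385.patch: Fragment, FragmentScheduler, and LocalityFirstScheduler are hypothetical names, and the disk-balancing rule used here (take the next fragment from the requesting host's disk with the most pending fragments, otherwise fall back to the host with the most pending work) is an assumed heuristic chosen for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical types for illustration only; these are not Tajo's actual classes.
public class FragmentSchedulingSketch {

  /** A byte range of a file stored on one disk of one host. */
  record Fragment(String path, long length, String host, int diskId) {}

  /** Pluggable policy: decides which fragments make up the next task. */
  interface FragmentScheduler {
    void addFragment(Fragment fragment);

    /** Returns the fragments for one new task, or an empty list if none remain. */
    List<Fragment> nextTask(String requestingHost);
  }

  /** Greedy, locality-first policy that fills each task up to a target size in bytes. */
  static class LocalityFirstScheduler implements FragmentScheduler {
    private final long taskSizeBytes; // plays the role of tajo.task.size.default
    // host -> disk id -> fragments still waiting to be assigned
    private final Map<String, Map<Integer, Deque<Fragment>>> pending = new HashMap<>();

    LocalityFirstScheduler(long taskSizeBytes) {
      this.taskSizeBytes = taskSizeBytes;
    }

    @Override
    public void addFragment(Fragment f) {
      pending.computeIfAbsent(f.host(), h -> new HashMap<>())
          .computeIfAbsent(f.diskId(), d -> new ArrayDeque<>())
          .add(f);
    }

    @Override
    public List<Fragment> nextTask(String requestingHost) {
      List<Fragment> task = new ArrayList<>();
      long size = 0;
      // Add fragments until the target size is reached; the last one may overshoot it.
      while (size < taskSizeBytes) {
        Fragment f = pollFromHost(requestingHost); // local fragment first
        if (f == null) {
          f = pollFromHost(busiestHost()); // otherwise a remote fragment
        }
        if (f == null) {
          break; // nothing left to schedule
        }
        task.add(f);
        size += f.length();
      }
      return task;
    }

    /** Takes one fragment from the host's disk that has the most pending fragments. */
    private Fragment pollFromHost(String host) {
      Map<Integer, Deque<Fragment>> disks = pending.get(host);
      if (disks == null) {
        return null;
      }
      Deque<Fragment> fullest = null;
      for (Deque<Fragment> q : disks.values()) {
        if (!q.isEmpty() && (fullest == null || q.size() > fullest.size())) {
          fullest = q;
        }
      }
      return fullest == null ? null : fullest.poll();
    }

    /** Returns the host with the most pending fragments, or null if none remain. */
    private String busiestHost() {
      String best = null;
      int bestCount = 0;
      for (Map.Entry<String, Map<Integer, Deque<Fragment>>> e : pending.entrySet()) {
        int count = 0;
        for (Deque<Fragment> q : e.getValue().values()) {
          count += q.size();
        }
        if (count > bestCount) {
          best = e.getKey();
          bestCount = count;
        }
      }
      return best;
    }
  }

  public static void main(String[] args) {
    long mb = 1024L * 1024L;
    FragmentScheduler scheduler = new LocalityFirstScheduler(128 * mb);
    scheduler.addFragment(new Fragment("lineitem/part-0", 64 * mb, "host1", 0));
    scheduler.addFragment(new Fragment("lineitem/part-1", 64 * mb, "host1", 1));
    scheduler.addFragment(new Fragment("lineitem/part-2", 64 * mb, "host2", 0));

    // host1 gets both of its local fragments (one from each disk) in a single task.
    System.out.println(scheduler.nextTask("host1").size());
    // host3 stores no data, so it is given host2's fragment as a remote task.
    System.out.println(scheduler.nextTask("host3").size());
  }
}
```

Because the policy sits behind a small interface, a different algorithm (for example, one that counts in-flight tasks per disk instead of pending fragments) could be swapped in without changing the code that handles worker requests. This mirrors the pluggability mentioned in the comment, but the interface shape here is an assumption.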