From: Apache Wiki
To: Apache Wiki
Date: Tue, 26 Mar 2013 09:28:36 -0000
Message-ID: <20130326092836.64458.49959@eos.apache.org>
Subject: [Tajo Wiki] Update of "Roadmap" by HyunsikChoi

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tajo Wiki" for change notification.

The "Roadmap" page has been changed by HyunsikChoi:
http://wiki.apache.org/tajo/Roadmap

Comment:
Moved the roadmap from github wiki.

New page:
= Roadmap =

== Milestone ==
 * 0.2 - first release as an incubating project, focused on ASF compliance
 * 0.3 - a more stable API, more robust features, and a rudimentary cost-based optimizer
 * 0.4 - broader SQL support and an improved cost-based optimizer
 * 0.5 - a native columnar execution engine

== Long Term Plan ==
 * Integration with the Hadoop ecosystem
  * The Tajo catalog needs to support HCatalog or be compatible with Hive's metadata.
 * The native columnar execution engine
 * Cost-based optimization, which also includes a rewrite rule engine and various rewrite rules

== Short/Mid Term Plan ==
 * Improvement of the DAG framework
  * Query is currently both an FSM and a DAG representation.
  * It would be good to separate Query into an FSM part and a DAG part.
  * We need an easier interface for editing and building DAGs.
 * RCFile
  * In the current implementation, Tajo's RCFile is not compatible with Hive's because it uses Datum to (de)serialize data. So, we will add an RCFile wrapper class that is compatible with Hive's files.
 * ORCFile
  * It looks promising. We need to port ORCFile.
 * Trevni
  * TrevniScanner works well in most cases. However, it doesn't support null values. We need to handle them.
 * Hadoop security in tajo-rpc
  * tajo-rpc does not support Hadoop security. Since Tajo will be a part of the Hadoop ecosystem, we need to apply Hadoop security to tajo-rpc.
 * Intermediate Data Format
  * As I mentioned above, Tajo uses CSV as the intermediate data format. CSV may incur CPU overhead, and it is relatively large to transmit over the network. We need to change it.
 * JDBC/ODBC drivers
  * Tajo is a relational DW system. With such connectors, it can be easily integrated with existing BI and OLAP tools.
 * RESTful API
  * It's very useful for web-based applications.
 * Proper resource allocation for SubQuery (i.e., Execution Block in PPT)
  * SubQuery is one step of multiple query steps.
    For each subquery, QueryMaster launches TaskRunners via Yarn, and the launched TaskRunners are reused within the subquery.
  * Currently, QueryMaster assigns a fixed-size resource (2G of memory) to every subquery regardless of the resources it actually needs. We need to improve this so that subqueries get appropriate resources. For example, QueryMaster could assign 1G to a scan-only subquery and 2G to a subquery that includes joins.
 * Error handling of TajoCli
  * TajoCli is a command line interface that uses Jline2. However, its error handling is poor: it frequently halts when trivial exceptions occur.
 * SQL data types
  * Currently, Tajo provides data types (i.e., byte, bool, int, long, float, double, bytes, and string) based on Java primitive types. Tajo should support SQL standard data types.
 * Local mode
  * Queries are always executed in distributed mode; in other words, Tajo always uses Yarn. However, this is inconvenient for debugging and inefficient on a single machine. We need to implement a local mode.
 * Parallel launch of containers
  * Currently, node containers are launched sequentially (see TaskRunnerLauncherImpl.java). This looks very inefficient. We can improve it by using ExecutorService.
 * Output commit
  * In some cases, Tajo is fault tolerant, which requires an output commit mechanism. However, Tajo does not support one yet, and we need this feature.
 * Broadcast join and Limit operator
  * As I mentioned before, they are disabled after Yarn port. We should enable them.
 * HbaseScanner/Appender
  * HBase will be a great storage backend for Tajo.
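
To make the "Proper resource allocation for SubQuery" item more concrete, here is a minimal sketch of the 1G/2G example from that item. It is only an illustration: `SubQueryMemorySketch`, the `Op` enum, and `memoryMbFor` are hypothetical names and are not part of Tajo, and a real policy would presumably look at more than the presence of a join.

```java
import java.util.EnumSet;
import java.util.Set;

public class SubQueryMemorySketch {

    // Hypothetical operator kinds; Tajo's actual plan node types may differ.
    public enum Op { SCAN, SELECTION, PROJECTION, JOIN }

    // The two sizes come from the roadmap's example (1G scan-only, 2G with joins).
    public static final int SCAN_ONLY_MB = 1024;
    public static final int JOIN_MB = 2048;

    // Picks a memory size from the operators a subquery contains,
    // instead of always handing out the same fixed amount.
    public static int memoryMbFor(Set<Op> ops) {
        return ops.contains(Op.JOIN) ? JOIN_MB : SCAN_ONLY_MB;
    }

    public static void main(String[] args) {
        System.out.println(memoryMbFor(EnumSet.of(Op.SCAN)));
        System.out.println(memoryMbFor(EnumSet.of(Op.SCAN, Op.JOIN)));
    }
}
```

The point of the sketch is only that the size becomes a function of the subquery's plan; how QueryMaster would pass that size to Yarn is left open here.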
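
For the "Parallel launch of containers" item, the following is a rough sketch of what launching through `ExecutorService` could look like. It does not reflect the actual code in TaskRunnerLauncherImpl.java: `ParallelLaunchSketch`, `ContainerLauncher`, and `launchAll` are hypothetical names, and the launch itself is replaced by a placeholder callback.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelLaunchSketch {

    // Hypothetical stand-in for whatever starts a single container; not a Tajo API.
    public interface ContainerLauncher {
        String launch(String containerId) throws Exception;
    }

    // Submits one launch task per container to a fixed-size thread pool
    // and waits for all of them, instead of launching one after another.
    public static List<String> launchAll(List<String> containerIds,
                                         ContainerLauncher launcher,
                                         int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<String>> tasks = new ArrayList<>();
            for (String id : containerIds) {
                tasks.add(() -> launcher.launch(id));
            }
            // invokeAll blocks until every task has finished and returns
            // the futures in the same order as the submitted tasks.
            List<String> results = new ArrayList<>();
            for (Future<String> future : pool.invokeAll(tasks)) {
                results.add(future.get()); // rethrows a launch failure, if any
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> ids = Arrays.asList("container_1", "container_2", "container_3");
        // Placeholder launcher that only reports the container id.
        System.out.println(launchAll(ids, id -> "started:" + id, 2));
    }
}
```

A fixed-size pool is used here so that the number of concurrent launch requests stays bounded; the right pool size for a real cluster is an open question.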