tajo-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Tajo Wiki] Update of "Roadmap" by HyunsikChoi
Date Fri, 25 Oct 2013 04:08:01 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tajo Wiki" for change notification.

The "Roadmap" page has been changed by HyunsikChoi:
https://wiki.apache.org/tajo/Roadmap?action=diff&rev1=3&rev2=4

  
  == Milestone ==
   * 0.2 - first release as an incubating project focused on ASF compliance
+  * 0.8 - More features and more SQL compatibility
+  * 1.0-alpha - More fault tolerance plus the experimental JIT and vectorized engine
-  * 0.3 - more stable API and robust features and a rudimentary cost-based optimizer
-  * 0.4 - more SQL support and an improved cost-based optimizer
-  * 0.5 - a native columnar execution engine
  
+ == 0.2 ==
+  * https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12314424&version=12324252
- == Long Term Plan ==
-  * Integration with Hadoop ecosystem
-   * Tajo catalog needs to support HCatalog or needs to be compatible with Hive meta.
-  * The native columnar execution engine
-  * Cost-based optimization which also includes a rewrite rule engine and various rewrite rules
-  * User-defined type (UDT)
-   * With UDT, in addition to primitive types (e.g., int, float), Tajo will use a structured data type as a column data type.
-   * It would be a powerful complement to the relational data model.
  
+ == 0.8 ==
+  * Add JDBC driver ([[https://issues.apache.org/jira/browse/TAJO-176|TAJO-176]])
+  * Add EXPLAIN clause ([[https://issues.apache.org/jira/browse/TAJO-122|TAJO-122]])
+  * Add Table Partitioning
+  * Allow Catalog to access HCatalog ([[https://issues.apache.org/jira/browse/TAJO-16|TAJO-16]])
+  * Upgrade Hadoop to 2.2-GA
+  * Support Outer joins
+  * Support binary/text (de)serializers for RCFile, allowing Hive compatibility.
+  * Improve Tajo's Resource Manager to enable disk-based and memory-based resource scheduling.
- == Short/Mid Term Plan ==
-  * Improvement of the DAG framework
-   * Query is both an FSM and a DAG representation.
-     * It would be good to separate Query into an FSM part and a DAG part.
-   * We need an easier interface to edit and build DAGs.
-  * RCFile
-   * In the current implementation, RCFile is not compatible with Hive's because Tajo's RCFile uses Datum to (de)serialize data. So, we will add an RCFile wrapper class compatible with Hive's files.
-  * ORCFile
-   * It looks promising. We need to port ORCFile.
-  * Trevni
-   * TrevniScanner works well in most cases. However, it doesn't support null values. We need to handle them.
-  * Hadoop security in tajo-rpc
-   * tajo-rpc does not support Hadoop security. Since Tajo will be a part of the Hadoop ecosystem, we need to apply Hadoop security to tajo-rpc.
-  * Intermediate Data Format
-   * As I mentioned above, Tajo uses CSV as the intermediate data format. It may cause CPU overhead and is relatively large to transmit over the network. We need to change it.
-  * JDBC/ODBC drivers
-   * Tajo is a relational DW system. If we have such connectors, it can be easily integrated with existing BI and OLAP tools.
-  * RESTful APIs
-   * It's very useful for web-based applications.
-  * Proper resource allocation for SubQuery (i.e., Execution Block in PPT)
-     * SubQuery is one step of multiple query steps. For each subquery, QueryMaster launches TaskRunners via Yarn, and the launched TaskRunners are reused within a subquery.
-     * Now, QueryMaster assigns a fixed-size resource (2G memory) to every subquery regardless of the resources it needs. We need to improve it to allocate proper resources to subqueries. For example, QueryMaster could assign 1G to a subquery that only scans and 2G to another subquery that includes joins.
-  * Error handling of TajoCli
-    * TajoCli is a command line interface that uses Jline2. However, its error handling is awful. It frequently halts when trivial exceptions occur.
-  * SQL data types
-    * Currently, Tajo provides data types (i.e., byte, bool, int, long, float, double, bytes, and string) based on Java primitive types. Tajo should support SQL standard data types.
-  * Local mode
-    * Queries are always executed in a distributed mode. In other words, Tajo always uses Yarn. However, this is inconvenient for debugging and inefficient on a single machine. We need to implement something for local mode.
-  * Parallel launch of containers 
-    * Currently, node containers are launched sequentially (see TaskRunnerLauncherImpl.java). It looks very inefficient. We can improve it by using ExecutorService.
-  * Output commit
-    * In some cases, Tajo is fault tolerant, which requires an output commit mechanism. However, Tajo does not support one, and we need this feature.
-  * Broadcast join and Limit operator
-    * As I mentioned before, they have been disabled since the Yarn port. We should enable them.
-  * HbaseScanner/Appender
-    * HBase would be a great storage layer for Tajo.
  
+ == 1.0-alpha (expected, not yet determined) ==
+  * Add ODBC Driver
+  * New experimental JIT and vectorized query engine
+  * Add more UDFs
+ 
