From: hyunsik@apache.org
To: commits@tajo.incubator.apache.org
Reply-To: dev@tajo.incubator.apache.org
Message-Id: <2b0d9a84c17944ed99a6bbce269cd004@git.apache.org>
Subject: git commit: TAJO-4: Update the project site (hyunsik)
Date: Tue, 26 Mar 2013 08:38:19 +0000 (UTC)

Updated Branches:
  refs/heads/master 457fea185 -> 40138ccfc


TAJO-4: Update the project site (hyunsik)


Project: http://git-wip-us.apache.org/repos/asf/incubator-tajo/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-tajo/commit/40138ccf
Tree: http://git-wip-us.apache.org/repos/asf/incubator-tajo/tree/40138ccf
Diff: http://git-wip-us.apache.org/repos/asf/incubator-tajo/diff/40138ccf

Branch: refs/heads/master
Commit: 40138ccfc9b74e6a60b81b500c8e56afa450d41d
Parents: 457fea1
Author: Hyunsik Choi
Authored: Tue Mar 26 16:32:18 2013 +0900
Committer: Hyunsik Choi
Committed: Tue Mar 26 16:32:18 2013 +0900

----------------------------------------------------------------------
 CHANGES.txt                                   |   4 +
 tajo-project/pom.xml                          | 232 ++++++++++++--------
 tajo-project/src/site/apt/build.apt           |  14 +-
 tajo-project/src/site/apt/configuration.apt   |   8 +-
 tajo-project/src/site/apt/getting_started.apt | 108 ++++++++--
 tajo-project/src/site/apt/index.apt           |  76 ++-----
 tajo-project/src/site/apt/query_language.apt  |  30 ++-
 tajo-project/src/site/site.xml                |  71 ++++---
 8 files changed, 330 insertions(+), 213 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-tajo/blob/40138ccf/CHANGES.txt
----------------------------------------------------------------------
diff --git a/CHANGES.txt b/CHANGES.txt
index b366684..65ba489 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -5,10 +5,14 @@ Release 0.2.0 - unreleased
 NEW FEATURES
 IMPROVEMENTS
+
+ TAJO-4: Update the project site (hyunsik)
+
 TAJO-2: remove all @author tags and update license header (hyunsik)
 BUG FIXES
 TAJO-1: RCFileWrapper always reads whole columns regardless of the target schema. (jihoonson via hyunsik)
+ TAJO-6: Rename tajo.engine.function.builtin.NewSumInt to SumInt. (rsumbaly)


http://git-wip-us.apache.org/repos/asf/incubator-tajo/blob/40138ccf/tajo-project/pom.xml
----------------------------------------------------------------------
diff --git a/tajo-project/pom.xml b/tajo-project/pom.xml
index bf1ef47..fff6a97 100644
--- a/tajo-project/pom.xml
+++ b/tajo-project/pom.xml
@@ -37,7 +37,6 @@
UTF-8 0.2.0-SNAPSHOT 2.0.3-alpha - github
@@ -48,86 +47,175 @@
- Database Laboratory, Korea University - http://dbserver.korea.ac.kr + Apache Software Foundation + http://www.apache.org + akarasulu + Alex Karasulu + akarasulu@apache.org + + + + + + + + mattmann + Chris Mattmann + chris.a.mattmann@jpl.nasa.gov + NASA JPL + + + + -8 + + + ereisman + Eli Reisman + ereisman@apache.org + Hortonworks + + + + -8 + + + hsaputra + Henry Saputra + hsaputra@apache.org + Platfora + + + + -8 + + hyunsik Hyunsik Choi - hyunsik.choi@gmail.com - http://diveintodata.org - Database Lab., Korea University - http://dbserver.korea.ac.kr + hyunsik@apache.org + Korea University - project lead - architect - developer + + +9 + + + blrunner + JaeHwa Jung + blrunner@apache.org + Gruter + + + + +9 + + + jghoman + Jakob Homan + jghoman@apache.org + LinkedIn + + + + -8 + + + jhkim + Jinho Kim + jhkim@apache.org + Gruter + + + + +9 jihoonson Jihoon Son - ghoonson@gmail.com - - Database Lab., Korea University - http://dbserver.korea.ac.kr + jihoonson@apache.org + Korea University + + + + +9 + + + omalley + Owen O'Malley + owen@hortonworks.com + Hortonworks - architect - developer + + -8 - ryuhyoseok - Hyoseok Ryu - hyoseok@korea.ac.kr - - Database Lab., Korea University - http://dbserver.korea.ac.kr + rsumbaly + Roshan Sumbaly + rsumbaly@apache.org + LinkedIn - developer + + -8 + + + swkim + Sangwook Kim + swkim@apache.org + Inervit + + + + +9 + + + yliu + Yi Liu + yliu@apache.org + Intel + + + + +8 - - - Byungnam Lim - byungnam@korea.ac.kr - Database Lab., Korea University - http://dbserver.korea.ac.kr - - - Haemi Yang - haemiyang@korea.ac.kr - Database Lab., Korea University - http://dbserver.korea.ac.kr - - - Soohyung Kim - firek@korea.ac.kr - Database Lab., Korea University - http://dbserver.korea.ac.kr - - - Jira - https://dbserver.korea.ac.kr/jira/browse/TAJO + https://issues.apache.org/jira/browse/TAJO - - Jenkins - https://dbserver.korea.ac.kr/jenkins - - - https://github.com/tajo-project/tajo - scm:git:git://github.com/tajo-project/tajo.git - scm:git:git@github.com:tajo-project/tajo.git + https://git-wip-us.apache.org/repos/asf/incubator-tajo.git + scm:git:http://git-wip-us.apache.org/repos/asf/incubator-tajo.git + scm:git:https://git-wip-us.apache.org/repos/asf/incubator-tajo.git + + + Development list + mailto:dev-subscribe@tajo.incubator.apache.org + + mailto:dev-unsubscribe@tajo.incubator.apache.org + + mailto:dev@tajo.incubator.apache.org + http://mail-archives.apache.org/mod_mbox/tajo-dev/ + + + Commit list + mailto:commits-subscribe@tajo.incubator.apache.org + + mailto:commits-unsubscribe@tajo.incubator.apache.org + + mailto:commits@tajo.incubator.apache.org + http://mail-archives.apache.org/mod_mbox/tajo-commits/ + + + apache.snapshots
@@ -389,44 +477,6 @@
- com.github.github - downloads-maven-plugin - 0.6 - - Official ${project.name} build of the - ${project.version} release - true - true - github - - - - - upload - - deploy - - - - - com.github.github - site-maven-plugin - 0.7 - - Creating site for ${project.artifactId}, ${project.version} - github - - - - - - site - - site-deploy - - - - org.apache.maven.plugins maven-site-plugin 3.0


http://git-wip-us.apache.org/repos/asf/incubator-tajo/blob/40138ccf/tajo-project/src/site/apt/build.apt
----------------------------------------------------------------------
diff --git a/tajo-project/src/site/apt/build.apt b/tajo-project/src/site/apt/build.apt
index 664fb47..b6be014 100644
--- a/tajo-project/src/site/apt/build.apt
+++ b/tajo-project/src/site/apt/build.apt
@@ -14,18 +14,14 @@
 ~~ See the License for the specific language governing permissions and
 ~~ limitations under the License.
- ------
- Tajo - Build Instruction
- ------
- Hyunsik Choi
- ------
- 2013-02-24
+ -----------------
+ Build Instruction
 Build Requirements
 * Unix System
- * Java 1.6 or higher
+ * Java 1.6
 * Protocol Buffers 2.4.1
@@ -51,10 +47,10 @@ Maven main modules
 Building Tajo from Source
- Download the source code from the git repository (https://github.com/tajo-project/tajo) as follows:
+ Download the source code from the git repository ({{http://git-wip-us.apache.org/repos/asf/incubator-tajo.git}}) as follows:
 ------------------------------------------------
-$ git clone https://github.com/tajo-project/tajo
+$ git clone http://git-wip-us.apache.org/repos/asf/incubator-tajo.git
 ------------------------------------------------
 Then, you can execute maven with the following goals:


http://git-wip-us.apache.org/repos/asf/incubator-tajo/blob/40138ccf/tajo-project/src/site/apt/configuration.apt
----------------------------------------------------------------------
diff --git a/tajo-project/src/site/apt/configuration.apt b/tajo-project/src/site/apt/configuration.apt
index 26e518b..925a4af 100644
--- a/tajo-project/src/site/apt/configuration.apt
+++ b/tajo-project/src/site/apt/configuration.apt
@@ -14,12 +14,8 @@
 ~~ See the License for the specific language governing permissions and
 ~~ limitations under the License.
- ------
- Tajo - Configuration Guide
- ------
- Hyunsik Choi
- ------
- 2013-02-24
+ ---------------
+ Configuration
 Preliminary


http://git-wip-us.apache.org/repos/asf/incubator-tajo/blob/40138ccf/tajo-project/src/site/apt/getting_started.apt
----------------------------------------------------------------------
diff --git a/tajo-project/src/site/apt/getting_started.apt b/tajo-project/src/site/apt/getting_started.apt
index f3c68a9..9295501 100644
--- a/tajo-project/src/site/apt/getting_started.apt
+++ b/tajo-project/src/site/apt/getting_started.apt
@@ -14,31 +14,27 @@
 ~~ See the License for the specific language governing permissions and
 ~~ limitations under the License.
- ------
- Tajo - Getting Started
- ------
- Hyunsik Choi
- ------
- 2013-02-01
+ ---------------
+ Getting Started
 Prerequisites
- * Hadoop 2.0.2-alpha or higher
+ * Hadoop 2.0.3-alpha
- * Java 1.6 or higher
+ * Java 1.6
 Build Tajo from Source Code
 Download the source code and build Tajo as follows:
 ---------------------------------------------
-$ git clone git://github.com/tajo-project/tajo.git
+$ git clone http://git-wip-us.apache.org/repos/asf/incubator-tajo.git
 $ cd tajo
 $ mvn package -DskipTests -Ddisk -Ptar
 $ ls tajo-dist/target/tajo-x.y.z.tar.gz
 ---------------------------------------------
- If you want to know the build instruction in more detail, refer to
+ If you want to know the build instruction in more detail, please refer to
 {{{./build.html}Build Instruction}}.
@@ -88,10 +84,10 @@ export TAJO_HOME=
 Likewise, you should copy some jar files to the hadoop library dir.
 ------------------------------------------------------------------------
-cp $TAJO_HOME/tajo-common-x.y.z.jar $HADOOP_HOME/share/yarn/lib
-cp $TAJO_HOME/tajo-catalog-common-x.y.z.jar $HADOOP_HOME/share/yarn/lib
-cp $TAJO_HOME/tajo-core-pullserver-x.y.z.jar $HADOOP_HOME/share/yarn/lib
-cp $TAJO_HOME/tajo-core-storage-x.y.z.jar $HADOOP_HOME/share/yarn/lib
+$ cp $TAJO_HOME/tajo-common-x.y.z.jar $HADOOP_HOME/share/yarn/lib
+$ cp $TAJO_HOME/tajo-catalog-common-x.y.z.jar $HADOOP_HOME/share/yarn/lib
+$ cp $TAJO_HOME/tajo-core-pullserver-x.y.z.jar $HADOOP_HOME/share/yarn/lib
+$ cp $TAJO_HOME/tajo-core-storage-x.y.z.jar $HADOOP_HOME/share/yarn/lib
 ------------------------------------------------------------------------
 Copy ${TAJO_HOME}/conf/tajo-site.xml.templete to tajo-site.xml.
@@ -118,22 +114,94 @@ Running Tajo
 Before launching the tajo, you should create the tajo root dir and set the permission as follows:
 ------------------------------------------------------------------------
-$HADOOP_HOME/bin/hadoop fs -mkdir /tajo
-$HADOOP_HOME/bin/hadoop fs -chmod g+w /tajo
+$ $HADOOP_HOME/bin/hadoop fs -mkdir /tajo
+$ $HADOOP_HOME/bin/hadoop fs -chmod g+w /tajo
 ------------------------------------------------------------------------
 To launch the tajo master, execute start-tajo.sh.
 -----------------------------
-$TAJO_HOME/bin/start-tajo.sh
+$ $TAJO_HOME/bin/start-tajo.sh
 -----------------------------
 After then, you can use tajo-cli to access the command line interface of Tajo.
 -----------------------------
-$TAJO_HOME/bin/tajo cli
+$ $TAJO_HOME/bin/tajo cli
 -----------------------------
-[]
+Query Execution
- (still working)
\ No newline at end of file
+ First of all, we need to prepare some data for query execution.
+
+-----------------------------
+$ mkdir /home/x/table1
+$ cd /home/x/table1
+$ cat >> table1
+1|abc|1.1|a
+2|def|2.3|b
+3|ghi|3.4|c
+4|jkl|4.5|d
+5|mno|5.6|e
+
+-----------------------------
+
+ This schema of this table is (int, string, float, string).
+
+-----------------------------
+$ $TAJO_HOME/bin/tajo cli
+
+tajo> create external table table1 (id int, name string, score float, type string) using csv with ('csvfile.delimiter'='|') location 'file:/home/x/table1'
+-----------------------------
+
+ In order to load an external table, we need to use 'create external table' statement.
+ In the location clause, you should use the absolute path with an appropriate scheme.
+ If the table resides in HDFS, we should use 'hdfs' instead of 'file'.
+
+ If you want to know DDL statements in more detail, please see
+ {{{./query_language.html}Query Language}}.
+
+-----------------------------
+tajo> /t
+table1
+-----------------------------
+
+ '/t' command shows the list of tables.
+
+-----------------------------
+tajo> /d table1
+
+table name: table1
+table path: file:/home/x/table1
+store type: CSV
+number of rows: 0
+volume (bytes): 78 B
+schema:
+id INT
+name STRING
+score FLOAT
+type STRING
+
+-----------------------------
+
+ '/d [table name]' command shows the description of a given table.
+
+ Now, you can execute SQL queries as follows:
+
+-----------------------------
+tajo> select * from table1 where id > 2
+final state: QUERY_SUCCEEDED, init time: 4.118 sec, execution time: 4.334 sec, total response time: 8.452 sec
+result: hdfs://x.x.x.x:8020/user/x/tajo/q_1363768615503_0001_000001
+
+id, name, score, type
+- - - - - - - - - - - - -
+3, ghi, 3.4, c
+4, jkl, 4.5, d
+5, mno, 5.6, e
+tajo> 
+-------------------------------
+
+ (In the current implementation, for each query, Tajo has some initial overhead to launch containers
+ on node managers. However, we will reduce this overhead soon.)
+
+ Enjoy Apache Tajo!
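
The getting-started walkthrough above declares `table1` as a '|'-delimited text file with the schema (int, string, float, string) and then runs `select * from table1 where id > 2`. As a rough illustration of what that query computes, here is a minimal Python sketch that parses the same five sample rows and applies the same filter in memory. It is not Tajo's CSV reader or query engine. The names `TABLE1_TEXT`, `Row`, `parse_table1` and `select_where_id_gt` are hypothetical and exist only for this example.

```python
from typing import List, NamedTuple

# The five sample rows from the walkthrough, fields separated by '|'
# (the 'csvfile.delimiter' option in the DDL). Illustrative data only.
TABLE1_TEXT = """1|abc|1.1|a
2|def|2.3|b
3|ghi|3.4|c
4|jkl|4.5|d
5|mno|5.6|e
"""


class Row(NamedTuple):
    # Mirrors the DDL: (id int, name string, score float, type string).
    id: int
    name: str
    score: float
    type: str


def parse_table1(text: str, delimiter: str = "|") -> List[Row]:
    """Split each non-empty line on the delimiter and cast fields to the schema types."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. the empty line typed before Ctrl-D
        id_, name, score, type_ = line.split(delimiter)
        rows.append(Row(int(id_), name, float(score), type_))
    return rows


def select_where_id_gt(rows: List[Row], threshold: int) -> List[Row]:
    """In-memory equivalent of: select * from table1 where id > threshold."""
    return [r for r in rows if r.id > threshold]


if __name__ == "__main__":
    print("id, name, score, type")
    for r in select_where_id_gt(parse_table1(TABLE1_TEXT), 2):
        print(f"{r.id}, {r.name}, {r.score}, {r.type}")
```

When run as a script, it prints the header line and then the rows with ids 3, 4 and 5, in the same form as the CLI session. Tajo returns the same rows, but it executes the query in containers that it launches on node managers. Those container launches are the source of the per-query start-up overhead that the walkthrough mentions.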
http://git-wip-us.apache.org/repos/asf/incubator-tajo/blob/40138ccf/tajo-project/src/site/apt/index.apt
----------------------------------------------------------------------
diff --git a/tajo-project/src/site/apt/index.apt b/tajo-project/src/site/apt/index.apt
index ed8cf98..9f5e0cb 100644
--- a/tajo-project/src/site/apt/index.apt
+++ b/tajo-project/src/site/apt/index.apt
@@ -14,64 +14,34 @@
 ~~ See the License for the specific language governing permissions and
 ~~ limitations under the License.
- ------
- Tajo - Introduction
- ------
- Hyunsik Choi
- ------
- 2013-02-25
-
-What is Tajo?
+Introduction
 Tajo is a relational and distributed data warehouse system for Hadoop.
 Tajo is designed for low-latency and scalable ad-hoc queries, online aggregation
- and ETL on large-data sets by leveraging advanced database techniques.
- It supports SQL standards. Tajo uses HDFS as a primary storage layer and
- has its own query engine which allows direct control of distributed execution and data flow.
- As a result, Tajo has a variety of query evaluation strategies and more optimization
- opportunities. In addition, Tajo will have a native columnar execution and and its optimizer.
- Tajo will be an alternative choice to Hive/Pig on the top of MapReduce.
-
-Current Status
+ and ETL on large-data sets by leveraging advanced database techniques. It supports SQL standards.
+ Tajo uses HDFS as a primary storage layer and has its own query engine which allows direct
+ control of distributed execution and data flow. As a result, Tajo has a variety of query
+ evaluation strategies and more optimization opportunities. In addition, Tajo will have a native
+ columnar execution and and its optimizer.
- Tajo is in the alpha stage. Users can execute usual SQL queries (e.g., selection, projection,
- group-by, join, union and sort) except for nested queries. Tajo provides
- various storage formats, such as CSV, RCFile, RowFile (a row-store file we have implemented),
- and Trevni, and it also has a rudimentary ETL feature to transform one data format to another data
- format. In addition, Tajo provides hash and range repartitions. By using both repartition
- methods, Tajo processes aggregation, join, and sort queries over a number of cluster nodes.
- If you want to know the current status in more detail, checkout this
- {{{http://www.slideshare.net/hyunsikchoi/tajo-intro}Slide}}.
+Features
-Why Tajo?
+ * Fast and low-latency query processing on SQL queries including projection, filter, group-by,
+ sort, and join.
- * <>
+ * Rudiment ETL that transforms one data format to another data format.
- Tajo uses Hadoop Distributed File System (HDFS) as a primary storage layer.
- Tajo incorporates the advantages of MapReduce and shared-nothing parallel databases
- to yield the scalability.
+ * Support various file formats, such as CSV, RCFile, RowFile (a row store file), and Trevni.
- * <>
+ * Command line interface to allow users to submit SQL queries
- We have two goals for low-latency queries. The first goal is to allow users to get estimates
- of an aggregate query in an online fashion as soon as the query is submitted.
- This is feasible if a user wants a quick picture
- rather than exact results. The second goal is efficient query processing. We achieve it with
- various query evaluation strategies, query optimization, high throughput engine, and and
- efficient I/O.
+ * Java API to enable clients to submit SQL queries to Tajo
- * <>
-
- Hadoop Distributed File System (HDFS) has played a role of the centralized data storage for
- data intensive computing. Collected log data and data streams are
- usually stored into HDFS. Tajo provides a scalable and low-latency means to processes
- them on location without ETL and additional data loading.
+News
- * <>
+ * <<[2013-03-07]>> Tajo Project enters incubation.
- Long-running queries are also required to process big data.
- Tajo supports the fault tolerance to avoid a complete query restart
- in the case that the query fails.
+
+ * <<[2012-10-15]>> A demonstration paper of Tajo was accepted to IEEE ICDE 2013.
 Documentation
@@ -87,11 +57,11 @@ Presentations
 * <<[2013-02-24]>> {{{http://www.slideshare.net/hyunsikchoi/tajo-intro}Introduction to Tajo}}
-News
-
- * <<[2012-10-15]>> A demonstration paper of Tajo was accepted to IEEE ICDE 2013.
-
-Contact
+Disclaimer
- If you have any question or suggestion for the project, please send an email to
- hyunsik.choi at gmail dot com.
\ No newline at end of file
+ Apache Tajo is an effort undergoing incubation at The Apache Software Foundation (ASF)
+ sponsored by the Apache Incubator PMC. Incubation is required of all newly accepted projects
+ until a further review indicates that the infrastructure, communications, and decision making
+ process have stabilized in a manner consistent with other successful ASF projects.
+ While incubation status is not necessarily a reflection of the completeness or stability of the
+ code, it does indicate that the project has yet to be fully endorsed by the ASF.


http://git-wip-us.apache.org/repos/asf/incubator-tajo/blob/40138ccf/tajo-project/src/site/apt/query_language.apt
----------------------------------------------------------------------
diff --git a/tajo-project/src/site/apt/query_language.apt b/tajo-project/src/site/apt/query_language.apt
index 66a6b0a..e0d284a 100644
--- a/tajo-project/src/site/apt/query_language.apt
+++ b/tajo-project/src/site/apt/query_language.apt
@@ -14,14 +14,28 @@
 ~~ See the License for the specific language governing permissions and
 ~~ limitations under the License.
- ------------------------
- Tajo - Query Language
- ------------------------
- Hyunsik Choi
- ------------------------
- 2013-02-01
-
- Tajo supports SQL2003 and some extensions.
+ ---------------
+ Query Language
+
+Primitive types
+
+ * byte - 1 byte value
+
+ * bool - boolean value (1 byte)
+
+ * short - 2 byte integer
+
+ * int - 4 byte integer
+
+ * long - 8 byte integer
+
+ * float - single precision (4 byte)
+
+ * double - double precision (8 byte)
+
+ * bytes
+
+ * string - sequence of characters in UTF-8
 DDL


http://git-wip-us.apache.org/repos/asf/incubator-tajo/blob/40138ccf/tajo-project/src/site/site.xml
----------------------------------------------------------------------
diff --git a/tajo-project/src/site/site.xml b/tajo-project/src/site/site.xml
index b6b7f9d..cb2b689 100644
--- a/tajo-project/src/site/site.xml
+++ b/tajo-project/src/site/site.xml
@@ -29,10 +29,19 @@
- Tajo: A Distributed Data Warehouse System for Hadoop - http://tajo-project.github.com/tajo + Apache Tajo + http://incubator.apache.org/tajo/ + + Apache Incubator + http://incubator.apache.org/images/egg-logo.png + http://incubator.apache.org/ + + + + + + + + + + + + + + - - + + + + + + + + - + - - - - - - - +
+
+ Apache Tajo, Apache Hadoop, Apache, the Apache feather logo, and the Apache incubator logo are
+ trademarks of The Apache Software Foundation. All other marks mentioned may be trademarks
+ or registered trademarks of their respective owners.
+
+
+ true true - true + false production - - - hyunsik_choi - true - true -