Subject: svn commit: r1644656 [2/20] - in /tajo/site/docs/devel: ./ _sources/ _sources/backup_and_restore/ _sources/configuration/ _sources/functions/ _sources/getting_started/ _sources/partitioning/ _sources/sql_language/ _sources/table_management/ _sources/ts... 
Date: Thu, 11 Dec 2014 14:41:22 -0000 To: commits@tajo.apache.org From: hyunsik@apache.org X-Mailer: svnmailer-1.0.9 Message-Id: <20141211144124.E46EEAC0EEC@hades.apache.org> Added: tajo/site/docs/devel/_sources/introduction.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/introduction.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/introduction.txt (added) +++ tajo/site/docs/devel/_sources/introduction.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,13 @@ +*************** +Introduction +*************** + +The main goal of the Apache Tajo project is to build an advanced open source +data warehouse system on Hadoop for processing web-scale data sets. +Tajo provides standard SQL as its query language. +Tajo is designed for both interactive and batch queries on data sets +stored on HDFS and other data sources. Without hurting query response +times, Tajo provides the fault tolerance and dynamic load balancing +necessary for long-running queries. Tajo employs cost-based and +progressive query optimization techniques to reoptimize running +queries and avoid the worst query plans. \ No newline at end of file Added: tajo/site/docs/devel/_sources/jdbc_driver.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/jdbc_driver.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/jdbc_driver.txt (added) +++ tajo/site/docs/devel/_sources/jdbc_driver.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,113 @@ +************************************* +Tajo JDBC Driver +************************************* + +Apache Tajo™ provides a JDBC driver +which enables Java applications to easily access Apache Tajo in an RDBMS-like manner. +In this section, we explain how to get the JDBC driver and give an example client. 
+ +How to get JDBC driver +======================= + +From Binary Distribution +-------------------------------- + +The Tajo binary distribution provides the JDBC jar file and its dependent JAR files. +Those files are located in ``${TAJO_HOME}/share/jdbc-dist/``. + + +From Building Source Code +-------------------------------- + +You can build Tajo from the source code and then get the JAR files as follows: + +.. code-block:: bash + + $ tar xzvf tajo-x.y.z-src.tar.gz + $ mvn clean package -DskipTests -Pdist -Dtar + $ ls -l tajo-dist/target/tajo-x.y.z/share/jdbc-dist + + +Setting the CLASSPATH +======================= + +In order to use the JDBC driver, you should add the jar files included in +``tajo-dist/target/tajo-x.y.z/share/jdbc-dist`` to your ``CLASSPATH``. +In addition, you should add the Hadoop classpath to your ``CLASSPATH``. +So, ``CLASSPATH`` will be set as follows: + +.. code-block:: bash + + CLASSPATH=path/to/tajo-jdbc/*:path/to/tajo-site.xml:path/to/core-site.xml:path/to/hdfs-site.xml + +.. note:: + + You must add the locations which include Tajo config files (i.e., ``tajo-site.xml``) and + Hadoop config files (i.e., ``core-site.xml`` and ``hdfs-site.xml``) to your ``CLASSPATH``. + + +An Example JDBC Client +======================= + +The JDBC driver class name is ``org.apache.tajo.jdbc.TajoDriver``. +You can load the driver with ``Class.forName("org.apache.tajo.jdbc.TajoDriver")``. +The connection url should be ``jdbc:tajo://<hostname>:<port>/<database>``. +The default TajoMaster client rpc port is ``26002``. +If you want to change the listening port, please refer to :doc:`/configuration/configuration_defaults`. + +.. note:: + + Currently, Tajo does not support the concept of database and namespace. + All tables are contained in the ``default`` database. So, you don't need to specify any database name. + +The following shows an example of a JDBC client. + +.. 
code-block:: java + + import java.sql.Connection; + import java.sql.ResultSet; + import java.sql.Statement; + import java.sql.DriverManager; + + public class TajoJDBCClient { + + .... + + public static void main(String[] args) throws Exception { + + try { + Class.forName("org.apache.tajo.jdbc.TajoDriver"); + } catch (ClassNotFoundException e) { + // fill in your error handling code + } + + Connection conn = DriverManager.getConnection("jdbc:tajo://127.0.0.1:26002/default"); + + Statement stmt = null; + ResultSet rs = null; + try { + stmt = conn.createStatement(); + rs = stmt.executeQuery("select * from table1"); + while (rs.next()) { + System.out.println(rs.getString(1) + "," + rs.getString(3)); + } + } finally { + if (rs != null) rs.close(); + if (stmt != null) stmt.close(); + if (conn != null) conn.close(); + } + } + } + + +FAQ +=========================================== + +java.nio.channels.UnresolvedAddressException +-------------------------------------------- + +When retrieving the final result, the Tajo JDBC driver tries to access HDFS data nodes. +So, network access between the JDBC client and the HDFS data nodes must be available. +In many cases, an HDFS cluster is built on a private network which uses private hostnames. +So, those hostnames must be resolvable from the JDBC client side. + Added: tajo/site/docs/devel/_sources/partitioning/column_partitioning.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/partitioning/column_partitioning.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/partitioning/column_partitioning.txt (added) +++ tajo/site/docs/devel/_sources/partitioning/column_partitioning.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,52 @@ +********************************* +Column Partitioning +********************************* + +Column table partitioning is designed to be compatible with the partitioning of Apache Hive™. 
+ +================================================ +How to Create a Column Partitioned Table +================================================ + +You can create a partitioned table by using the ``PARTITION BY`` clause. For a column partitioned table, you should use +the ``PARTITION BY COLUMN`` clause with partition keys. + +For example, assume there is a table ``orders`` composed of the following schema. :: + + id INT, + item_name TEXT, + price FLOAT + +Also, assume that you want to use ``order_date TEXT`` and ``ship_date TEXT`` as the partition keys. +Then, you should create a table as follows: + +.. code-block:: sql + + CREATE TABLE orders ( + id INT, + item_name TEXT, + price FLOAT + ) PARTITION BY COLUMN (order_date TEXT, ship_date TEXT); + +================================================== +Partition Pruning on Column Partitioned Tables +================================================== + +The following predicates in the ``WHERE`` clause can be used to prune unqualified column partitions without processing them +during the query planning phase: + +* ``=`` +* ``<>`` +* ``>`` +* ``<`` +* ``>=`` +* ``<=`` +* LIKE predicates with a leading wild-card character +* IN list predicates + +================================================== +Compatibility Issues with Apache Hive™ +================================================== + +If partitioned tables of Hive are created as external tables in Tajo, Tajo can process the Hive partitioned tables directly. +No compatibility issues have been found yet. 
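As an illustration of partition pruning, the following sketch continues the ``orders`` example above; the date literal is hypothetical. An equality predicate on a partition key lets the planner skip every non-matching partition:

.. code-block:: sql

   -- only the partitions whose order_date is '2014-12-11' are scanned;
   -- all other column partitions are pruned at query planning time
   SELECT id, item_name, price
   FROM orders
   WHERE order_date = '2014-12-11';
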
\ No newline at end of file Added: tajo/site/docs/devel/_sources/partitioning/hash_partitioning.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/partitioning/hash_partitioning.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/partitioning/hash_partitioning.txt (added) +++ tajo/site/docs/devel/_sources/partitioning/hash_partitioning.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,5 @@ +******************************** +Hash Partitioning +******************************** + +.. todo:: \ No newline at end of file Added: tajo/site/docs/devel/_sources/partitioning/intro_to_partitioning.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/partitioning/intro_to_partitioning.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/partitioning/intro_to_partitioning.txt (added) +++ tajo/site/docs/devel/_sources/partitioning/intro_to_partitioning.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,15 @@ +************************************** +Introduction to Partitioning +************************************** + +Table partitioning provides two benefits: easy table management and data pruning by partition keys. +Currently, Apache Tajo only provides Apache Hive-compatible column partitioning. 
+ +========================= +Partitioning Methods +========================= + +Tajo provides the following partitioning methods: + * Column Partitioning + * Range Partitioning (TODO) + * Hash Partitioning (TODO) \ No newline at end of file Added: tajo/site/docs/devel/_sources/partitioning/range_partitioning.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/partitioning/range_partitioning.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/partitioning/range_partitioning.txt (added) +++ tajo/site/docs/devel/_sources/partitioning/range_partitioning.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,5 @@ +*************************** +Range Partitioning +*************************** + +.. todo:: \ No newline at end of file Added: tajo/site/docs/devel/_sources/sql_language.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/sql_language.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/sql_language.txt (added) +++ tajo/site/docs/devel/_sources/sql_language.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,13 @@ +************ +SQL Language +************ + +.. 
toctree:: + :maxdepth: 1 + + sql_language/data_model + sql_language/ddl + sql_language/insert + sql_language/queries + sql_language/sql_expression + sql_language/predicates \ No newline at end of file Added: tajo/site/docs/devel/_sources/sql_language/data_model.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/sql_language/data_model.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/sql_language/data_model.txt (added) +++ tajo/site/docs/devel/_sources/sql_language/data_model.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,66 @@ +********** +Data Model +********** + +=============== +Data Types +=============== + ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| Support | SQL Type Name | Alias | Size (byte) | Description | Range | ++===========+================+============================+=============+===================================================+==========================================================================+ +| O | boolean | bool | 1 | | true/false | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | bit | | 1 | | 1/0 | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | varbit | bit varying | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | smallint | tinyint, int2 | 2 | small-range integer value | -2^15 (-32,768) to 2^15 
- 1 (32,767) | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | integer | int, int4 | 4 | integer value | -2^31 (-2,147,483,648) to 2^31 - 1 (2,147,483,647) | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | bigint | int8 | 8 | larger range integer value | -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807) | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | real | float4 | 4 | variable-precision, inexact, real number value | -3.4028235E+38 to 3.4028235E+38 (6 decimal digits precision) | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | float[(n)] | | 4 or 8 | variable-precision, inexact, real number value | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | double | float8, double precision | 8 | variable-precision, inexact, real number value | 1.7E-308 to 1.7E+308 (15 decimal digits precision) | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | number | decimal | | | | 
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | char[(n)] | character | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | varchar[(n)] | character varying | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | text | text | | variable-length unicode text | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | binary | binary | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | varbinary[(n)] | binary varying | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | blob | bytea | | variable-length binary string | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | date | | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | time | | | | | 
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | timetz | time with time zone | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | timestamp | | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| | timestamptz | | | | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ +| O | inet4 | | 4 | IPv4 address | | ++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ + +----------------------------------------- +Using real number value (real and double) +----------------------------------------- + +The real and double data types are mapped to the float and double Java primitives respectively. The Java primitives float and double follow the IEEE 754 specification. So, these types correctly match the SQL standard data types. + ++ float[( n )] is mapped to either float or double according to the given length n. If n is specified, it must be between 1 and 53. The default value of n is 53. ++ If 1 <= n <= 24, a value is mapped to float (6 decimal digits precision). ++ If 25 <= n <= 53, a value is mapped to double (15 decimal digits precision). ++ Do not use approximate real number columns in a WHERE clause to compare for exact matches, especially with the = and <> operators. The > or < comparisons work well. 
\ No newline at end of file Added: tajo/site/docs/devel/_sources/sql_language/ddl.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/sql_language/ddl.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/sql_language/ddl.txt (added) +++ tajo/site/docs/devel/_sources/sql_language/ddl.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,78 @@ +************************ +Data Definition Language +************************ + +======================== +CREATE DATABASE +======================== + +*Synopsis* + +.. code-block:: sql + + CREATE DATABASE [IF NOT EXISTS] <database_name> + +``IF NOT EXISTS`` allows the ``CREATE DATABASE`` statement to avoid an error which occurs when the database already exists. + +======================== +DROP DATABASE +======================== + +*Synopsis* + +.. code-block:: sql + + DROP DATABASE [IF EXISTS] <database_name> + +``IF EXISTS`` allows the ``DROP DATABASE`` statement to avoid an error which occurs when the database does not exist. + +======================== +CREATE TABLE +======================== + +*Synopsis* + +.. code-block:: sql + + CREATE TABLE [IF NOT EXISTS] <table_name> [(<column_name> <data_type>, ... )] + [using <storage_type> [with (<key> = <value>, ...)]] [AS <select_statement>] + + CREATE EXTERNAL TABLE [IF NOT EXISTS] <table_name> (<column_name> <data_type>, ... ) + using <storage_type> [with (<key> = <value>, ...)] LOCATION '<path>' + +``IF NOT EXISTS`` allows the ``CREATE [EXTERNAL] TABLE`` statement to avoid an error which occurs when the table already exists. + +------------------------ + Compression +------------------------ + +If you want to add an external table that contains compressed data, you should give the ``compression.codec`` parameter to the CREATE TABLE statement. + +.. code-block:: sql + + create EXTERNAL table lineitem ( + L_ORDERKEY bigint, + L_PARTKEY bigint, + ... 
+ L_COMMENT text) + + USING csv WITH ('text.delimiter'='|','compression.codec'='org.apache.hadoop.io.compress.DeflateCodec') + LOCATION 'hdfs://localhost:9010/tajo/warehouse/lineitem_100_snappy'; + +The ``compression.codec`` parameter can have one of the following compression codecs: + * org.apache.hadoop.io.compress.BZip2Codec + * org.apache.hadoop.io.compress.DeflateCodec + * org.apache.hadoop.io.compress.GzipCodec + * org.apache.hadoop.io.compress.SnappyCodec + +======================== + DROP TABLE +======================== + +*Synopsis* + +.. code-block:: sql + + DROP TABLE [IF EXISTS] <table_name> [PURGE] + +``IF EXISTS`` allows the ``DROP TABLE`` statement to avoid an error which occurs when the table does not exist. The ``DROP TABLE`` statement removes a table from the Tajo catalog, but it does not remove the contents. If the ``PURGE`` option is given, the ``DROP TABLE`` statement will eliminate the entry in the catalog as well as the contents. \ No newline at end of file Added: tajo/site/docs/devel/_sources/sql_language/insert.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/sql_language/insert.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/sql_language/insert.txt (added) +++ tajo/site/docs/devel/_sources/sql_language/insert.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,26 @@ +************************* +INSERT (OVERWRITE) INTO +************************* + +The INSERT OVERWRITE statement overwrites the data of an existing table or the data in a given directory. Tajo's INSERT OVERWRITE statement follows the ``INSERT INTO SELECT`` statement of SQL. The examples are as follows: + +.. 
code-block:: sql + + create table t1 (col1 int8, col2 int4, col3 float8); + + -- when a target table schema and output schema are equivalent to each other + INSERT OVERWRITE INTO t1 SELECT l_orderkey, l_partkey, l_quantity FROM lineitem; + -- or + INSERT OVERWRITE INTO t1 SELECT * FROM lineitem; + + -- when the output schema is smaller than the target table schema + INSERT OVERWRITE INTO t1 SELECT l_orderkey FROM lineitem; + + -- when you want to specify certain target columns + INSERT OVERWRITE INTO t1 (col1, col3) SELECT l_orderkey, l_quantity FROM lineitem; + +In addition, the INSERT OVERWRITE statement can overwrite the contents of a specific directory as well as a table. + +.. code-block:: sql + + INSERT OVERWRITE INTO LOCATION '/dir/subdir' SELECT l_orderkey, l_quantity FROM lineitem; \ No newline at end of file Added: tajo/site/docs/devel/_sources/sql_language/predicates.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/sql_language/predicates.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/sql_language/predicates.txt (added) +++ tajo/site/docs/devel/_sources/sql_language/predicates.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,159 @@ +***************** + Predicates +***************** + +===================== + IN Predicate +===================== + +The IN predicate provides row and array comparison. + +*Synopsis* + +.. code-block:: sql + + column_reference IN (val1, val2, ..., valN) + column_reference NOT IN (val1, val2, ..., valN) + + +Examples are as follows: + +.. code-block:: sql + + -- this statement lists all the records where the col1 value is 1, 2 or 3: + SELECT col1, col2 FROM table1 WHERE col1 IN (1, 2, 3); + + -- this statement lists all the records where the col1 value is neither 1, 2 nor 3: + SELECT col1, col2 FROM table1 WHERE col1 NOT IN (1, 2, 3); + +You can also use the IN predicate on text columns as follows: + +.. 
code-block:: sql + + SELECT col1, col2 FROM table1 WHERE col2 IN ('tajo', 'hadoop'); + + SELECT col1, col2 FROM table1 WHERE col2 NOT IN ('tajo', 'hadoop'); + + +================================== +String Pattern Matching Predicates +================================== + +-------------------- +LIKE +-------------------- + +The LIKE operator returns true or false depending on whether its pattern matches the given string. An underscore (``_``) in the pattern matches any single character. A percent sign (``%``) matches any sequence of zero or more characters. + +*Synopsis* + +.. code-block:: sql + + string LIKE pattern + string NOT LIKE pattern + + +-------------------- +ILIKE +-------------------- + +ILIKE is the same as LIKE, but it is a case-insensitive operator. It is not in the SQL standard. We borrow this operator from PostgreSQL. + +*Synopsis* + +.. code-block:: sql + + string ILIKE pattern + string NOT ILIKE pattern + + +-------------------- +SIMILAR TO +-------------------- + +*Synopsis* + +.. code-block:: sql + + string SIMILAR TO pattern + string NOT SIMILAR TO pattern + +It returns true or false depending on whether its pattern matches the given string. Like LIKE, ``SIMILAR TO`` uses ``_`` and ``%`` as metacharacters denoting any single character and any string, respectively. + +In addition to these metacharacters borrowed from LIKE, ``SIMILAR TO`` supports more powerful pattern-matching metacharacters borrowed from regular expressions: + ++------------------------+-------------------------------------------------------------------------------------------+ +| metacharacter | description | ++========================+===========================================================================================+ +| | | denotes alternation (either of two alternatives). | ++------------------------+-------------------------------------------------------------------------------------------+ +| * | denotes repetition of the previous item zero or more times. 
| ++------------------------+-------------------------------------------------------------------------------------------+ +| + | denotes repetition of the previous item one or more times. | ++------------------------+-------------------------------------------------------------------------------------------+ +| ? | denotes repetition of the previous item zero or one time. | ++------------------------+-------------------------------------------------------------------------------------------+ +| {m} | denotes repetition of the previous item exactly m times. | ++------------------------+-------------------------------------------------------------------------------------------+ +| {m,} | denotes repetition of the previous item m or more times. | ++------------------------+-------------------------------------------------------------------------------------------+ +| {m,n} | denotes repetition of the previous item at least m and not more than n times. | ++------------------------+-------------------------------------------------------------------------------------------+ +| [] | A bracket expression specifies a character class, just as in POSIX regular expressions. | ++------------------------+-------------------------------------------------------------------------------------------+ +| () | Parentheses can be used to group items into a single logical item. | ++------------------------+-------------------------------------------------------------------------------------------+ + +Note that ``.`` is not used as a metacharacter in the ``SIMILAR TO`` operator. + +--------------------- +Regular expressions +--------------------- + +Regular expressions provide a very powerful means for string pattern matching. In the current Tajo, regular expressions are based on Java-style regular expressions instead of POSIX regular expressions. The main difference between the Java style and the POSIX style lies in their character classes. + +*Synopsis* + +.. 
code-block:: sql + + string ~ pattern + string !~ pattern + + string ~* pattern + string !~* pattern + ++----------+---------------------------------------------------------------------------------------------------+ +| operator | Description | ++==========+===================================================================================================+ +| ~ | It returns true if the given regular expression matches the string. Otherwise, it returns false. | ++----------+---------------------------------------------------------------------------------------------------+ +| !~ | It returns false if the given regular expression matches the string. Otherwise, it returns true. | ++----------+---------------------------------------------------------------------------------------------------+ +| ~* | It is the same as '~', but it is case insensitive. | ++----------+---------------------------------------------------------------------------------------------------+ +| !~* | It is the same as '!~', but it is case insensitive. | ++----------+---------------------------------------------------------------------------------------------------+ + +Here are examples: + +.. code-block:: sql + + 'abc' ~ '.*c' true + 'abc' ~ 'c' false + 'aaabc' ~ '([a-z]){3}bc' true + 'abc' ~* '.*C' true + 'abc' !~* 'B.*' true + +The regular expression operators are not in the SQL standard. We borrow these operators from PostgreSQL. + +*Synopsis for REGEXP and RLIKE operators* + +.. code-block:: sql + + string REGEXP pattern + string NOT REGEXP pattern + + string RLIKE pattern + string NOT RLIKE pattern + +However, they do not support case-insensitive matching. 
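As a small sketch (reusing the hypothetical ``table1`` from the IN predicate examples above), ``REGEXP`` and ``RLIKE`` behave like the ``~`` operator:

.. code-block:: sql

   -- the three filters below are equivalent case-sensitive matches
   SELECT col1, col2 FROM table1 WHERE col2 ~ '(tajo|hadoop)';
   SELECT col1, col2 FROM table1 WHERE col2 REGEXP '(tajo|hadoop)';
   SELECT col1, col2 FROM table1 WHERE col2 RLIKE '(tajo|hadoop)';
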
\ No newline at end of file Added: tajo/site/docs/devel/_sources/sql_language/queries.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/sql_language/queries.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/sql_language/queries.txt (added) +++ tajo/site/docs/devel/_sources/sql_language/queries.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,256 @@ +************************** +Queries +************************** + +===================== +Overview +===================== + +*Synopsis* + +.. code-block:: sql + + SELECT [distinct [all]] * | <expression> [[AS] <alias>] [, ...] + [FROM <table_reference> [[AS]
<alias>] [, ...]] + [WHERE <condition>] + [GROUP BY <expression> [, ...]] + [HAVING <condition>] + [ORDER BY <sort_expr> [ASC|DESC] [NULLS FIRST|NULLS LAST] [, ...]] + + + +===================== +From Clause +===================== + +*Synopsis* + +.. code-block:: sql + + [FROM <table_reference>
[[AS]
<alias>] [, ...]] + + +The ``FROM`` clause specifies one or more other tables given in a comma-separated table reference list. +A table reference can be a relation name, a subquery, a table join, or a complex combination of them. + +----------------------- +Table and Table Aliases +----------------------- + +A temporary name can be given to tables and complex table references to be used +for references to the derived table in the rest of the query. This is called a table alias. + +To create a table alias, please use ``AS``: + +.. code-block:: sql + + FROM table_reference AS alias + +or + +.. code-block:: sql + + FROM table_reference alias + +The ``AS`` keyword can be omitted, and *alias* can be any identifier. + +A typical application of table aliases is to give short names to long table references. For example: + +.. code-block:: sql + + SELECT * FROM long_table_name_1234 s JOIN another_long_table_name_5678 a ON s.id = a.num; + +------------- +Joined Tables +------------- + +Tajo supports all kinds of join types. + +Join Types +~~~~~~~~~~ + +Cross Join +^^^^^^^^^^ + +.. code-block:: sql + + FROM T1 CROSS JOIN T2 + +Cross join, also called *Cartesian product*, results in every possible combination of rows from T1 and T2. + +``FROM T1 CROSS JOIN T2`` is equivalent to ``FROM T1, T2``. + +Qualified joins +^^^^^^^^^^^^^^^ + +Qualified joins implicitly or explicitly have join conditions. Inner/Outer/Natural joins are all qualified joins. +Except for natural join, the ``ON`` or ``USING`` clause in each join is used to specify a join condition. +A join condition must include at least one boolean expression, and it can also include plain filter conditions. + +**Inner Join** + +.. code-block:: sql + + T1 [INNER] JOIN T2 ON boolean_expression + T1 [INNER] JOIN T2 USING (join column list) + +The ``INNER`` keyword is the default, so ``INNER`` can be omitted when you use an inner join. + +**Outer Join** + +.. 
code-block:: sql + + T1 (LEFT|RIGHT|FULL) OUTER JOIN T2 ON boolean_expression + T1 (LEFT|RIGHT|FULL) OUTER JOIN T2 USING (join column list) + +One of ``LEFT``, ``RIGHT``, or ``FULL`` must be specified for outer joins. +The behavior of a join condition in an outer join depends on which table references the condition involves. +For more detail on outer join behavior, please refer to +`Advanced outer join constructs `_. + +**Natural Join** + +.. code-block:: sql + + T1 NATURAL JOIN T2 + +``NATURAL`` is a shorthand form of ``USING``. It forms a ``USING`` list consisting of all common column names that appear in +both join tables. These common columns appear only once in the output table. If there are no common columns, +``NATURAL`` behaves like ``CROSS JOIN``. + +**Subqueries** + +Subqueries allow users to specify a derived table. A subquery requires enclosing a SQL statement in parentheses and giving it an alias name. +For example: + +.. code-block:: sql + + FROM (SELECT * FROM table1) AS alias_name + +===================== +Where Clause +===================== + +The syntax of the ``WHERE`` clause is: + +*Synopsis* + +.. code-block:: sql + + WHERE search_condition + +``search_condition`` can be any boolean expression. +For the additional predicates available, please refer to :doc:`/sql_language/predicates`. + +========================== +Groupby and Having Clauses +========================== + +*Synopsis* + +.. code-block:: sql + + SELECT select_list + FROM ... + [WHERE ...] + GROUP BY grouping_column_reference [, grouping_column_reference]... + [HAVING boolean_expression] + +The rows that pass the ``WHERE`` filter may be subject to grouping, as specified by the ``GROUP BY`` clause. +Grouping combines sets of rows having common values into groups, and then computes the rows in each group with aggregate functions. The ``HAVING`` clause can only be used together with the ``GROUP BY`` clause. It eliminates the unqualified result rows of grouping.
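A ``HAVING`` condition typically references an aggregate over each group. The following sketch assumes the TPC-H ``lineitem`` table used in the examples below; it keeps only the orders whose total quantity exceeds 100:

.. code-block:: sql

    SELECT l_orderkey, SUM(l_quantity) AS quantity
    FROM lineitem
    GROUP BY l_orderkey
    HAVING SUM(l_quantity) > 100;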
+ +``grouping_column_reference`` can be a column reference or a complex expression including scalar functions and arithmetic operations. + +.. code-block:: sql + + SELECT l_orderkey, SUM(l_quantity) AS quantity FROM lineitem GROUP BY l_orderkey; + + SELECT substr(l_shipdate,1,4) as year, SUM(l_orderkey) AS total2 FROM lineitem GROUP BY substr(l_shipdate,1,4); + +If a SQL statement includes a ``GROUP BY`` clause, each expression in the select list must be either a grouping column reference or an aggregate function. For example, the following query is not allowed because ``l_orderkey`` does not occur in the ``GROUP BY`` clause. + +.. code-block:: sql + + SELECT l_orderkey, l_partkey, SUM(l_orderkey) AS total FROM lineitem GROUP BY l_partkey; + +Aggregate functions can be used with the ``DISTINCT`` keyword. It forces an individual aggregate function to take only distinct values of the argument expression. The ``DISTINCT`` keyword is used as follows: + +.. code-block:: sql + + SELECT l_partkey, COUNT(distinct l_quantity), SUM(distinct l_extendedprice) AS total FROM lineitem GROUP BY l_partkey; + +========================== +Orderby and Limit Clauses +========================== + +*Synopsis* + +.. code-block:: sql + + FROM ... ORDER BY sort_expr [ASC|DESC] [NULLS FIRST|NULLS LAST] [, ...] + +``sort_expr`` can be a column reference, an aliased column reference, or a complex expression. +``ASC`` indicates an ascending order of ``sort_expr`` values. ``DESC`` indicates a descending order of ``sort_expr`` values. +``ASC`` is the default order. + +The ``NULLS FIRST`` and ``NULLS LAST`` options can be used to determine whether null values appear +before or after non-null values in the sort ordering. By default, null values are treated as if larger than any non-null value; +that is, ``NULLS FIRST`` is the default for ``DESC`` order, and ``NULLS LAST`` otherwise.
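Putting the pieces together, a hedged example (again assuming the TPC-H ``lineitem`` table) that sorts descending, pushes null ship dates to the end, and keeps only the first ten rows:

.. code-block:: sql

    SELECT l_orderkey, l_shipdate
    FROM lineitem
    ORDER BY l_shipdate DESC NULLS LAST
    LIMIT 10;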
+ +========================== +Window Functions +========================== + +A window function performs a calculation across multiple table rows that belong to some window frame. + +*Synopsis* + +.. code-block:: sql + + SELECT ...., func(param) OVER ([PARTITION BY partition-expr [, ...]] [ORDER BY sort-expr [, ...]]), .... FROM ... + +The ``PARTITION BY`` list within ``OVER`` specifies dividing the rows into groups, or partitions, that share the same values of +the ``PARTITION BY`` expression(s). For each row, the window function is computed across the rows that fall into +the same partition as the current row. + +We will briefly explain some examples using window functions. + +--------- +Examples +--------- + +Multiple window functions can be used in a SQL statement as follows: + +.. code-block:: sql + + SELECT l_orderkey, sum(l_discount) OVER (PARTITION BY l_orderkey), sum(l_quantity) OVER (PARTITION BY l_orderkey) FROM LINEITEM; + +If the ``OVER()`` clause is empty as follows, it makes all table rows into one window frame. + +.. code-block:: sql + + SELECT salary, sum(salary) OVER () FROM empsalary; + +Also, the ``ORDER BY`` clause can be used without a ``PARTITION BY`` clause as follows: + +.. code-block:: sql + + SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary; + +Also, all expressions and aggregate functions are allowed in the ``ORDER BY`` clause as follows: + +.. code-block:: sql + + select + l_orderkey, + count(*) as cnt, + row_number() over (partition by l_orderkey order by count(*) desc) + row_num + from + lineitem + group by + l_orderkey + +.. note:: + + Currently, Tajo does not support multiple different partition-expressions in one SQL statement.
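A common pattern combines the two documented features, window functions and derived tables in ``FROM``, to select the top rows of each partition. This is an illustrative sketch (the column names assume the TPC-H ``lineitem`` table, and support for this exact combination may depend on the Tajo version):

.. code-block:: sql

    SELECT l_orderkey, l_partkey, rn
    FROM (
      SELECT l_orderkey, l_partkey,
             row_number() OVER (PARTITION BY l_orderkey ORDER BY l_quantity DESC) AS rn
      FROM lineitem
    ) t
    WHERE rn <= 3;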
\ No newline at end of file Added: tajo/site/docs/devel/_sources/sql_language/sql_expression.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/sql_language/sql_expression.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/sql_language/sql_expression.txt (added) +++ tajo/site/docs/devel/_sources/sql_language/sql_expression.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,31 @@ +============================ + SQL Expressions +============================ + +------------------------- + Arithmetic Expressions +------------------------- + +------------------------- +Type Casts +------------------------- +A type cast converts a value of one data type into another data type. Tajo provides two type cast syntaxes: + +.. code-block:: sql + + CAST ( expression AS type ) + expression::type + + +------------------------- +String Expressions +------------------------- + + +------------------------- +Function Call +------------------------- + +.. code-block:: sql + + function_name ([expression [, expression ... ]] ) \ No newline at end of file Added: tajo/site/docs/devel/_sources/table_management.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/table_management.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/table_management.txt (added) +++ tajo/site/docs/devel/_sources/table_management.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,12 @@ +****************** +Table Management +****************** + +In Tajo, a table is a logical view of one data source. Logically, one table consists of a logical schema, partitions, a URL, and various properties. Physically, a table can be a directory in HDFS, a single file, an HBase table, or an RDBMS table. In order to make good use of Tajo, users need to understand the features and physical characteristics of their tables' physical layout.
This section explains all about table management. + +.. toctree:: + :maxdepth: 1 + + table_management/table_overview + table_management/file_formats + table_management/compression \ No newline at end of file Added: tajo/site/docs/devel/_sources/table_management/compression.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/table_management/compression.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/table_management/compression.txt (added) +++ tajo/site/docs/devel/_sources/table_management/compression.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,5 @@ +********************************* +Compression +********************************* + +.. todo:: \ No newline at end of file Added: tajo/site/docs/devel/_sources/table_management/csv.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/table_management/csv.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/table_management/csv.txt (added) +++ tajo/site/docs/devel/_sources/table_management/csv.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,115 @@ +************************************* +CSV (TextFile) +************************************* + +A character-separated values (CSV) file represents a tabular data set consisting of rows and columns. +Each row is a plain-text line. A line is usually terminated by a line feed ``\n`` or carriage return ``\r`` character. +The line feed ``\n`` is the default line delimiter in Tajo. Each record consists of multiple fields, separated by +some other character or string, most commonly a literal vertical bar ``|``, comma ``,``, or tab ``\t``. +The vertical bar is used as the default field delimiter in Tajo. + +========================================= +How to Create a CSV Table?
+========================================= + +If you are not familiar with the ``CREATE TABLE`` statement, please refer to the Data Definition Language :doc:`/sql_language/ddl`. + +In order to specify a certain file format for your table, you need to use the ``USING`` clause in your ``CREATE TABLE`` +statement. Below is an example statement for creating a table using CSV files. + +.. code-block:: sql + + CREATE TABLE table1 ( + id int, + name text, + score float, + type text + ) USING CSV; + +========================================= +Physical Properties +========================================= + +Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters. +The ``WITH`` clause in the CREATE TABLE statement allows users to set those parameters. + +Currently, the CSV storage format provides the following physical properties. + +* ``text.delimiter``: delimiter character. ``|`` or ``\u0001`` is usually used, and the default field delimiter is ``|``. +* ``text.null``: NULL character. The default NULL character is an empty string ``''``. Hive's default NULL character is ``'\\N'``. +* ``compression.codec``: Compression codec, i.e., the compression algorithm used to compress files. You can enable compression and choose a specific algorithm with this property. The compression codec name should be the fully qualified class name inherited from `org.apache.hadoop.io.compress.CompressionCodec `_. By default, compression is disabled. +* ``csvfile.serde`` (deprecated): custom (de)serializer class. ``org.apache.tajo.storage.TextSerializerDeserializer`` is the default (de)serializer class. +* ``timezone``: the time zone that the table uses for writing. When table rows are read or written, ``timestamp`` and ``time`` column values are adjusted by this time zone if it is set. A time zone can be an abbreviated form such as 'PST'. It also accepts an offset-based form like 'UTC+9' or a location-based form like 'Asia/Seoul'.
+* ``text.error-tolerance.max-num``: the maximum number of permissible parsing errors. This value should be an integer. By default, ``text.error-tolerance.max-num`` is ``0``. Depending on the value, parsing errors are handled in different ways. + * If ``text.error-tolerance.max-num < 0``, all parsing errors are ignored. + * If ``text.error-tolerance.max-num == 0``, no parsing error is allowed. If any error occurs, the query will fail. (default) + * If ``text.error-tolerance.max-num > 0``, the given number of parsing errors in each task is permissible. + +The following example sets a custom field delimiter, NULL character, and compression codec: + +.. code-block:: sql + + CREATE TABLE table1 ( + id int, + name text, + score float, + type text + ) USING CSV WITH('text.delimiter'='\u0001', + 'text.null'='\\N', + 'compression.codec'='org.apache.hadoop.io.compress.SnappyCodec'); + +.. warning:: + + Be careful when using ``\n`` as the field delimiter because CSV uses ``\n`` as the line delimiter. + At the moment, Tajo does not provide a way to specify the line delimiter. + +========================================= +Custom (De)serializer +========================================= + +The CSV storage format not only provides reading and writing interfaces for CSV data but also allows users to process custom +plain-text file formats with user-defined (de)serializer classes. +For example, with custom (de)serializers, Tajo can process JSON or any specialized plain-text file format. + +In order to specify a custom (de)serializer, set the physical property ``csvfile.serde``. +The property value should be a fully qualified class name. + +For example: + +.. 
code-block:: sql + + CREATE TABLE table1 ( + id int, + name text, + score float, + type text + ) USING CSV WITH ('csvfile.serde'='org.my.storage.CustomSerializerDeserializer') + + +========================================= +Null Value Handling Issues +========================================= +By default, the NULL character in CSV files is an empty string ``''``. +In other words, an empty field is recognized as a NULL value in Tajo. +If a field's type is ``TEXT``, an empty field is recognized as the string value ``''`` instead of a NULL value. +You can also use your own NULL character by specifying the physical property ``text.null``. + +========================================= +Compatibility Issues with Apache Hive™ +========================================= + +CSV files generated in Tajo can be processed directly by Apache Hive™ without further processing. +In this section, we explain some compatibility issues for users who use both Hive and Tajo. + +If you set a custom field delimiter, the CSV tables cannot be directly used in Hive. +In order to specify the custom field delimiter in Hive, you need to use the ``ROW FORMAT DELIMITED FIELDS TERMINATED BY`` +clause in Hive's ``CREATE TABLE`` statement as follows: + +.. code-block:: sql + + CREATE TABLE table1 (id int, name string, score float, type string) + ROW FORMAT DELIMITED FIELDS TERMINATED BY '|' + STORED AS TEXTFILE + +To the best of our knowledge, there is no way to specify a custom NULL character in Hive.
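For the reverse direction, reading such Hive-generated data in Tajo, an external table can point at the existing data. This is a sketch; the HDFS location is hypothetical, and ``'text.null'='\\N'`` matches Hive's default NULL character:

.. code-block:: sql

    CREATE EXTERNAL TABLE table1 (
      id int,
      name text,
      score float,
      type text
    ) USING CSV WITH ('text.delimiter'='|', 'text.null'='\\N')
    LOCATION 'hdfs:///user/hive/warehouse/table1';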
\ No newline at end of file Added: tajo/site/docs/devel/_sources/table_management/file_formats.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/table_management/file_formats.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/table_management/file_formats.txt (added) +++ tajo/site/docs/devel/_sources/table_management/file_formats.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,13 @@ +************************************* +File Formats +************************************* + +Currently, Tajo provides four file formats as follows: + +.. toctree:: + :maxdepth: 1 + + csv + rcfile + parquet + sequencefile \ No newline at end of file Added: tajo/site/docs/devel/_sources/table_management/parquet.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/table_management/parquet.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/table_management/parquet.txt (added) +++ tajo/site/docs/devel/_sources/table_management/parquet.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,48 @@ +************************************* +Parquet +************************************* + +Parquet is a columnar storage format for Hadoop. Parquet is designed to make the advantages of compressed, +efficient columnar data representation available to any project in the Hadoop ecosystem, +regardless of the choice of data processing framework, data model, or programming language. +For more details, please refer to `Parquet File Format `_. + +========================================= +How to Create a Parquet Table? +========================================= + +If you are not familiar with ``CREATE TABLE`` statement, please refer to Data Definition Language :doc:`/sql_language/ddl`. + +In order to specify a certain file format for your table, you need to use the ``USING`` clause in your ``CREATE TABLE`` +statement. 
Below is an example statement for creating a table using parquet files. + +.. code-block:: sql + + CREATE TABLE table1 ( + id int, + name text, + score float, + type text + ) USING PARQUET; + +========================================= +Physical Properties +========================================= + +Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters. +The ``WITH`` clause in the CREATE TABLE statement allows users to set those parameters. + +Now, Parquet file provides the following physical properties. + +* ``parquet.block.size``: The block size is the size of a row group being buffered in memory. This limits the memory usage when writing. Larger values will improve the I/O when reading but consume more memory when writing. Default size is 134217728 bytes (= 128 * 1024 * 1024). +* ``parquet.page.size``: The page size is for compression. When reading, each page can be decompressed independently. A block is composed of pages. The page is the smallest unit that must be read fully to access a single record. If this value is too small, the compression will deteriorate. Default size is 1048576 bytes (= 1 * 1024 * 1024). +* ``parquet.compression``: The compression algorithm used to compress pages. It should be one of ``uncompressed``, ``snappy``, ``gzip``, ``lzo``. Default is ``uncompressed``. +* ``parquet.enable.dictionary``: The boolean value is to enable/disable dictionary encoding. It should be one of either ``true`` or ``false``. Default is ``true``. + +========================================= +Compatibility Issues with Apache Hive™ +========================================= + +At the moment, Tajo only supports flat relational tables. +As a result, Tajo's Parquet storage type does not support nested schemas. +However, we are currently working on adding support for nested schemas and non-scalar types (`TAJO-710 `_). 
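The properties above can be combined in a single ``WITH`` clause. The following sketch enables Snappy compression and sets the row group size explicitly (the value shown is the documented default):

.. code-block:: sql

    CREATE TABLE table1 (
      id int,
      name text,
      score float,
      type text
    ) USING PARQUET WITH ('parquet.compression'='snappy',
                          'parquet.block.size'='134217728');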
\ No newline at end of file Added: tajo/site/docs/devel/_sources/table_management/rcfile.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/table_management/rcfile.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/table_management/rcfile.txt (added) +++ tajo/site/docs/devel/_sources/table_management/rcfile.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,149 @@ +************************************* +RCFile +************************************* + +RCFile, short for Record Columnar File, is a flat file format consisting of binary key/value pairs, +which shares many similarities with SequenceFile. + +========================================= +How to Create an RCFile Table? +========================================= + +If you are not familiar with the ``CREATE TABLE`` statement, please refer to the Data Definition Language :doc:`/sql_language/ddl`. + +In order to specify a certain file format for your table, you need to use the ``USING`` clause in your ``CREATE TABLE`` +statement. Below is an example statement for creating a table using RCFile. + +.. code-block:: sql + + CREATE TABLE table1 ( + id int, + name text, + score float, + type text + ) USING RCFILE; + +========================================= +Physical Properties +========================================= + +Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters. +The ``WITH`` clause in the CREATE TABLE statement allows users to set those parameters. + +Currently, the RCFile storage type provides the following physical properties. + +* ``rcfile.serde`` : custom (de)serializer class. ``org.apache.tajo.storage.BinarySerializerDeserializer`` is the default (de)serializer class. +* ``rcfile.null`` : NULL character. It is only used when a table uses ``org.apache.tajo.storage.TextSerializerDeserializer``. The default NULL character is an empty string ``''``.
Hive's default NULL character is ``'\\N'``. +* ``compression.codec`` : Compression codec, i.e., the compression algorithm used to compress files. You can enable compression and choose a specific algorithm with this property. The compression codec name should be the fully qualified class name inherited from `org.apache.hadoop.io.compress.CompressionCodec `_. By default, compression is disabled. + +The following is an example of creating a table using RCFile with compression. + +.. code-block:: sql + + CREATE TABLE table1 ( + id int, + name text, + score float, + type text + ) USING RCFILE WITH ('compression.codec'='org.apache.hadoop.io.compress.SnappyCodec'); + +========================================= +RCFile (De)serializers +========================================= + +Tajo provides two built-in (de)serializers for RCFile: + +* ``org.apache.tajo.storage.TextSerializerDeserializer``: stores column values in a plain-text form. +* ``org.apache.tajo.storage.BinarySerializerDeserializer``: stores column values in a binary form. + +The RCFile format can store some metadata in the RCFile header. Tajo writes the (de)serializer class name into +the metadata header of each RCFile when the RCFile is created in Tajo. + +.. note:: + + ``org.apache.tajo.storage.BinarySerializerDeserializer`` is the default (de)serializer for RCFile. + + +========================================= +Compatibility Issues with Apache Hive™ +========================================= + +Regardless of whether the RCFiles are written by Apache Hive™ or Apache Tajo™, the files are compatible in both systems. +In other words, Tajo can process RCFiles written by Apache Hive and vice versa. + +Since there is no such metadata in RCFiles written by Hive, you need to manually specify the (de)serializer class name +by setting a physical property. + +In Hive, there are two SerDes, and they correspond to the following (de)serializers in Tajo.
+ +* ``org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe``: corresponds to ``TextSerializerDeserializer`` in Tajo. +* ``org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe``: corresponds to ``BinarySerializerDeserializer`` in Tajo. + +The compatibility issue mostly occurs when a user creates an external table pointing to the data of an existing table. +The following sections explain two cases: 1) the case where Tajo reads RCFiles written by Hive, and +2) the case where Hive reads RCFiles written by Tajo. + +----------------------------------------- +When Tajo reads RCFile generated in Hive +----------------------------------------- + +To create an external table for RCFiles generated with ``ColumnarSerDe`` in Hive, +you should set the physical property ``rcfile.serde`` in Tajo as follows: + +.. code-block:: sql + + CREATE EXTERNAL TABLE table1 ( + id int, + name text, + score float, + type text + ) USING RCFILE WITH ('rcfile.serde'='org.apache.tajo.storage.TextSerializerDeserializer', 'rcfile.null'='\\N') + LOCATION '....'; + +To create an external table for RCFiles generated with ``LazyBinaryColumnarSerDe`` in Hive, +you should set the physical property ``rcfile.serde`` in Tajo as follows: + +.. code-block:: sql + + CREATE EXTERNAL TABLE table1 ( + id int, + name text, + score float, + type text + ) USING RCFILE WITH ('rcfile.serde' = 'org.apache.tajo.storage.BinarySerializerDeserializer') + LOCATION '....'; + +.. note:: + + As mentioned above, ``BinarySerializerDeserializer`` is the default (de)serializer for RCFile. + So, you can omit ``rcfile.serde`` when using ``org.apache.tajo.storage.BinarySerializerDeserializer``. + +----------------------------------------- +When Hive reads RCFile generated in Tajo +----------------------------------------- + +To create an external table for RCFiles written by Tajo with ``TextSerializerDeserializer``, +you should set the ``SERDE`` as follows: + +.. 
code-block:: sql + + CREATE TABLE table1 ( + id int, + name string, + score float, + type string + ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS RCFILE + LOCATION ''; + +To create an external table for RCFiles written by Tajo with ``BinarySerializerDeserializer``, +you should set the ``SERDE`` as follows: + +.. code-block:: sql + + CREATE TABLE table1 ( + id int, + name string, + score float, + type string + ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe' STORED AS RCFILE + LOCATION ''; \ No newline at end of file Added: tajo/site/docs/devel/_sources/table_management/sequencefile.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/table_management/sequencefile.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/table_management/sequencefile.txt (added) +++ tajo/site/docs/devel/_sources/table_management/sequencefile.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,111 @@ +************************************* +SequenceFile +************************************* + +----------------------------------------- +Introduction +----------------------------------------- + +SequenceFiles are flat files consisting of binary key/value pairs. +SequenceFile is a basic file format provided by Hadoop, and Hive can also create tables stored as SequenceFile. + +The ``USING sequencefile`` keywords let you create a SequenceFile table. Here is an example statement to create a table using ``sequencefile``: + +.. code-block:: sql + + CREATE TABLE table1 (id int, name text, score float, type text) + USING sequencefile; + +Tajo also provides Hive compatibility for SequenceFile. The above statement can be written in Hive as follows: + +.. 
code-block:: sql + + CREATE TABLE table1 (id int, name string, score float, type string) + STORED AS sequencefile; + +----------------------------------------- +SerializerDeserializer (SerDe) +----------------------------------------- + +There are two SerDes for SequenceFile: + + + TextSerializerDeserializer: This class can read and write data in plain-text file format. + + BinarySerializerDeserializer: This class can read and write data in binary file format. + +In Tajo, the plain-text SerDe is the default. The above example statement creates the table using TextSerializerDeserializer. If you want to use BinarySerializerDeserializer, you can specify it with the ``sequencefile.serde`` property: + +.. code-block:: sql + + CREATE TABLE table1 (id int, name text, score float, type text) + USING sequencefile WITH ('sequencefile.serde'='org.apache.tajo.storage.BinarySerializerDeserializer') + +The above statement can be written in Hive as follows: + +.. code-block:: sql + + CREATE TABLE table1 (id int, name string, score float, type string) + ROW FORMAT SERDE + 'org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe' + STORED AS sequencefile; + +----------------------------------------- +Writer +----------------------------------------- + +There are three SequenceFile Writers based on the SequenceFile.CompressionType used to compress key/value pairs: + + + Writer : uncompressed records. + + RecordCompressWriter : record-compressed files; only values are compressed. + + BlockCompressWriter : block-compressed files; both keys and values are collected in 'blocks' separately and compressed. The size of the 'block' is configurable. + +In Tajo, the uncompressed Writer is the default. If you want to use RecordCompressWriter, you can specify it with the ``compression.type`` and ``compression.codec`` properties: + +.. 
code-block:: sql + + CREATE TABLE table1 (id int, name text, score float, type text) + USING sequencefile WITH ('compression.type'='RECORD','compression.codec'='org.apache.hadoop.io.compress.SnappyCodec') + +In Hive, you need to specify settings as follows: + +.. code-block:: sql + + hive> SET hive.exec.compress.output = true; + hive> SET mapred.output.compression.type = RECORD; + hive> SET mapred.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec; + hive> CREATE TABLE table1 (id int, name string, score float, type string) STORED AS sequencefile; + +And if you want to use BlockCompressWriter, you can specify it with the ``compression.type`` and ``compression.codec`` properties: + +.. code-block:: sql + + CREATE TABLE table1 (id int, name text, score float, type text) + USING sequencefile WITH ('compression.type'='BLOCK','compression.codec'='org.apache.hadoop.io.compress.SnappyCodec') + +In Hive, you need to specify settings as follows: + +.. code-block:: sql + + hive> SET hive.exec.compress.output = true; + hive> SET mapred.output.compression.type = BLOCK; + hive> SET mapred.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec; + hive> CREATE TABLE table1 (id int, name string, score float, type string) STORED AS sequencefile; + +For reference, you can combine the TextSerDe or BinarySerDe with the compression properties. +Here is an example statement for this case. + +.. code-block:: sql + + CREATE TABLE table1 (id int, name text, score float, type text) + USING sequencefile WITH ('sequencefile.serde'='org.apache.tajo.storage.BinarySerializerDeserializer', 'compression.type'='BLOCK','compression.codec'='org.apache.hadoop.io.compress.SnappyCodec') + +In Hive, you need to specify settings as follows: + +.. 
code-block:: sql + + hive> SET hive.exec.compress.output = true; + hive> SET mapred.output.compression.type = BLOCK; + hive> SET mapred.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec; + hive> CREATE TABLE table1 (id int, name string, score float, type string) + ROW FORMAT SERDE + 'org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe' + STORED AS sequencefile; \ No newline at end of file Added: tajo/site/docs/devel/_sources/table_management/table_overview.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/table_management/table_overview.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/table_management/table_overview.txt (added) +++ tajo/site/docs/devel/_sources/table_management/table_overview.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,98 @@ +************************************* +Overview of Tajo Tables +************************************* + +Overview +======== + +.. todo:: + +Table Properties +================ +All table formats provide parameters for enabling or disabling features and adjusting physical parameters. +The ``WITH`` clause in the CREATE TABLE statement allows users to set those properties. + +The following example sets a custom field delimiter, NULL character, and compression codec: + +.. code-block:: sql + + CREATE TABLE table1 ( + id int, + name text, + score float, + type text + ) USING CSV WITH('text.delimiter'='\u0001', + 'text.null'='\\N', + 'compression.codec'='org.apache.hadoop.io.compress.SnappyCodec'); + +Each physical table layout has its own specialized properties. They will be addressed in :doc:`/table_management/file_formats`. + + +Common Table Properties +======================= + +There are some common table properties that are used in most tables. + +Compression +----------- +.. 
todo:: + +Time zone +--------- +In Tajo, the table property ``timezone`` allows users to specify the time zone that the table uses for reading or writing. +When table rows are read or written, ``timestamp`` and ``time`` column values are adjusted by the given time zone if it is set. A time zone can be an abbreviated form such as 'PST'. It also accepts an offset-based form like 'GMT+9' or 'UTC+9', or a location-based form like 'Asia/Seoul'. + +Each table has one time zone, and different tables can have different time zones. Internally, Tajo translates all table data to offset-based values, so complex queries such as joins across multiple time zones work well. + +.. note:: + + In many cases, offset-based forms or location-based forms are recommended. For the list of time zones, please refer to `List of tz database time zones `_ + +.. note:: + + Java 6 does not recognize many location-based time zones or offset-based time zones using the prefix 'UTC'. We highly recommend using offset-based time zones with the prefix 'GMT'. In other words, you should use 'GMT-7' instead of 'UTC-7' in Java 6. + +How time zone works in Tajo +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +For example, consider a list of delimited text lines where each row is written with the ``Asia/Seoul`` time zone (i.e., GMT+9). + +.. code-block:: text + + 1980-4-1 01:50:30.010|1980-04-01 + 80/4/1 1:50:30 AM|80/4/1 + 1980 April 1 1:50:30|1980-04-01 + + +In order to register the table, we should put the table property ``'timezone'='Asia/Seoul'`` in the ``CREATE TABLE`` statement as follows: + +.. code-block:: sql + + CREATE EXTERNAL TABLE table1 ( + t_timestamp TIMESTAMP, + t_date DATE + ) USING TEXTFILE WITH('text.delimiter'='|', 'timezone'='ASIA/Seoul') LOCATION '/path-to-table/' + + +By default, ``tsql`` and the ``TajoClient`` API use the UTC time zone. So, timestamp values in the result are adjusted by the time zone offset. But date values are not adjusted because the date type does not consider time zones.
+ +.. code-block:: sql + + default> SELECT * FROM table1 + t_timestamp, t_date + ---------------------------------- + 1980-03-31 16:50:30.01, 1980-04-01 + 1980-03-31 16:50:30 , 1980-04-01 + 1980-03-31 16:50:30 , 1980-04-01 + +In addition, users can set a client-side time zone by setting the session variable ``TZ``. It enables a client to translate timestamp or time values into the user's time zone. + +.. code-block:: sql + + default> \set TZ 'Asia/Seoul' + default> SELECT * FROM table1 + t_timestamp, t_date + ---------------------------------- + 1980-04-01 01:50:30.01, 1980-04-01 + 1980-04-01 01:50:30 , 1980-04-01 + 1980-04-01 01:50:30 , 1980-04-01 \ No newline at end of file Added: tajo/site/docs/devel/_sources/table_partitioning.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/table_partitioning.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/table_partitioning.txt (added) +++ tajo/site/docs/devel/_sources/table_partitioning.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,11 @@ +****************** +Table Partitioning +****************** + +.. toctree:: + :maxdepth: 1 + + partitioning/intro_to_partitioning + partitioning/column_partitioning + partitioning/range_partitioning + partitioning/hash_partitioning \ No newline at end of file Added: tajo/site/docs/devel/_sources/tajo_client_api.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/tajo_client_api.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/tajo_client_api.txt (added) +++ tajo/site/docs/devel/_sources/tajo_client_api.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,5 @@ +************************************* +Tajo Client API +************************************* + +..
todo:: \ No newline at end of file Added: tajo/site/docs/devel/_sources/tsql.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/tsql.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/tsql.txt (added) +++ tajo/site/docs/devel/_sources/tsql.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,19 @@ +***************************** +Tajo Shell (TSQL) +***************************** + +Tajo provides a shell utility named Tsql. It is a command-line interface (CLI) where users can create or drop tables, inspect schema and query tables, etc. + +.. toctree:: + :maxdepth: 1 + + tsql/meta_command + tsql/dfs_command + tsql/variables + tsql/admin_command + + + tsql/intro + tsql/single_command + tsql/execute_file + tsql/background_command \ No newline at end of file Added: tajo/site/docs/devel/_sources/tsql/admin_command.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/tsql/admin_command.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/tsql/admin_command.txt (added) +++ tajo/site/docs/devel/_sources/tsql/admin_command.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,60 @@ +********************************* +Administration Commands +********************************* + + +========== +Synopsis +========== + +Tsql provides administration commands as follows: + +.. 
code-block:: sql + + default> \admin; + usage: admin [options] + -cluster Show Cluster Info + -desc Show Query Description + -h,--host Tajo server host + -kill Kill a running query + -list Show Tajo query list + -p,--port Tajo server port + -showmasters gets list of tajomasters in the cluster + + +----------------------------------------------- +Basic usages +----------------------------------------------- + +``-list`` option shows a list of all running queries as follows: :: + + default> \admin -list + QueryId State StartTime Query + -------------------- ------------------- ------------------- ----------------------------- + q_1411357607375_0006 QUERY_RUNNING 2014-09-23 07:19:40 select count(*) from lineitem + + +``-desc`` option shows a detailed description of a specified running query as follows: :: + + default> \admin -desc q_1411357607375_0006 + Id: 1 + Query Id: q_1411357607375_0006 + Started Time: 2014-09-23 07:19:40 + Query State: QUERY_RUNNING + Execution Time: 20.0 sec + Query Progress: 0.249 + Query Statement: + select count(*) from lineitem + + +``-kill`` option kills a specified running query as follows: :: + + default> \admin -kill q_1411357607375_0007 + q_1411357607375_0007 is killed successfully. 
+ + + +``-showmasters`` option shows a list of all Tajo masters as follows: :: + + default> \admin -showmasters + grtajo01 Added: tajo/site/docs/devel/_sources/tsql/background_command.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/tsql/background_command.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/tsql/background_command.txt (added) +++ tajo/site/docs/devel/_sources/tsql/background_command.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,29 @@ +********************************* +Executing as background process +********************************* + + +If you execute tsql as a background process, tsql will be stopped before executing a query due to a limitation of JLine2. + +Example: + + .. code-block:: sql + + $ bin/tsql -f aggregation.sql & + [1] 19303 + $ + [1]+ Stopped ./bin/tsql -f aggregation.sql + + +To avoid the above problem, Tajo provides the ``-B`` option as follows: + +.. code-block:: sql + + $ bin/tsql -B -f aggregation.sql & + [2] 19419 + Progress: 0%, response time: 0.218 sec + Progress: 0%, response time: 0.22 sec + Progress: 0%, response time: 0.421 sec + Progress: 0%, response time: 0.823 sec + Progress: 0%, response time: 1.425 sec + Progress: 1%, response time: 2.227 sec Added: tajo/site/docs/devel/_sources/tsql/dfs_command.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/tsql/dfs_command.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/tsql/dfs_command.txt (added) +++ tajo/site/docs/devel/_sources/tsql/dfs_command.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,26 @@ +********************************* +Executing HDFS commands +********************************* + +You can run the hadoop dfs command (FsShell) within tsql. The ``\dfs`` command provides a shortcut to the hadoop dfs commands.
If you want to use this command, just specify the FsShell arguments and add a semicolon at the end as follows: + +.. code-block:: sql + + default> \dfs -ls / + Found 3 items + drwxr-xr-x - tajo supergroup 0 2014-08-14 04:04 /tajo + drwxr-xr-x - tajo supergroup 0 2014-09-04 02:20 /tmp + drwxr-xr-x - tajo supergroup 0 2014-09-16 13:41 /user + + default> \dfs -ls /tajo + Found 2 items + drwxr-xr-x - tajo supergroup 0 2014-08-14 04:04 /tajo/system + drwxr-xr-x - tajo supergroup 0 2014-08-14 04:15 /tajo/warehouse + + default> \dfs -mkdir /tajo/temp + + default> \dfs -ls /tajo + Found 3 items + drwxr-xr-x - tajo supergroup 0 2014-08-14 04:04 /tajo/system + drwxr-xr-x - tajo supergroup 0 2014-09-23 06:48 /tajo/temp + drwxr-xr-x - tajo supergroup 0 2014-08-14 04:15 /tajo/warehouse Added: tajo/site/docs/devel/_sources/tsql/execute_file.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/tsql/execute_file.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/tsql/execute_file.txt (added) +++ tajo/site/docs/devel/_sources/tsql/execute_file.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,63 @@ +********************************* +Executing Queries from Files +********************************* + + +----------------------------------------------- +Basic usages +----------------------------------------------- + + +The ``-f`` option allows tsql to execute more than one SQL statement stored in a text file, as follows: + +..
code-block:: sql + + $ cat aggregation.sql + select count(*) from table1; + select sum(score) from table1; + + $ bin/tsql -f aggregation.sql + Progress: 0%, response time: 0.216 sec + Progress: 0%, response time: 0.217 sec + Progress: 100%, response time: 0.331 sec + ?count + ------------------------------- + 5 + (1 rows, 0.331 sec, 2 B selected) + Progress: 0%, response time: 0.203 sec + Progress: 0%, response time: 0.204 sec + Progress: 50%, response time: 0.406 sec + Progress: 100%, response time: 0.769 sec + ?sum + ------------------------------- + 15.0 + (1 rows, 0.769 sec, 5 B selected) + + + +----------------------------------------------- +Setting parameter value in SQL file +----------------------------------------------- + +If you wish to set a parameter value in the SQL file, you can set it with the ``-param key=value`` option. When you use this feature, you have to reference the parameter in the file as follows: + +.. code-block:: sql + + ${parameter name} + + +You have to put the parameter name in braces and you must use the ``$`` symbol as the prefix, as follows: + +.. code-block:: sql + + $ cat aggregation.sql + select count(*) from table1 where id = ${p_id}; + + $ bin/tsql -param p_id=1 -f aggregation.sql + Progress: 0%, response time: 0.216 sec + Progress: 0%, response time: 0.217 sec + Progress: 100%, response time: 0.331 sec + ?count + ------------------------------- + 1 + (1 rows, 0.331 sec, 2 B selected) Added: tajo/site/docs/devel/_sources/tsql/intro.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/tsql/intro.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/tsql/intro.txt (added) +++ tajo/site/docs/devel/_sources/tsql/intro.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,41 @@ +***************************** +Introducing TSQL +***************************** + +========== +Synopsis +========== + +..
code-block:: bash + + bin/tsql [options] [database name] + +If a *database_name* is given, tsql connects to the database at startup time. Otherwise, tsql connects to the ``default`` database. + +Options + +* ``-c "quoted sql"`` : Execute quoted SQL statements, and then the shell will exit. +* ``-f filename (--file filename)`` : Use the file named filename as the source of commands instead of the interactive shell. +* ``-h hostname (--host hostname)`` : Specifies the host name of the machine on which the Tajo master is running. +* ``-p port (--port port)`` : Specifies the TCP port. If it is not set, the port will be 26002 by default. +* ``-conf configuration (--conf configuration)`` : Set a Tajo configuration value. +* ``-param parameter (--param parameter)`` : Use a parameter value in an SQL file. +* ``-B (--background)`` : Execute as a background process. + +=================== +Entering tsql shell +=================== + +If the hostname and the port number are not given, tsql will try to connect to the Tajo master specified in ${TAJO_HOME}/conf/tajo-site.xml. :: + + bin/tsql + + default> + +If you want to connect to a specific TajoMaster, you should use the ``-h`` and/or ``-p`` options as follows: :: + + bin/tsql -h localhost -p 9004 + + default> + +The prompt indicates the current database. Added: tajo/site/docs/devel/_sources/tsql/meta_command.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/tsql/meta_command.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/tsql/meta_command.txt (added) +++ tajo/site/docs/devel/_sources/tsql/meta_command.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,150 @@ +********************************* +Meta Commands +********************************* + + +In tsql, any command that begins with an unquoted backslash ('\') is a tsql meta-command that is processed by tsql itself.
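The backslash rule above can be sketched as a simple dispatch check (an illustrative sketch only, not tsql's actual implementation; ``is_meta_command`` is a hypothetical name):

```python
def is_meta_command(line):
    """Return True if the input line should be handled by tsql itself.

    Illustrative sketch: a line whose first non-blank character is a
    backslash is treated as a meta-command; anything else would be sent
    to the Tajo master as SQL.
    """
    return line.lstrip().startswith("\\")

print(is_meta_command(r"\d orders"))               # True
print(is_meta_command("select count(*) from t;"))  # False
```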
+ +In the current implementation, there are meta commands as follows: :: + + default> \? + + + General + \copyright show Apache License 2.0 + \version show Tajo version + \? show help + \? [COMMAND] show help of a given command + \help alias of \? + \q quit tsql + + + Informational + \l list databases + \c show current database + \c [DBNAME] connect to new database + \d list tables + \d [TBNAME] describe table + \df list functions + \df NAME describe function + + + Tool + \! execute a linux shell command + \dfs execute a dfs command + \admin execute Tajo admin command + + + Variables + \set [[NAME] [VALUE]] set session variable or list session variables + \unset NAME unset session variable + + + Documentations + tsql guide http://tajo.apache.org/docs/current/cli.html + Query language http://tajo.apache.org/docs/current/sql_language.html + Functions http://tajo.apache.org/docs/current/functions.html + Backup & restore http://tajo.apache.org/docs/current/backup_and_restore.html + Configuration http://tajo.apache.org/docs/current/configuration.html + +----------------------------------------------- +Basic usages +----------------------------------------------- + +``\l`` command shows a list of all databases as follows: :: + + default> \l + default + tpch + work1 + default> + + + +``\d`` command shows a list of tables in the current database as follows: :: + + default> \d + customer + lineitem + nation + orders + part + partsupp + region + supplier + + +``\d [table name]`` command also shows a table description as follows: :: + + default> \d orders + + table name: orders + table path: hdfs:/xxx/xxx/tpch/orders + store type: CSV + number of rows: 0 + volume (bytes): 172.0 MB + schema: + o_orderkey INT8 + o_custkey INT8 + o_orderstatus TEXT + o_totalprice FLOAT8 + o_orderdate TEXT + o_orderpriority TEXT + o_clerk TEXT + o_shippriority INT4 + o_comment TEXT + + + +The prompt ``default>`` indicates the current database.
Basically, all SQL statements and meta commands work in the current database. Also, you can change the current database with the ``\c`` command. + +.. code-block:: sql + + default> \c work1 + You are now connected to database "work1" as user "hyunsik". + work1> + + +``\df`` command shows a list of all built-in functions as follows: :: + + default> \df + Name | Result type | Argument types | Description | Type + -----------------+-----------------+-----------------------+-----------------------------------------------+----------- + abs | INT4 | INT4 | Absolute value | GENERAL + abs | INT8 | INT8 | Absolute value | GENERAL + abs | FLOAT4 | FLOAT4 | Absolute value | GENERAL + abs | FLOAT8 | FLOAT8 | Absolute value | GENERAL + acos | FLOAT8 | FLOAT4 | Inverse cosine. | GENERAL + acos | FLOAT8 | FLOAT8 | Inverse cosine. | GENERAL + utc_usec_to | INT8 | TEXT,INT8 | Extract field from time | GENERAL + utc_usec_to | INT8 | TEXT,INT8,INT4 | Extract field from time | GENERAL + + (181) rows + + For reference, many rows have been omitted here for brevity. + +``\df [function name]`` command also shows a function description as follows: :: + + default> \df round; + Name | Result type | Argument types | Description | Type + -----------------+-----------------+-----------------------+-----------------------------------------------+----------- + round | INT8 | FLOAT4 | Round to nearest integer. | GENERAL + round | INT8 | FLOAT8 | Round to nearest integer. | GENERAL + round | INT8 | INT4 | Round to nearest integer. | GENERAL + round | INT8 | INT8 | Round to nearest integer. | GENERAL + round | FLOAT8 | FLOAT8,INT4 | Round to s decimal places. | GENERAL + round | FLOAT8 | INT8,INT4 | Round to s decimal places. | GENERAL + + (6) rows + + Function: INT8 round(float4) + Description: Round to nearest integer. + Example: + > SELECT round(42.4) + 42 + + Function: FLOAT8 round(float8,int4) + Description: Round to s decimal places.
+ + Example: + > SELECT round(42.4382, 2) + 42.44 Added: tajo/site/docs/devel/_sources/tsql/single_command.txt URL: http://svn.apache.org/viewvc/tajo/site/docs/devel/_sources/tsql/single_command.txt?rev=1644656&view=auto ============================================================================== --- tajo/site/docs/devel/_sources/tsql/single_command.txt (added) +++ tajo/site/docs/devel/_sources/tsql/single_command.txt Thu Dec 11 14:41:20 2014 @@ -0,0 +1,24 @@ +********************************* +Executing a single command +********************************* + + +You may want to run queries without entering the tsql prompt. Tsql provides the ``-c`` argument for this purpose, and it assumes that queries are separated by semicolons as follows: + +.. code-block:: sql + + $ bin/tsql -c "select count(*) from table1; select sum(score) from table1;" + Progress: 0%, response time: 0.217 sec + Progress: 0%, response time: 0.218 sec + Progress: 100%, response time: 0.317 sec + ?count + ------------------------------- + 5 + (1 rows, 0.317 sec, 2 B selected) + Progress: 0%, response time: 0.202 sec + Progress: 0%, response time: 0.204 sec + Progress: 100%, response time: 0.345 sec + ?sum + ------------------------------- + 15.0 + (1 rows, 0.345 sec, 5 B selected)
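As a rough illustration of the semicolon convention above, a naive splitter might look like the following (a hypothetical sketch only; tsql's real parsing lives inside Tajo and also handles comments, quoting rules, and other details):

```python
def split_statements(text):
    """Naively split a -c style argument into statements on semicolons,
    ignoring semicolons inside single-quoted strings. Illustrative only."""
    statements, buf, in_quote = [], [], False
    for ch in text:
        if ch == "'":
            in_quote = not in_quote
            buf.append(ch)
        elif ch == ";" and not in_quote:
            stmt = "".join(buf).strip()
            if stmt:
                statements.append(stmt)
            buf = []
        else:
            buf.append(ch)
    tail = "".join(buf).strip()
    if tail:  # allow a final statement without a trailing semicolon
        statements.append(tail)
    return statements

print(split_statements("select count(*) from table1; select sum(score) from table1;"))
# ['select count(*) from table1', 'select sum(score) from table1']
```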