tajo-commits mailing list archives

From jihoon...@apache.org
Subject svn commit: r1728394 [3/42] - in /tajo/site/docs: 0.11.1/ 0.11.1/_sources/ 0.11.1/_sources/backup_and_restore/ 0.11.1/_sources/configuration/ 0.11.1/_sources/functions/ 0.11.1/_sources/index/ 0.11.1/_sources/partitioning/ 0.11.1/_sources/sql_language/ ...
Date Thu, 04 Feb 2016 00:29:12 GMT
Added: tajo/site/docs/0.11.1/_sources/sql_language/alter_table.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/sql_language/alter_table.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/sql_language/alter_table.txt (added)
+++ tajo/site/docs/0.11.1/_sources/sql_language/alter_table.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,100 @@
+************************
+ALTER TABLE
+************************
+
+========================
+RENAME TABLE
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  ALTER TABLE <table_name> RENAME TO <new_table_name>
+
+  For example:
+  ALTER TABLE table1 RENAME TO table2;
+
+This statement lets you change the name of a table to a different name.
+
+========================
+RENAME COLUMN
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  ALTER TABLE <table_name> RENAME COLUMN <column_name> TO <new_column_name>
+
+  For example:
+  ALTER TABLE table1 RENAME COLUMN id TO id2;
+
+This statement will allow users to change a column's name.
+
+========================
+ADD COLUMN
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  ALTER TABLE <table_name> ADD COLUMN <column_name> <data_type>
+
+  For example:
+  ALTER TABLE table1 ADD COLUMN id text;
+
+This statement lets you add a new column after the last existing column of the table.
+
+========================
+SET PROPERTY
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  ALTER TABLE <table_name> SET PROPERTY (<key> = <value>, ...)
+
+  For example:
+  ALTER TABLE table1 SET PROPERTY 'timezone' = 'GMT-7'
+  ALTER TABLE table1 SET PROPERTY 'text.delimiter' = '&'
+  ALTER TABLE table1 SET PROPERTY 'compression.type'='RECORD','compression.codec'='org.apache.hadoop.io.compress.SnappyCodec'
+
+
+This statement will allow users to change a table property.
+
+========================
+ DROP PARTITION
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  ALTER TABLE <table_name> [IF EXISTS] DROP PARTITION (<partition column> = <partition value>, ...) [PURGE]
+
+  For example:
+  ALTER TABLE table1 DROP PARTITION (col1 = 1 , col2 = 2)
+  ALTER TABLE table1 DROP PARTITION (col1 = '2015' , col2 = '01', col3 = '11' )
+  ALTER TABLE table1 DROP PARTITION (col1 = 'TAJO' ) PURGE
+
+You can use ``ALTER TABLE DROP PARTITION`` to drop a partition of a table. By default, this removes only the partition's metadata from the catalog and leaves the partition data in place; if ``PURGE`` is specified, the partition data is removed as well. An error is thrown if the partition does not exist, unless ``IF EXISTS`` is given, in which case the error is skipped.
+
+========================
+REPAIR PARTITION
+========================
+
+Tajo stores a list of partitions for each table in its catalog. If partitions are manually added to the distributed file system, the catalog is not aware of these partitions. Running the ``ALTER TABLE REPAIR PARTITION`` statement registers them so that the table's partition list is properly populated.
+
+*Synopsis*
+
+.. code-block:: sql
+
+  ALTER TABLE <table_name> REPAIR PARTITION
+
+.. note::
+
+  Even if information about a partition is stored in the catalog, Tajo does not recover the partition when its directory does not exist in the file system.
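+
+As a minimal sketch, assuming a hypothetical partitioned table ``table1`` whose partition directories were copied into the file system by hand, the catalog could be resynchronized like this:
+
+.. code-block:: sql
+
+  -- table1 is a placeholder name; its partition directories are assumed to exist already
+  ALTER TABLE table1 REPAIR PARTITION;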
+

Added: tajo/site/docs/0.11.1/_sources/sql_language/data_model.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/sql_language/data_model.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/sql_language/data_model.txt (added)
+++ tajo/site/docs/0.11.1/_sources/sql_language/data_model.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,68 @@
+**********
+Data Model
+**********
+
+===============
+Data Types
+===============
+
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+| Support   | SQL Type Name  |  Alias                     | Size (byte) | Description                                       | Range                                                                    |
++===========+================+============================+=============+===================================================+==========================================================================+ 
+| O         | boolean        |  bool                      |  1          |                                                   | true/false                                                               |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+  
+|           | bit            |                            |  1          |                                                   | 1/0                                                                      | 
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+|           | varbit         |  bit varying               |             |                                                   |                                                                          |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+| O         | tinyint        |  int1                      |  1          | tiny-range integer value                          | -2^7 (-128) to 2^7-1 (127)                                               |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+
+| O         | smallint       |  int2                      |  2          | small-range integer value                         | -2^15 (-32,768) to 2^15-1 (32,767)                                       |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+| O         | integer        |  int, int4                 |  4          | integer value                                     | -2^31 (-2,147,483,648) to 2^31 - 1 (2,147,483,647)                       |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+| O         | bigint         |  int8                      |  8          | larger range integer value                        | -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807) |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+
+| O         | real           |  float4                    |  4          | variable-precision, inexact, real number value    | -3.4028235E+38 to 3.4028235E+38 (6 decimal digits precision)             |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+
+| O         | float[(n)]     |                            |  4 or 8     | variable-precision, inexact, real number value    |                                                                          |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+| O         | double         |  float8, double precision  |  8          | variable-precision, inexact, real number value    | 1.7E-308 to 1.7E+308 (15 decimal digits precision)                       |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+|           | number         |  decimal                   |             |                                                   |                                                                          |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+|           | char[(n)]      |  character                 |             |                                                   |                                                                          |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+|           | varchar[(n)]   |  character varying         |             |                                                   |                                                                          |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+| O         | text           |  text                      |             | variable-length unicode text                      |                                                                          |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+|           | binary         |  binary                    |             |                                                   |                                                                          |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+|           | varbinary[(n)] |  binary varying            |             |                                                   |                                                                          |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+| O         | blob           |  bytea                     |             | variable-length binary string                     |                                                                          |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+| O         | date           |                            |             |                                                   |                                                                          | 
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+| O         | time           |                            |             |                                                   |                                                                          | 
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+|           | timetz         |  time with time zone       |             |                                                   |                                                                          |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+| O         | timestamp      |                            |             |                                                   |                                                                          |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+|           | timestamptz    |                            |             |                                                   |                                                                          |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+ 
+| O         | inet4          |                            | 4           | IPv4 address                                      |                                                                          |
++-----------+----------------+----------------------------+-------------+---------------------------------------------------+--------------------------------------------------------------------------+
+
+-----------------------------------------
+Using real number value (real and double)
+-----------------------------------------
+
+The real and double data types are mapped to the Java primitives float and double, respectively. Both Java primitives follow the IEEE 754 specification, so these types correctly match the SQL standard data types.
+
++ float[( n )] is mapped to either float or double according to a given length n. If n is specified, it must be between 1 and 53. The default value of n is 53.
++ If 1 <= n <= 24, a value is mapped to float (6 decimal digits precision).
++ If 25 <= n <= 53, a value is mapped to double (15 decimal digits precision).
++ Do not use approximate real number columns in a WHERE clause for exact-match comparisons, especially with the = and <> operators. The > and < comparisons work well.
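+
+As an illustrative sketch of the advice above, assume a hypothetical table ``measurements`` with a ``double`` column ``reading``. Instead of testing for exact equality, the query compares against a narrow range around the target value; the tolerance ``0.000001`` is an arbitrary choice for illustration:
+
+.. code-block:: sql
+
+  -- fragile: an approximate value may not be exactly equal to 0.1
+  -- SELECT * FROM measurements WHERE reading = 0.1;
+
+  -- more robust: compare against a narrow range using > and <
+  SELECT * FROM measurements
+  WHERE reading > 0.1 - 0.000001 AND reading < 0.1 + 0.000001;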
\ No newline at end of file

Added: tajo/site/docs/0.11.1/_sources/sql_language/ddl.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/sql_language/ddl.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/sql_language/ddl.txt (added)
+++ tajo/site/docs/0.11.1/_sources/sql_language/ddl.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,127 @@
+************************
+Data Definition Language
+************************
+
+========================
+CREATE DATABASE
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  CREATE DATABASE [IF NOT EXISTS] <database_name>
+
+*Description*
+
+A database is a namespace in Tajo. A database can contain multiple tables, each of which has a unique name within the database.
+``IF NOT EXISTS`` allows the ``CREATE DATABASE`` statement to avoid the error that occurs when the database already exists.
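+
+For instance, assuming a hypothetical database name ``db1``, the following statement should succeed whether or not ``db1`` is already present:
+
+.. code-block:: sql
+
+  -- db1 is a placeholder name
+  CREATE DATABASE IF NOT EXISTS db1;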
+
+========================
+DROP DATABASE
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  DROP DATABASE [IF EXISTS] <database_name>
+
+``IF EXISTS`` allows ``DROP DATABASE`` statement to avoid an error which occurs when the database does not exist.
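+
+A short sketch, again using the hypothetical database name ``db1``:
+
+.. code-block:: sql
+
+  -- db1 is a placeholder name; no error is raised if it does not exist
+  DROP DATABASE IF EXISTS db1;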
+
+========================
+CREATE TABLE
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  CREATE TABLE [IF NOT EXISTS] <table_name> [(column_list)] [TABLESPACE tablespace_name]
+  [USING <storage_type> [WITH (<key> = <value>, ...)]] [AS <select_statement>]
+
+  CREATE EXTERNAL TABLE [IF NOT EXISTS] <table_name> (column_list)
+  USING <storage_type> [WITH (<key> = <value>, ...)] LOCATION '<path>'
+
+*Description*
+
+In Tajo, there are two types of tables, `managed table` and `external table`.
+Managed tables are placed on some predefined tablespaces. The ``TABLESPACE`` clause is to specify a tablespace for this table. For external tables, Tajo allows an arbitrary table location with the ``LOCATION`` clause.
+For more information about tables and tablespace, please refer to :doc:`/table_management/table_overview` and :doc:`/table_management/tablespaces`.
+
+``column_list`` is a sequence of the column name and its type like ``<column_name> <data_type>, ...``. Additionally, the `asterisk (*)` is allowed for external tables when their data format is `JSON`. You can find more details at :doc:`/table_management/json`.
+
+``IF NOT EXISTS`` allows the ``CREATE [EXTERNAL] TABLE`` statement to avoid the error that occurs when the table already exists.
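+
+The two forms can be sketched as follows. The table names, column names, and the path are hypothetical; ``TEXT`` and ``text.delimiter`` are the storage type and property that appear elsewhere in this document:
+
+.. code-block:: sql
+
+  -- managed table: Tajo places the data in a predefined tablespace
+  CREATE TABLE IF NOT EXISTS table1 (id int, name text)
+  USING TEXT WITH ('text.delimiter' = '|');
+
+  -- external table: the data stays at the given location (placeholder path)
+  CREATE EXTERNAL TABLE IF NOT EXISTS table2 (id int, name text)
+  USING TEXT WITH ('text.delimiter' = '|')
+  LOCATION 'hdfs://localhost:9010/path/to/table2';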
+
+------------------------
+ Compression
+------------------------
+
+If you want to add an external table that contains compressed data, you should give the ``compression.codec`` parameter to the ``CREATE TABLE`` statement.
+
+.. code-block:: sql
+
+  create EXTERNAL table lineitem (
+  L_ORDERKEY bigint, 
+  L_PARTKEY bigint, 
+  ...
+  L_COMMENT text) 
+
+  USING TEXT WITH ('text.delimiter'='|','compression.codec'='org.apache.hadoop.io.compress.SnappyCodec')
+  LOCATION 'hdfs://localhost:9010/tajo/warehouse/lineitem_100_snappy';
+
+`compression.codec` parameter can have one of the following compression codecs:
+  * org.apache.hadoop.io.compress.BZip2Codec
+  * org.apache.hadoop.io.compress.DeflateCodec
+  * org.apache.hadoop.io.compress.GzipCodec
+  * org.apache.hadoop.io.compress.SnappyCodec 
+
+========================
+ DROP TABLE
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  DROP TABLE [IF EXISTS] <table_name> [PURGE]
+
+*Description*
+
+``IF EXISTS`` allows the ``DROP TABLE`` statement to avoid the error that occurs when the table does not exist. The ``DROP TABLE`` statement removes a table from the Tajo catalog, but it does not remove the contents. If the ``PURGE`` option is given, the ``DROP TABLE`` statement eliminates the entry in the catalog as well as the contents.
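+
+The difference can be sketched with two hypothetical tables, ``table1`` and ``table2``:
+
+.. code-block:: sql
+
+  -- removes only the catalog entry; the contents remain
+  DROP TABLE IF EXISTS table1;
+
+  -- removes the catalog entry and the contents
+  DROP TABLE table2 PURGE;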
+
+========================
+ CREATE INDEX
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  CREATE INDEX [ name ] ON table_name [ USING method ]
+  ( { column_name | ( expression ) } [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] )
+  [ WHERE predicate ]
+
+*Description*
+
+Tajo supports indexes for fast data retrieval. Currently, indexes are supported only for plain ``TEXT`` formats stored on ``HDFS``.
+For more information, please refer to :doc:`/index_overview`.
+
+------------------------
+ Index method
+------------------------
+
+Currently, Tajo supports only one type of index.
+
+Index methods:
+  * TWO_LEVEL_BIN_TREE: This method is used by default in Tajo. For more information about its structure, please refer to :doc:`/index/types`.
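+
+As a hedged sketch based on the synopsis above, an index on the ``l_orderkey`` column of the ``lineitem`` table might be created as follows. The index name ``l_orderkey_idx`` is hypothetical, and the second statement assumes the method name can be written in the ``USING`` clause exactly as listed above:
+
+.. code-block:: sql
+
+  -- relies on the default index method
+  CREATE INDEX l_orderkey_idx ON lineitem (l_orderkey);
+
+  -- alternative form of the same index, naming the method explicitly
+  -- (assumed spelling, not confirmed by this document)
+  -- CREATE INDEX l_orderkey_idx ON lineitem USING TWO_LEVEL_BIN_TREE (l_orderkey ASC);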
+
+========================
+ DROP INDEX
+========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  DROP INDEX name

Added: tajo/site/docs/0.11.1/_sources/sql_language/explain.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/sql_language/explain.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/sql_language/explain.txt (added)
+++ tajo/site/docs/0.11.1/_sources/sql_language/explain.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,93 @@
+************************
+EXPLAIN
+************************
+
+*Synopsis*
+
+.. code-block:: sql
+
+  EXPLAIN [GLOBAL] statement
+
+
+*Description*
+
+Show the logical or global execution plan of a statement.
+
+
+*Examples*
+
+Logical plan:
+
+.. code-block:: sql
+
+  default> EXPLAIN SELECT l_orderkey, count(*) FROM lineitem GROUP BY l_orderkey;
+  explain
+  -------------------------------
+  GROUP_BY(1)(l_orderkey)
+    => exprs: (count())
+    => target list: default.lineitem.l_orderkey (INT8), ?count (INT8)
+    => out schema:{(2) default.lineitem.l_orderkey (INT8), ?count (INT8)}
+    => in schema:{(1) default.lineitem.l_orderkey (INT8)}
+     SCAN(0) on default.lineitem
+       => target list: default.lineitem.l_orderkey (INT8)
+       => out schema: {(1) default.lineitem.l_orderkey (INT8)}
+       => in schema: {(16) default.lineitem.l_orderkey (INT8), default.lineitem.l_partkey (INT8), default.lineitem.l_suppkey (INT8), default.lineitem.l_linenumber (INT8), default.lineitem.l_quantity (FLOAT8), default.lineitem.l_extendedprice (FLOAT8), default.lineitem.l_discount (FLOAT8), default.lineitem.l_tax (FLOAT8), default.lineitem.l_returnflag (TEXT), default.lineitem.l_linestatus (TEXT), default.lineitem.l_shipdate (DATE), default.lineitem.l_commitdate (DATE), default.lineitem.l_receiptdate (DATE), default.lineitem.l_shipinstruct (TEXT), default.lineitem.l_shipmode (TEXT), default.lineitem.l_comment (TEXT)}
+
+
+Global plan:
+
+.. code-block:: sql
+
+  default> EXPLAIN GLOBAL SELECT l_orderkey, count(*) FROM lineitem GROUP BY l_orderkey;
+  explain
+  -------------------------------
+  -------------------------------------------------------------------------------
+  Execution Block Graph (TERMINAL - eb_0000000000000_0000_000003)
+  -------------------------------------------------------------------------------
+  |-eb_0000000000000_0000_000003
+     |-eb_0000000000000_0000_000002
+        |-eb_0000000000000_0000_000001
+  -------------------------------------------------------------------------------
+  Order of Execution
+  -------------------------------------------------------------------------------
+  1: eb_0000000000000_0000_000001
+  2: eb_0000000000000_0000_000002
+  3: eb_0000000000000_0000_000003
+  -------------------------------------------------------------------------------
+
+  =======================================================
+  Block Id: eb_0000000000000_0000_000001 [LEAF]
+  =======================================================
+
+  [Outgoing]
+  [q_0000000000000_0000] 1 => 2 (type=HASH_SHUFFLE, key=default.lineitem.l_orderkey (INT8), num=32)
+
+  GROUP_BY(5)(l_orderkey)
+    => exprs: (count())
+    => target list: default.lineitem.l_orderkey (INT8), ?count_1 (INT8)
+    => out schema:{(2) default.lineitem.l_orderkey (INT8), ?count_1 (INT8)}
+    => in schema:{(1) default.lineitem.l_orderkey (INT8)}
+     SCAN(0) on default.lineitem
+       => target list: default.lineitem.l_orderkey (INT8)
+       => out schema: {(1) default.lineitem.l_orderkey (INT8)}
+       => in schema: {(16) default.lineitem.l_orderkey (INT8), default.lineitem.l_partkey (INT8), default.lineitem.l_suppkey (INT8), default.lineitem.l_linenumber (INT8), default.lineitem.l_quantity (FLOAT8), default.lineitem.l_extendedprice (FLOAT8), default.lineitem.l_discount (FLOAT8), default.lineitem.l_tax (FLOAT8), default.lineitem.l_returnflag (TEXT), default.lineitem.l_linestatus (TEXT), default.lineitem.l_shipdate (DATE), default.lineitem.l_commitdate (DATE), default.lineitem.l_receiptdate (DATE), default.lineitem.l_shipinstruct (TEXT), default.lineitem.l_shipmode (TEXT), default.lineitem.l_comment (TEXT)}
+
+  =======================================================
+  Block Id: eb_0000000000000_0000_000002 [ROOT]
+  =======================================================
+
+  [Incoming]
+  [q_0000000000000_0000] 1 => 2 (type=HASH_SHUFFLE, key=default.lineitem.l_orderkey (INT8), num=32)
+
+  GROUP_BY(1)(l_orderkey)
+    => exprs: (count(?count_1 (INT8)))
+    => target list: default.lineitem.l_orderkey (INT8), ?count (INT8)
+    => out schema:{(2) default.lineitem.l_orderkey (INT8), ?count (INT8)}
+    => in schema:{(2) default.lineitem.l_orderkey (INT8), ?count_1 (INT8)}
+     SCAN(6) on eb_0000000000000_0000_000001
+       => out schema: {(2) default.lineitem.l_orderkey (INT8), ?count_1 (INT8)}
+       => in schema: {(2) default.lineitem.l_orderkey (INT8), ?count_1 (INT8)}
+
+  =======================================================
+  Block Id: eb_0000000000000_0000_000003 [TERMINAL]
+  =======================================================
\ No newline at end of file

Added: tajo/site/docs/0.11.1/_sources/sql_language/insert.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/sql_language/insert.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/sql_language/insert.txt (added)
+++ tajo/site/docs/0.11.1/_sources/sql_language/insert.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,26 @@
+*************************
+INSERT (OVERWRITE) INTO
+*************************
+
+The INSERT OVERWRITE statement overwrites the data of an existing table or the data in a given directory. Tajo's INSERT OVERWRITE statement follows the ``INSERT INTO SELECT`` statement of SQL. The examples are as follows:
+
+.. code-block:: sql
+
+  create table t1 (col1 int8, col2 int4, col3 float8);
+
+  -- when a target table schema and output schema are equivalent to each other
+  INSERT OVERWRITE INTO t1 SELECT l_orderkey, l_partkey, l_quantity FROM lineitem;
+  -- or
+  INSERT OVERWRITE INTO t1 SELECT * FROM lineitem;
+
+  -- when the output schema is smaller than the target table schema
+  INSERT OVERWRITE INTO t1 SELECT l_orderkey FROM lineitem;
+
+  -- when you want to specify certain target columns
+  INSERT OVERWRITE INTO t1 (col1, col3) SELECT l_orderkey, l_quantity FROM lineitem;
+
+In addition to table data, the INSERT OVERWRITE statement can also overwrite the contents of a specific directory.
+
+.. code-block:: sql
+
+  INSERT OVERWRITE INTO LOCATION '/dir/subdir' SELECT l_orderkey, l_quantity FROM lineitem;
\ No newline at end of file

Added: tajo/site/docs/0.11.1/_sources/sql_language/joins.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/sql_language/joins.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/sql_language/joins.txt (added)
+++ tajo/site/docs/0.11.1/_sources/sql_language/joins.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,160 @@
+**************************
+Joins
+**************************
+
+=====================
+Overview
+=====================
+
+In Tajo, a single query can access multiple rows of the same or different relations at one time. Such a query is called a *join*.
+Currently, Tajo supports cross, inner, and outer joins.
+
+A join query can involve multiple relations in the ``FROM`` clause according to the following rule.
+
+.. code-block:: sql
+
+  FROM joined_table [, joined_table [, ...] ]
+
+where ``joined_table`` is:
+
+.. code-block:: sql
+
+  table_reference join_type table_reference [ ON join_condition ]
+
+``join_type`` can be one of the following.
+
+.. code-block:: sql
+
+  CROSS JOIN
+  [ NATURAL ] [ INNER ] JOIN
+  { LEFT | RIGHT | FULL } OUTER JOIN
+
+``join_condition`` can be specified in the ``WHERE`` clause as well as the ``ON`` clause.
+
+For more information, please refer to :doc:`/sql_language/predicates`.
+
+.. note::
+
+  Currently, Tajo does not natively support non-equality join conditions. This means that inner joins with non-equality conditions are executed as cross joins, and outer joins with non-equality conditions cannot be executed yet.
+
+=====================
+Examples
+=====================
+
+* For inner and outer joins, only equality conditions are allowed as follows. For inner joins, implicit join notation is allowed.
+
+.. code-block:: sql
+
+  SELECT a.* FROM a, b WHERE a.id = b.id
+
+  SELECT a.* FROM a JOIN b ON (a.id = b.id)
+
+  SELECT a.* FROM a LEFT OUTER JOIN b ON (a.id = b.id AND a.type = b.type)
+
+However, the following query will be executed as a cross join and can therefore take a very long time.
+
+.. code-block:: sql
+
+  SELECT a.* FROM a JOIN b ON (a.id <> b.id)
+
+In addition, the following query is not allowed.
+
+.. code-block:: sql
+
+  SELECT a.* FROM a LEFT OUTER JOIN b ON (a.id > b.id)
+
+* You can join more than 2 tables in a query with multiple join types.
+
+.. code-block:: sql
+
+  SELECT a.* FROM a, b, c WHERE a.id = b.id AND b.id2 = c.id2
+
+  SELECT a.* FROM a INNER JOIN b ON a.id = b.id FULL OUTER JOIN c ON b.id2 = c.id2
+
+When a query involves three or more tables, there may be a lot of possible join orders. Tajo automatically finds the best join order regardless of the input order. For example, suppose that relation ``b`` is larger than relation ``a``, and in turn, the relation ``c`` is larger than relation ``b``. The query
+
+.. code-block:: sql
+
+  SELECT a.* FROM c INNER JOIN b ON b.id2 = c.id2 INNER JOIN a ON a.id = b.id
+
+is rewritten to
+
+.. code-block:: sql
+
+  SELECT a.* FROM a INNER JOIN b ON a.id = b.id INNER JOIN c ON b.id2 = c.id2
+
+because early join of small relations accelerates the query speed.
+
+* Tajo also supports natural join. When relations have a common column name, they are joined with an equality condition on that column even though it is not explicitly declared in the query. For example, assuming that ``id`` is the only column name shared by ``a`` and ``b``,
+
+.. code-block:: sql
+
+  SELECT a.* FROM a NATURAL JOIN b
+
+is rewritten to
+
+.. code-block:: sql
+
+  SELECT a.* FROM a INNER JOIN b ON a.id = b.id
+
+=====================
+Join Optimization
+=====================
+
+Join is one of the most expensive operations in the relational world.
+Tajo adopts several optimization techniques to improve its join performance.
+
+---------------------
+Join ordering
+---------------------
+
+Join ordering is one of the important techniques for join performance improvement.
+Basically, joining multiple relations is left-associative. However, query performance can be significantly changed according to which order is chosen for the join execution.
+
+To find the best join order, Tajo's cost-based optimizer considers join conditions, join types, and the size of input relations.
+In addition, it considers the computation cost of consecutive joins so that the shape of query plan forms a bushy tree.
+
+For example, suppose that there are 4 relations ``a`` (10), ``b`` (20), ``c`` (30), and ``d`` (40) where the numbers within brackets represent the relation size.
+The following query
+
+.. code-block:: sql
+
+  SELECT
+    *
+  FROM
+    a, b, c, d
+  WHERE
+    a.id1 = b.id1 AND
+    a.id4 = d.id4 AND
+    b.id2 = c.id2 AND
+    c.id3 = d.id3
+
+is rewritten into
+
+.. code-block:: sql
+
+  SELECT
+    *
+  FROM
+    (a INNER JOIN d ON a.id4 = d.id4)
+    INNER JOIN
+    (b INNER JOIN c ON b.id2 = c.id2)
+    ON a.id1 = b.id1 AND c.id3 = d.id3
+
+
+---------------------
+Broadcast join
+---------------------
+
+In Tajo, a join query is executed in two stages. The first stage is responsible for scanning input data and performing local join, while the second stage is responsible for performing global join and returning the result.
+To perform join globally in the second stage, intermediate result of the first stage is exchanged according to join keys, i.e., *shuffled*, among Tajo workers.
+The shuffle is expensive, and its cost is especially wasteful when an input relation is very small.
+
+Broadcast join is a good solution to handle this problem. In broadcast join, the small relations are replicated to every worker that participates in the join computation.
+Thus, they can perform join without expensive data shuffle.
+
+Tajo provides session variables for broadcast join configuration. (For more detailed information about session variables, please refer to :doc:`/tsql/variables`.)
+
+* ``BROADCAST_NON_CROSS_JOIN_THRESHOLD`` and ``BROADCAST_CROSS_JOIN_THRESHOLD`` are the thresholds for broadcast join. Only relations that are smaller than the corresponding threshold can be broadcast.
+
+You can also apply this configuration system-wide by setting ``tajo.dist-query.broadcast.non-cross-join.threshold-kb`` or ``tajo.dist-query.broadcast.cross-join.threshold-kb`` in ``${TAJO_HOME}/conf/tajo-site.xml``.
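+
+As a minimal sketch, a system-wide setting in ``tajo-site.xml`` might look like the following. The property name is taken from the text above; the value ``5120`` is only an illustrative assumption, not a recommended or default setting.
+
+.. code-block:: xml
+
+  <!-- Illustrative value only: choose a threshold (in KB) that fits your workers' memory -->
+  <property>
+    <name>tajo.dist-query.broadcast.non-cross-join.threshold-kb</name>
+    <value>5120</value>
+  </property>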
\ No newline at end of file

Added: tajo/site/docs/0.11.1/_sources/sql_language/predicates.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/sql_language/predicates.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/sql_language/predicates.txt (added)
+++ tajo/site/docs/0.11.1/_sources/sql_language/predicates.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,178 @@
+***********
+ Predicates
+***********
+
+=============
+ IN Predicate
+=============
+
+The IN predicate tests whether a value matches any value in a list of values or in the result of a subquery.
+
+*Synopsis*
+
+.. code-block:: sql
+
+  column_reference (NOT) IN (val1, val2, ..., valN)
+  column_reference (NOT) IN (SELECT ... FROM ...) AS alias_name
+
+
+Examples are as follows:
+
+.. code-block:: sql
+
+  -- this statement returns all the records whose col1 value is 1, 2, or 3:
+  SELECT col1, col2 FROM table1 WHERE col1 IN (1, 2, 3);
+
+  -- this statement returns all the records whose col1 value is none of 1, 2, or 3:
+  SELECT col1, col2 FROM table1 WHERE col1 NOT IN (1, 2, 3);
+
+You can also use the `IN clause` on text data as follows:
+
+.. code-block:: sql
+
+  SELECT col1, col2 FROM table1 WHERE col2 IN ('tajo', 'hadoop');
+
+  SELECT col1, col2 FROM table1 WHERE col2 NOT IN ('tajo', 'hadoop');
+
+Finally, you can use subqueries in the `IN clause`.
+
+.. code-block:: sql
+
+  SELECT col1, col2
+  FROM table1
+  WHERE col3 IN (
+    SELECT avg(col2) as avg_col2
+    FROM table2
+    GROUP BY col1
+    HAVING avg_col2 > 100);
+
+  SELECT col1, col2
+  FROM table1
+  WHERE col3 NOT IN (
+    SELECT avg(col2) as avg_col2
+    FROM table2
+    GROUP BY col1
+    HAVING avg_col2 > 100);
+
+==================================
+String Pattern Matching Predicates
+==================================
+
+--------------------
+LIKE
+--------------------
+
+LIKE operator returns true or false depending on whether its pattern matches the given string. An underscore (_) in pattern matches any single character. A percent sign (%) matches any sequence of zero or more characters.
+
+*Synopsis*
+
+.. code-block:: sql
+
+  string LIKE pattern
+  string NOT LIKE pattern
+
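+For illustration, here are a few evaluations, assuming standard SQL ``LIKE`` semantics in which the pattern must match the entire string:
+
+.. code-block:: sql
+
+  'abc' LIKE 'abc'    true
+  'abc' LIKE 'a%'     true
+  'abc' LIKE '_b_'    true
+  'abc' LIKE 'c'      false
+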
+
+--------------------
+ILIKE
+--------------------
+
+ILIKE is the same as LIKE, except that it is case-insensitive. It is not in the SQL standard; Tajo borrows this operator from PostgreSQL.
+
+*Synopsis*
+
+.. code-block:: sql
+
+  string ILIKE pattern
+  string NOT ILIKE pattern
+
+
+--------------------
+SIMILAR TO
+--------------------
+
+*Synopsis*
+
+.. code-block:: sql
+
+  string SIMILAR TO pattern
+  string NOT SIMILAR TO pattern
+
+It returns true or false depending on whether its pattern matches the given string. Like LIKE, ``SIMILAR TO`` uses ``_`` and ``%`` as metacharacters denoting any single character and any string, respectively.
+
+In addition to these metacharacters borrowed from LIKE, ``SIMILAR TO`` supports more powerful pattern-matching metacharacters borrowed from regular expressions:
+
++------------------------+-------------------------------------------------------------------------------------------+
+| metacharacter          | description                                                                               |
++========================+===========================================================================================+
+| ``|``                  | denotes alternation (either of two alternatives).                                         |
++------------------------+-------------------------------------------------------------------------------------------+
+| *                      | denotes repetition of the previous item zero or more times.                               |
++------------------------+-------------------------------------------------------------------------------------------+
+| +                      | denotes repetition of the previous item one or more times.                                |
++------------------------+-------------------------------------------------------------------------------------------+
+| ?                      | denotes repetition of the previous item zero or one time.                                 |
++------------------------+-------------------------------------------------------------------------------------------+
+| {m}                    | denotes repetition of the previous item exactly m times.                                  |
++------------------------+-------------------------------------------------------------------------------------------+
+| {m,}                   | denotes repetition of the previous item m or more times.                                  |
++------------------------+-------------------------------------------------------------------------------------------+
+| {m,n}                  | denotes repetition of the previous item at least m and not more than n times.             |
++------------------------+-------------------------------------------------------------------------------------------+
+| []                     | A bracket expression specifies a character class, just as in POSIX regular expressions.   |
++------------------------+-------------------------------------------------------------------------------------------+
+| ()                     | Parentheses can be used to group items into a single logical item.                        |
++------------------------+-------------------------------------------------------------------------------------------+
+
+Note that ``.`` is not used as a metacharacter in the ``SIMILAR TO`` operator.
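+
+The following evaluations are a sketch that assumes Tajo follows PostgreSQL's ``SIMILAR TO`` semantics, in which the pattern must match the entire string:
+
+.. code-block:: sql
+
+  'abc' SIMILAR TO 'abc'        true
+  'abc' SIMILAR TO 'a'          false
+  'abc' SIMILAR TO '%(b|d)%'    true
+  'abc' SIMILAR TO '(b|c)%'     false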
+
+---------------------
+Regular expressions
+---------------------
+
+Regular expressions provide a very powerful means of string pattern matching. Currently, regular expressions in Tajo are based on Java-style regular expressions instead of POSIX regular expressions. The main difference between the two styles is how character classes are written.
+
+*Synopsis*
+
+.. code-block:: sql
+
+  string ~ pattern
+  string !~ pattern
+
+  string ~* pattern
+  string !~* pattern
+
++----------+---------------------------------------------------------------------------------------------------+
+| operator | Description                                                                                       |
++==========+===================================================================================================+
+| ~        | It returns true if a given regular expression is matched to string. Otherwise, it returns false.  |
++----------+---------------------------------------------------------------------------------------------------+
+| !~       | It returns false if a given regular expression is matched to string. Otherwise, it returns true.  |
++----------+---------------------------------------------------------------------------------------------------+
+| ~*       | It is the same as '~', but it is case insensitive.                                                |
++----------+---------------------------------------------------------------------------------------------------+
+| !~*      | It is the same as '!~', but it is case insensitive.                                               |
++----------+---------------------------------------------------------------------------------------------------+
+
+Here are examples:
+
+.. code-block:: sql
+
+  'abc'   ~   '.*c'               true
+  'abc'   ~   'c'                 false
+  'aaabc' ~   '([a-z]){3}bc'      true
+  'abc'   ~*  '.*C'               true
+  'abc'   !~* 'B.*'               true
+
+Regular expression operators are not in the SQL standard. Tajo borrows these operators from PostgreSQL.
+
+*Synopsis for REGEXP and RLIKE operators*
+
+.. code-block:: sql
+
+  string REGEXP pattern
+  string NOT REGEXP pattern
+
+  string RLIKE pattern
+  string NOT RLIKE pattern
+
+However, ``REGEXP`` and ``RLIKE`` do not have case-insensitive variants.
\ No newline at end of file

Added: tajo/site/docs/0.11.1/_sources/sql_language/queries.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/sql_language/queries.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/sql_language/queries.txt (added)
+++ tajo/site/docs/0.11.1/_sources/sql_language/queries.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,260 @@
+*******
+Queries
+*******
+
+========
+Overview
+========
+
+*Synopsis*
+
+.. code-block:: sql
+
+  SELECT [distinct [all]] * | <expression> [[AS] <alias>] [, ...]
+    [FROM <table reference> [[AS] <table alias name>] [, ...]]
+    [WHERE <condition>]
+    [GROUP BY <expression> [, ...]]
+    [HAVING <condition>]
+    [ORDER BY <expression> [ASC|DESC] [NULLS (FIRST|LAST)] [, ...]]
+
+
+
+===========
+From Clause
+===========
+
+*Synopsis*
+
+.. code-block:: sql
+
+  [FROM <table reference> [[AS] <table alias name>] [, ...]]
+
+
+The ``FROM`` clause specifies one or more tables given in a comma-separated table reference list.
+A table reference can be a relation name, a subquery, a table join, or a complex combination of them.
+
+-----------------------
+Table and Table Aliases
+-----------------------
+
+A temporary name can be given to tables and complex table references to be used
+for references to the derived table in the rest of the query. This is called a table alias.
+
+To create a table alias, please use ``AS``:
+
+.. code-block:: sql
+
+  FROM table_reference AS alias
+
+or
+
+.. code-block:: sql
+
+  FROM table_reference alias
+
+The ``AS`` keyword can be omitted, and *Alias* can be any identifier.
+
+A typical application of table aliases is to give short names to long table references. For example:
+
+.. code-block:: sql
+
+  SELECT * FROM long_table_name_1234 s JOIN another_long_table_name_5678 a ON s.id = a.num;
+
+-------------
+Joined Tables
+-------------
+
+Tajo supports cross joins and qualified joins (inner, outer, and natural joins).
+
+Join Types
+~~~~~~~~~~
+
+Cross Join
+^^^^^^^^^^
+
+.. code-block:: sql
+
+  FROM T1 CROSS JOIN T2
+
+Cross join, also called *Cartesian product*, results in every possible combination of rows from T1 and T2.
+
+``FROM T1 CROSS JOIN T2`` is equivalent to ``FROM T1, T2``.
+
+Qualified joins
+^^^^^^^^^^^^^^^
+
+Qualified joins implicitly or explicitly have join conditions. Inner, outer, and natural joins are all qualified joins.
+Except for natural joins, the ``ON`` or ``USING`` clause in each join is used to specify a join condition.
+A join condition must include at least one boolean expression, and it may consist solely of filter conditions.
+
+**Inner Join**
+
+.. code-block:: sql
+
+  T1 [INNER] JOIN T2 ON boolean_expression
+  T1 [INNER] JOIN T2 USING (join column list)
+
+Inner join is the default join type, so the ``INNER`` keyword can be omitted.
+
+**Outer Join**
+
+.. code-block:: sql
+
+  T1 (LEFT|RIGHT|FULL) OUTER JOIN T2 ON boolean_expression
+  T1 (LEFT|RIGHT|FULL) OUTER JOIN T2 USING (join column list)
+
+One of ``LEFT``, ``RIGHT``, or ``FULL`` must be specified for outer joins. 
+Join conditions in an outer join behave differently depending on which table references they involve.
+For more details on outer join behavior, please refer to
+`Advanced outer join constructs <http://www.ibm.com/developerworks/data/library/techarticle/purcell/0201purcell.html>`_.
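+
+As a minimal sketch, the following left outer join keeps every row of the left table even when no matching row exists on the right. The ``customer`` and ``orders`` tables and their column names are assumptions based on a TPC-H-style schema:
+
+.. code-block:: sql
+
+  -- customers without any order still appear, with o_orderkey set to NULL
+  SELECT c.c_name, o.o_orderkey
+  FROM customer c LEFT OUTER JOIN orders o ON c.c_custkey = o.o_custkey;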
+
+**Natural Join**
+
+.. code-block:: sql
+
+  T1 NATURAL JOIN T2
+
+``NATURAL`` is a short form of ``USING``. It forms a ``USING`` list consisting of all common column names that appear in 
+both join tables. These common columns appear only once in the output table. If there are no common columns, 
+``NATURAL`` behaves like ``CROSS JOIN``.
+
+**Subqueries**
+
+A subquery is a query that is nested inside another query. It can be embedded in the ``FROM`` and ``WHERE`` clauses.
+
+Example:
+
+.. code-block:: sql
+
+  FROM (SELECT col1, sum(col2) FROM table1 WHERE col3 > 0 group by col1 order by col1) AS alias_name
+  WHERE col1 IN (SELECT col1 FROM table1 WHERE col2 > 0 AND col2 < 100) AS alias_name
+
+For more detailed information, please refer to :doc:`joins`.
+
+============
+Where Clause
+============
+
+The syntax of the WHERE Clause is
+
+*Synopsis*
+
+.. code-block:: sql
+
+  WHERE search_condition
+
+``search_condition`` can be any boolean expression. 
+For additional predicates, please refer to :doc:`/sql_language/predicates`.
+
+==========================
+Groupby and Having Clauses
+==========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  SELECT select_list
+      FROM ...
+      [WHERE ...]
+      GROUP BY grouping_column_reference [, grouping_column_reference]...
+      [HAVING boolean_expression]
+
+Rows that pass the ``WHERE`` filter may be subject to grouping, which is specified by the ``GROUP BY`` clause.
+Grouping combines a set of rows having common values into one group, and then computes aggregation functions over the rows in each group. The ``HAVING`` clause can be used only with the ``GROUP BY`` clause. It eliminates the grouped result rows that do not satisfy its condition.
+
+``grouping_column_reference`` can be a column reference or a complex expression including scalar functions and arithmetic operations.
+
+.. code-block:: sql
+
+  SELECT l_orderkey, SUM(l_quantity) AS quantity FROM lineitem GROUP BY l_orderkey;
+
+  SELECT substr(l_shipdate,1,4) as year, SUM(l_orderkey) AS total2 FROM lineitem GROUP BY substr(l_shipdate,1,4);
+
+If a SQL statement includes a ``GROUP BY`` clause, each expression in the select list must be either a grouping_column_reference or an aggregation function. For example, the following query is not allowed because ``l_orderkey`` does not occur in the ``GROUP BY`` clause.
+
+.. code-block:: sql
+
+  SELECT l_orderkey, l_partkey, SUM(l_orderkey) AS total FROM lineitem GROUP BY l_partkey;
+
+Aggregation functions can be used with the ``DISTINCT`` keyword. It forces an individual aggregate function to take only distinct values of the argument expression. The ``DISTINCT`` keyword is used as follows:
+
+.. code-block:: sql
+
+  SELECT l_partkey, COUNT(distinct l_quantity), SUM(distinct l_extendedprice) AS total FROM lineitem GROUP BY l_partkey;
+
+=========================
+Orderby and Limit Clauses
+=========================
+
+*Synopsis*
+
+.. code-block:: sql
+
+  FROM ... ORDER BY <sort_expr> [(ASC|DESC)] [NULLS (FIRST|LAST)] [,...]
+
+``sort_expr`` can be a column reference, an aliased column reference, or a complex expression.
+``ASC`` indicates an ascending order of ``sort_expr`` values. ``DESC`` indicates a descending order of ``sort_expr`` values.
+``ASC`` is the default order.
+
+The ``NULLS FIRST`` and ``NULLS LAST`` options determine whether null values appear
+before or after non-null values in the sort ordering. By default, null values are treated as if they were larger than any non-null value;
+that is, ``NULLS FIRST`` is the default for ``DESC`` order, and ``NULLS LAST`` otherwise.
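+
+As a short sketch, the following query overrides the default null placement for a descending sort. It assumes the TPC-H-style ``lineitem`` table used in the other examples:
+
+.. code-block:: sql
+
+  -- nulls would come first by default for DESC; NULLS LAST moves them to the end
+  SELECT l_orderkey, l_shipdate
+  FROM lineitem
+  ORDER BY l_shipdate DESC NULLS LAST, l_orderkey ASC;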
+
+================
+Window Functions
+================
+
+A window function performs a calculation across multiple table rows that belong to some window frame.
+
+*Synopsis*
+
+.. code-block:: sql
+
+  SELECT ..., func(param) OVER ([PARTITION BY partition-expr [, ...]] [ORDER BY sort-expr [, ...]]), ... FROM ...
+
+The ``PARTITION BY`` list within ``OVER`` divides the rows into groups, or partitions, that share the same values of
+the ``PARTITION BY`` expression(s). For each row, the window function is computed across the rows that fall into
+the same partition as the current row.
+
+The following examples briefly illustrate how window functions are used.
+
+---------
+Examples
+---------
+
+Multiple window functions can be used in a SQL statement as follows:
+
+.. code-block:: sql
+
+  SELECT l_orderkey, sum(l_discount) OVER (PARTITION BY l_orderkey), sum(l_quantity) OVER (PARTITION BY l_orderkey) FROM LINEITEM;
+
+If the ``OVER()`` clause is empty, as in the following query, all table rows are treated as a single window frame.
+
+.. code-block:: sql
+
+  SELECT salary, sum(salary) OVER () FROM empsalary;
+
+The ``ORDER BY`` clause can also be used without the ``PARTITION BY`` clause as follows:
+
+.. code-block:: sql
+
+  SELECT salary, sum(salary) OVER (ORDER BY salary) FROM empsalary;
+
+In addition, all expressions and aggregation functions are allowed in the ``ORDER BY`` clause as follows:
+
+.. code-block:: sql
+
+  select
+    l_orderkey,
+    count(*) as cnt,
+    row_number() over (partition by l_orderkey order by count(*) desc)
+    row_num
+  from
+    lineitem
+  group by
+    l_orderkey
+
+.. note::
+
+  Currently, Tajo does not support multiple different partition-expressions in one SQL statement.
\ No newline at end of file

Added: tajo/site/docs/0.11.1/_sources/sql_language/sql_expression.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/sql_language/sql_expression.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/sql_language/sql_expression.txt (added)
+++ tajo/site/docs/0.11.1/_sources/sql_language/sql_expression.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,75 @@
+============================
+ SQL Expressions
+============================
+
+-------------------------
+ Arithmetic Expressions
+-------------------------
+
+-------------------------
+Type Casts
+-------------------------
+A type cast converts a value of one data type to another data type. Tajo supports two type cast syntaxes:
+
+.. code-block:: sql
+
+  CAST ( expression AS type )
+  expression::type
+
+In addition, several functions are provided for type conversion. Please refer to :doc:`../functions/data_type_func_and_operators` and :doc:`../functions/datetime_func_and_operators`.
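+
+As a minimal sketch, both syntaxes can be used in the same query. The example assumes the TPC-H-style ``lineitem`` table used elsewhere in this documentation, and the target types ``INT8`` and ``TEXT`` are chosen only for illustration:
+
+.. code-block:: sql
+
+  -- CAST(...) and :: are two spellings of the same conversion
+  SELECT CAST(l_quantity AS INT8) AS quantity, l_orderkey::TEXT AS orderkey_text FROM lineitem;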
+
+-------------------------
+String Literals
+-------------------------
+
+A string constant is an arbitrary sequence of characters bounded by single quotes (``'``):
+
+.. code-block:: sql
+
+  'tajo'
+
+-------------------------
+Function Calls
+-------------------------
+
+The syntax for a function call consists of the name of a function and its argument list enclosed in parentheses:
+
+.. code-block:: sql
+
+  function_name ([expression [, expression ... ]] )
+
+For more information about functions, please refer to :doc:`../functions`.
+
+-------------------------
+Window Function Calls
+-------------------------
+
+A window function call performs an aggregate operation across a ``window``, which is a set of rows that are related to the current row. A window function has the following syntax.
+
+.. code-block:: sql
+
+  function_name OVER ( window_definition )
+
+where *function_name* is the name of the function to apply. Any aggregation function or window function can be used. For built-in aggregation functions and window functions, please refer to :doc:`../functions/agg_func` and :doc:`../functions/window_func`.
+
+*window_definition* has the following syntax.
+
+.. code-block:: sql
+
+  [ PARTITION BY expression [, ...] ]
+  [ ORDER BY expression [ ASC | DESC ] [ NULLS { FIRST | LAST } ] [, ...] ]
+
+In the above syntax, *expression* can be any expression except a window function call itself.
+The *PARTITION BY* and *ORDER BY* lists have the same syntax and semantics as the *GROUP BY* and *ORDER BY* clauses.
+That is, the *PARTITION BY* list describes how the output result is partitioned, just as the *GROUP BY* clause creates multiple groups according to the values of its expressions.
+With the *ORDER BY* list, result values are sorted within each partition.
+
+Here are some examples.
+
+.. code-block:: sql
+
+  select l_orderkey, count(*) as cnt, row_number() over (order by count(*) desc) row_num from lineitem group by l_orderkey order by l_orderkey;
+
+  select o_custkey, o_orderstatus, sum(o_totalprice) over (partition by o_custkey) as price from orders;
+
+  select l_linenumber, l_tax, sum(l_quantity) over (partition by l_linenumber order by l_tax desc) as quantity, avg(l_extendedprice) over (partition by l_shipdate) from lineitem order by l_tax;
\ No newline at end of file

Added: tajo/site/docs/0.11.1/_sources/storage_plugins.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/storage_plugins.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/storage_plugins.txt (added)
+++ tajo/site/docs/0.11.1/_sources/storage_plugins.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,11 @@
+*************************************
+Storage Plugin
+*************************************
+
+This section describes the storage plugins available in Tajo to access datasets from different data sources.
+
+.. toctree::
+    :maxdepth: 1
+
+    storage_plugins/overview
+    storage_plugins/postgresql
\ No newline at end of file

Added: tajo/site/docs/0.11.1/_sources/storage_plugins/overview.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/storage_plugins/overview.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/storage_plugins/overview.txt (added)
+++ tajo/site/docs/0.11.1/_sources/storage_plugins/overview.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,47 @@
+*************************************
+Storage Plugin Overview
+*************************************
+
+Overview
+========
+
+Tajo supports various storage systems, such as HDFS, Amazon S3, OpenStack Swift, HBase, and RDBMS. Tajo already embeds HDFS, S3, OpenStack Swift, HBase, and RDBMS storage plugins, and it also allows users to register custom storages and data formats with Tajo cluster instances. This section describes how to register custom storages and data formats.
+
+Register custom storage
+=======================
+
+First of all, your storage implementation should be packaged as a jar file. Then, please copy the jar file into the ``tajo/extlib`` directory. Next, copy ``conf/storage-site.json.template`` to ``conf/storage-site.json`` and modify the file as described below.
+
+Configuration
+=============
+
+Tajo has a default configuration for built-in storages, such as HDFS, the local file system, and Amazon S3. It also allows users to add custom storage plugins.
+
+The ``conf/storage-site.json`` file has the following structure:
+
+.. code-block:: json
+
+  {
+    "storages": {
+      "${scheme}": {
+        "handler": "${class name}"
+      }
+    }
+  }
+
+Each storage instance (i.e., :doc:`/table_management/tablespaces`) is identified by a URI. The scheme of the URI identifies the storage type. For example, ``hdfs://`` is used for HDFS storage, ``jdbc://`` is used for JDBC-based storage, and ``hbase://`` is used for HBase storage.
+
+You should substitute a scheme name without ``://`` for ``${scheme}``.
+
+The following is an example for HBase storage.
+
+.. code-block:: json
+
+  {
+    "storages": {
+      "hbase": {
+        "handler": "org.apache.tajo.storage.hbase.HBaseTablespace",
+        "default-format": "hbase"
+      }
+    }
+  }
\ No newline at end of file

Added: tajo/site/docs/0.11.1/_sources/storage_plugins/postgresql.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/storage_plugins/postgresql.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/storage_plugins/postgresql.txt (added)
+++ tajo/site/docs/0.11.1/_sources/storage_plugins/postgresql.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,40 @@
+*************************************
+PostgreSQL Storage Handler
+*************************************
+
+Overview
+========
+
+The PostgreSQL storage handler is available by default in Tajo. It enables users' queries to access database objects in PostgreSQL. Tables in PostgreSQL are shown as tables in Tajo too. Most of the SQL queries used for PostgreSQL are available in Tajo via this storage handler. Its main advantage is that it allows federated query processing among tables stored in HDFS and PostgreSQL.
+
+Configuration
+=============
+
+The PostgreSQL storage handler is a built-in storage handler, so you can easily register PostgreSQL databases to a Tajo cluster by adding the following lines to the ``conf/storage-site.json`` file. For more information about ``storage-site.json``, please refer to :doc:`/table_management/tablespaces`.
+
+.. code-block:: json
+
+  {
+    "spaces": {
+      "pgsql_db1": {
+        "uri": "jdbc:postgresql://hostname:port/db1",
+        "configs": {
+          "mapped_database": "tajo_db1",
+          "connection_properties": {
+            "user":     "tajo",
+            "password": "xxxx"
+          }
+        }
+      }
+    }
+  }
+
+``configs`` allows users to specify additional configurations.
+``mapped_database`` specifies the database name shown in Tajo. In the example, the database ``db1`` in PostgreSQL
+will be mapped to the database ``tajo_db1`` in Tajo.
+``connection_properties`` allows users to set JDBC connection parameters.
+For details of the PostgreSQL connection parameters, please refer to
+https://jdbc.postgresql.org/documentation/head/connect.html.
+
+Changes to ``storage-site.json`` take effect after you restart the Tajo cluster.
\ No newline at end of file

Added: tajo/site/docs/0.11.1/_sources/swift_integration.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/swift_integration.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/swift_integration.txt (added)
+++ tajo/site/docs/0.11.1/_sources/swift_integration.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,110 @@
+*************************************
+OpenStack Swift Integration
+*************************************
+
+Tajo supports OpenStack Swift as one of the underlying storage types.
+In Tajo, Swift objects are represented and recognized by the same URI format as in Hadoop.
+
+You don't need to run Hadoop to run Tajo on Swift, but you do need to configure Hadoop so that Tajo can access Swift objects.
+You also need to configure Swift and Tajo.
+
+For details, please see the following sections.
+
+======================
+Swift configuration
+======================
+
+This step is not mandatory, but it is strongly recommended to configure Swift's proxy server with ``list_endpoints`` for better performance.
+More information is available `here <http://docs.openstack.org/developer/swift/middleware.html#module-swift.common.middleware.list_endpoints>`_.
+
+======================
+Hadoop configurations
+======================
+
+You need to configure Hadoop to specify how to access Swift objects.
+Here is an example of ``${HADOOP_HOME}/etc/hadoop/core-site.xml``.
+
+-----------------------
+Common configurations
+-----------------------
+
+.. code-block:: xml
+
+  <property>
+    <name>fs.swift.impl</name>
+    <value>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem</value>
+    <description>File system implementation for Swift</description>
+  </property>
+  <property>
+    <name>fs.swift.blocksize</name>
+    <value>131072</value>
+    <description>Split size in KB</description>
+  </property>
+
+----------------------------
+Configurations per provider
+----------------------------
+
+.. code-block:: xml
+
+  <property>
+    <name>fs.swift.service.${PROVIDER}.auth.url</name>
+    <value>http://127.0.0.1/v2.0/tokens</value>
+    <description>Keystone authentication URL</description>
+  </property>
+  <property>
+    <name>fs.swift.service.${PROVIDER}.auth.endpoint.prefix</name>
+    <value>/endpoints/AUTH_</value>
+    <description>Keystone endpoints prefix</description>
+  </property>
+  <property>
+    <name>fs.swift.service.${PROVIDER}.http.port</name>
+    <value>8080</value>
+    <description>HTTP port</description>
+  </property>
+  <property>
+    <name>fs.swift.service.${PROVIDER}.region</name>
+    <value>regionOne</value>
+    <description>Region name</description>
+  </property>
+  <property>
+    <name>fs.swift.service.${PROVIDER}.tenant</name>
+    <value>demo</value>
+    <description>Tenant name</description>
+  </property>
+  <property>
+    <name>fs.swift.service.${PROVIDER}.username</name>
+    <value>tajo</value>
+  </property>
+  <property>
+    <name>fs.swift.service.${PROVIDER}.password</name>
+    <value>tajo_password</value>
+  </property>
+  <property>
+    <name>fs.swift.service.${PROVIDER}.location-aware</name>
+    <value>true</value>
+    <description>Flag to enable the location-aware computing</description>
+  </property>
+
+======================
+Tajo configuration
+======================
+
+Finally, you need to configure the classpath of Tajo by adding the following line to ``${TAJO_HOME}/conf/tajo-env.sh``.
+
+.. code-block:: sh
+
+  export TAJO_CLASSPATH=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-openstack-x.x.x.jar
+
+======================
+Querying on Swift
+======================
+
+Given a provider name *tajo* and a Swift container name *demo*, you can create a Tajo table with data on Swift as follows.
+
+.. code-block:: sql
+
+  default> create external table swift_table (id int32, name text, score float, type text) using text with ('text.delimiter'='|') location 'swift://demo.tajo/test.tbl';
+
+Once a table is created, you can execute any SQL query on it, just as you would on tables stored on HDFS.
+For query execution details, please refer to :doc:`sql_language`.
\ No newline at end of file

Added: tajo/site/docs/0.11.1/_sources/table_management.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/table_management.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/table_management.txt (added)
+++ tajo/site/docs/0.11.1/_sources/table_management.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,13 @@
+****************
+Table Management
+****************
+
+In Tajo, a table is a logical view of one data source. Logically, a table consists of a logical schema, partitions, a URL, and various properties. Physically, a table can be a directory in HDFS, a single file, an HBase table, or an RDBMS table. To make good use of Tajo, users need to understand the features and physical characteristics of their tables' physical layout. This section explains table management.
+
+.. toctree::
+    :maxdepth: 1
+
+    table_management/table_overview
+    table_management/tablespaces
+    table_management/data_formats
+    table_management/compression

Added: tajo/site/docs/0.11.1/_sources/table_management/compression.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/table_management/compression.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/table_management/compression.txt (added)
+++ tajo/site/docs/0.11.1/_sources/table_management/compression.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,22 @@
+***********
+Compression
+***********
+
+Compression reduces data size, thereby enabling more efficient use of network bandwidth and storage. Most Tajo data formats support data compression.
+Currently, compression configuration affects only stored data, and it is enabled when a table is created with the proper table property (see `Create Table <../sql_language/ddl.html#create-table>`_).
+
+===========================================
+Compression Properties for each Data Format
+===========================================
+
+ .. csv-table:: Compression Properties
+
+  **Data Format**,**Property Name**,**Available Values**
+  :doc:`text</table_management/text>`/:doc:`json</table_management/json>`/:doc:`rcfile</table_management/rcfile>`/:doc:`sequencefile</table_management/sequencefile>` [#f1]_,compression.codec,Fully Qualified Classname in Hadoop [#f2]_
+  :doc:`parquet</table_management/parquet>`,parquet.compression,uncompressed/snappy/gzip/lzo
+  :doc:`orc</table_management/orc>`,orc.compression.kind,none/snappy/zlib
+
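+As a minimal sketch, the properties in the table above are passed through the ``WITH`` clause of ``CREATE TABLE``. The table names and columns below are hypothetical; only the property names and values come from the table.
+
+.. code-block:: sql
+
+  -- Hypothetical Parquet table whose data is compressed with Snappy
+  CREATE TABLE compressed_parquet (
+    id int,
+    name text
+  ) USING PARQUET WITH ('parquet.compression'='snappy');
+
+  -- Hypothetical ORC table whose data is compressed with zlib
+  CREATE TABLE compressed_orc (
+    id int,
+    name text
+  ) USING ORC WITH ('orc.compression.kind'='zlib');
+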
+.. rubric:: Footnotes
+
+.. [#f1] For sequence file, you should specify 'compression.type' in addition to 'compression.codec'. Refer to :doc:`/table_management/sequencefile`.
+.. [#f2] All classes are available if they implement `org.apache.hadoop.io.compress.CompressionCodec <https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html>`_.

Added: tajo/site/docs/0.11.1/_sources/table_management/data_formats.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/table_management/data_formats.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/table_management/data_formats.txt (added)
+++ tajo/site/docs/0.11.1/_sources/table_management/data_formats.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,15 @@
+************
+Data Formats
+************
+
+Currently, Tajo provides the following data formats:
+
+.. toctree::
+    :maxdepth: 1
+
+    text
+    json
+    rcfile
+    parquet
+    orc
+    sequencefile
\ No newline at end of file

Added: tajo/site/docs/0.11.1/_sources/table_management/json.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/table_management/json.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/table_management/json.txt (added)
+++ tajo/site/docs/0.11.1/_sources/table_management/json.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,100 @@
+****
+JSON
+****
+
+JSON (JavaScript Object Notation) is an open standard format for data (de)serialization. Because it is simple and human-readable, it is widely used.
+Tajo supports JSON as a data format. This section gives an overview of how to create JSON tables and query them.
+
+===========================
+How to Create a JSON Table?
+===========================
+
+You can create a JSON table using the ``CREATE TABLE`` statement. (For more information, please refer to :doc:`/sql_language/ddl`.)
+For example, consider the following sample data:
+
+.. code-block:: bash
+
+  $ hdfs dfs -cat /table1/table.json
+  { "title" : "Hand of the King", "name" : { "first_name": "Eddard", "last_name": "Stark"}}
+  { "title" : "Assassin", "name" : { "first_name": "Arya", "last_name": "Stark"}}
+  { "title" : "Dancing Master", "name" : { "first_name": "Syrio", "last_name": "Forel"}}
+
+Tajo provides two ways to create a table for this data. The first is the traditional way, which declares every column explicitly:
+
+.. code-block:: sql
+
+  CREATE EXTERNAL TABLE table1 (
+    title TEXT,
+    name RECORD (
+      first_name TEXT,
+      last_name TEXT
+    )
+  ) USING JSON LOCATION '/table1/table.json';
+
+With this approach, you need to specify every column that you want to use. This is tedious and does not suit flexible JSON schemas.
+The second way is a simpler alternative. When you create an external table in JSON format, you can omit the column specification as follows:
+
+.. code-block:: sql
+
+  CREATE EXTERNAL TABLE table1 (*) USING JSON LOCATION '/table1/table.json';
+
+Whichever way you choose, you can run queries on this table.
+
+.. code-block:: sql
+
+  > SELECT title, name.last_name from table1 where name.first_name = 'Arya';
+  title,name/last_name
+  -------------------------------
+  Assassin,Stark
+
+.. warning::
+
+  If you create a table in the second way, every column is assumed to be of the ``TEXT`` type.
+  So, you need to cast values if you want to handle them as other types.
+
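+The cast mentioned in the warning above might look like the following sketch. It assumes a hypothetical ``age`` field that is not part of the sample data shown earlier.
+
+.. code-block:: sql
+
+  -- 'age' is a hypothetical field; it is read as TEXT and cast before the numeric comparison
+  SELECT title, CAST(age AS INT) AS age
+  FROM table1
+  WHERE CAST(age AS INT) > 30;
+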
+===================
+Physical Properties
+===================
+
+Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters.
+The ``WITH`` clause in the CREATE TABLE statement allows users to set those parameters.
+
+The JSON format provides the following physical properties.
+
+* ``text.delimiter``: delimiter character. ``|`` or ``\u0001`` is usually used, and the default field delimiter is ``|``.
+* ``text.null``: ``NULL`` character. The default ``NULL`` character is an empty string ``''``. Hive's default ``NULL`` character is ``'\\N'``.
+* ``compression.codec``: the compression codec used to compress files. Setting this property enables compression with the specified algorithm. The codec name should be the fully qualified name of a class that implements `org.apache.hadoop.io.compress.CompressionCodec <https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html>`_. By default, compression is disabled.
+* ``timezone``: the time zone that the table uses for writing. When table rows are read or written, ``timestamp`` and ``time`` column values are adjusted by this time zone if it is set. The time zone can be an abbreviation like 'PST' or 'DST'. It also accepts an offset-based form like 'UTC+9' or a location-based form like 'Asia/Seoul'.
+* ``text.error-tolerance.max-num``: the maximum number of permissible parsing errors. This value should be an integer value. By default, ``text.error-tolerance.max-num`` is ``0``. According to the value, parsing errors will be handled in different ways.
+
+  * If ``text.error-tolerance.max-num < 0``, all parsing errors are ignored.
+  * If ``text.error-tolerance.max-num == 0``, no parsing error is allowed. If any error occurs, the query fails. (default)
+  * If ``text.error-tolerance.max-num > 0``, up to the given number of parsing errors is permitted in each task.
+
+* ``text.skip.headerlines``: the number of header lines to skip. Some text files have a header that holds metadata (e.g., column names), so this option can be useful.
+
+The following example sets a custom field delimiter, ``NULL`` character, and compression codec:
+
+.. code-block:: sql
+
+  CREATE TABLE table1 (
+    id int,
+    name text,
+    score float,
+    type text
+  ) USING JSON WITH('text.delimiter'='\u0001',
+                    'text.null'='\\N',
+                    'compression.codec'='org.apache.hadoop.io.compress.SnappyCodec');
+
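+The time zone and error tolerance properties can be set in the same way. The following is only a sketch: the table name, the ``created`` column, and the values ``'Asia/Seoul'`` and ``'10'`` are illustrative choices, not recommendations.
+
+.. code-block:: sql
+
+  -- Permit up to 10 parsing errors per task and adjust timestamp/time values to Asia/Seoul
+  CREATE TABLE table2 (
+    id int,
+    name text,
+    created timestamp
+  ) USING JSON WITH ('timezone'='Asia/Seoul',
+                    'text.error-tolerance.max-num'='10');
+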
+.. warning::
+
+  Be careful when using ``\n`` as the field delimiter because *TEXT* format tables use ``\n`` as the line delimiter.
+  At the moment, Tajo does not provide a way to specify the line delimiter.
+
+==========================
+Null Value Handling Issues
+==========================
+By default, the ``NULL`` character in the *TEXT* format is an empty string ``''``.
+In other words, an empty field is recognized as a ``NULL`` value in Tajo.
+However, if the field type is ``TEXT``, an empty field is recognized as the string value ``''`` instead of a ``NULL`` value.
+You can also use your own ``NULL`` character by specifying the physical property ``text.null``.

Added: tajo/site/docs/0.11.1/_sources/table_management/orc.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/table_management/orc.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/table_management/orc.txt (added)
+++ tajo/site/docs/0.11.1/_sources/table_management/orc.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,47 @@
+***
+ORC
+***
+
+**ORC(Optimized Row Columnar)** is a columnar storage format from Hive. ORC improves performance for reading,
+writing, and processing data.
+For more details, please refer to `ORC Files <https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ORC>`_ at Hive wiki.
+
+===========================
+How to Create an ORC Table?
+===========================
+
+If you are not familiar with ``CREATE TABLE`` statement, please refer to Data Definition Language :doc:`/sql_language/ddl`.
+
+In order to specify a certain file format for your table, you need to use the ``USING`` clause in your ``CREATE TABLE``
+statement. Below is an example statement for creating a table using orc files.
+
+.. code-block:: sql
+
+  CREATE TABLE table1 (
+    id int,
+    name text,
+    score float,
+    type text
+  ) USING orc;
+
+===================
+Physical Properties
+===================
+
+Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters.
+The ``WITH`` clause in the CREATE TABLE statement allows users to set those parameters.
+
+Currently, the ORC format provides the following physical properties.
+
+* ``orc.max.merge.distance``: when an ORC file is read, stripes whose distance from each other is lower than this value are merged and read at once. Default is 1MB.
+* ``orc.stripe.size``: the size of each stripe. Default is 64MB.
+* ``orc.compression.kind``: the compression algorithm used to compress and write data. It should be one of ``none``, ``snappy``, or ``zlib``. Default is ``none``.
+* ``orc.buffer.size``: the size of the write buffer. Default is 256KB.
+* ``orc.rowindex.stride``: the ORC index stride in number of rows. (The stride is the number of rows that an index entry represents.) Default is 10000.
+
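+As an illustration, these properties can be combined in a ``WITH`` clause. This is a sketch with a hypothetical table name; the stride value simply restates the documented default.
+
+.. code-block:: sql
+
+  -- Hypothetical ORC table with Snappy compression and the default index stride made explicit
+  CREATE TABLE orc_table (
+    id int,
+    name text,
+    score float,
+    type text
+  ) USING orc WITH ('orc.compression.kind'='snappy',
+                   'orc.rowindex.stride'='10000');
+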
+======================================
+Compatibility Issues with Apache Hive™
+======================================
+
+At the moment, Tajo only supports flat relational tables.
+We are currently working on adding support for nested schemas and non-scalar types (`TAJO-710 <https://issues.apache.org/jira/browse/TAJO-710>`_).
\ No newline at end of file

Added: tajo/site/docs/0.11.1/_sources/table_management/parquet.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/table_management/parquet.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/table_management/parquet.txt (added)
+++ tajo/site/docs/0.11.1/_sources/table_management/parquet.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,48 @@
+*************************************
+Parquet
+*************************************
+
+Parquet is a columnar storage format for Hadoop. Parquet is designed to make the advantages of compressed,
+efficient columnar data representation available to any project in the Hadoop ecosystem,
+regardless of the choice of data processing framework, data model, or programming language.
+For more details, please refer to `Parquet File Format <http://parquet.io/>`_.
+
+=========================================
+How to Create a Parquet Table?
+=========================================
+
+If you are not familiar with ``CREATE TABLE`` statement, please refer to Data Definition Language :doc:`/sql_language/ddl`.
+
+In order to specify a certain file format for your table, you need to use the ``USING`` clause in your ``CREATE TABLE``
+statement. Below is an example statement for creating a table using parquet files.
+
+.. code-block:: sql
+
+  CREATE TABLE table1 (
+    id int,
+    name text,
+    score float,
+    type text
+  ) USING PARQUET;
+
+=========================================
+Physical Properties
+=========================================
+
+Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters.
+The ``WITH`` clause in the CREATE TABLE statement allows users to set those parameters.
+
+Currently, the Parquet format provides the following physical properties.
+
+* ``parquet.block.size``: The block size is the size of a row group being buffered in memory. This limits the memory usage when writing. Larger values will improve the I/O when reading but consume more memory when writing. Default size is 134217728 bytes (= 128 * 1024 * 1024).
+* ``parquet.page.size``: The page size is for compression. When reading, each page can be decompressed independently. A block is composed of pages. The page is the smallest unit that must be read fully to access a single record. If this value is too small, the compression will deteriorate. Default size is 1048576 bytes (= 1 * 1024 * 1024).
+* ``parquet.compression``: The compression algorithm used to compress pages. It should be one of ``uncompressed``, ``snappy``, ``gzip``, ``lzo``. Default is ``uncompressed``.
+* ``parquet.enable.dictionary``: a boolean value that enables or disables dictionary encoding. It should be either ``true`` or ``false``. Default is ``true``.
+
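+A minimal sketch of setting these properties follows. The table name is hypothetical, and the block size simply restates the documented default in bytes.
+
+.. code-block:: sql
+
+  -- Hypothetical Parquet table: gzip-compressed pages, default block size, dictionary encoding off
+  CREATE TABLE parquet_table (
+    id int,
+    name text,
+    score float,
+    type text
+  ) USING PARQUET WITH ('parquet.compression'='gzip',
+                        'parquet.block.size'='134217728',
+                        'parquet.enable.dictionary'='false');
+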
+=========================================
+Compatibility Issues with Apache Hive™
+=========================================
+
+At the moment, Tajo only supports flat relational tables.
+As a result, Tajo's Parquet storage type does not support nested schemas.
+However, we are currently working on adding support for nested schemas and non-scalar types (`TAJO-710 <https://issues.apache.org/jira/browse/TAJO-710>`_).
\ No newline at end of file

Added: tajo/site/docs/0.11.1/_sources/table_management/rcfile.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/table_management/rcfile.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/table_management/rcfile.txt (added)
+++ tajo/site/docs/0.11.1/_sources/table_management/rcfile.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,149 @@
+*************************************
+RCFile
+*************************************
+
+RCFile, short for Record Columnar File, is a flat file format consisting of binary key/value pairs,
+which shares many similarities with SequenceFile.
+
+=========================================
+How to Create a RCFile Table?
+=========================================
+
+If you are not familiar with the ``CREATE TABLE`` statement, please refer to the Data Definition Language :doc:`/sql_language/ddl`.
+
+In order to specify a certain file format for your table, you need to use the ``USING`` clause in your ``CREATE TABLE``
+statement. Below is an example statement for creating a table using RCFile.
+
+.. code-block:: sql
+
+  CREATE TABLE table1 (
+    id int,
+    name text,
+    score float,
+    type text
+  ) USING RCFILE;
+
+=========================================
+Physical Properties
+=========================================
+
+Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters.
+The ``WITH`` clause in the CREATE TABLE statement allows users to set those parameters.
+
+Currently, the RCFile storage type provides the following physical properties.
+
+* ``rcfile.serde`` : custom (De)serializer class. ``org.apache.tajo.storage.BinarySerializerDeserializer`` is the default (de)serializer class.
+* ``rcfile.null`` : NULL character. It is only used when a table uses ``org.apache.tajo.storage.TextSerializerDeserializer``. The default NULL character is an empty string ``''``. Hive's default NULL character is ``'\\N'``.
+* ``compression.codec`` : the compression codec used to compress files. Setting this property enables compression with the specified algorithm. The codec name should be the fully qualified name of a class that implements `org.apache.hadoop.io.compress.CompressionCodec <https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html>`_. By default, compression is disabled.
+
+The following is an example for creating a table using RCFile that uses compression.
+
+.. code-block:: sql
+
+  CREATE TABLE table1 (
+    id int,
+    name text,
+    score float,
+    type text
+  ) USING RCFILE WITH ('compression.codec'='org.apache.hadoop.io.compress.SnappyCodec');
+
+=========================================
+RCFile (De)serializers
+=========================================
+
+Tajo provides two built-in (de)serializers for RCFile:
+
+* ``org.apache.tajo.storage.TextSerializerDeserializer``: stores column values in a plain-text form.
+* ``org.apache.tajo.storage.BinarySerializerDeserializer``: stores column values in a binary file format.
+
+The RCFile format can store some metadata in the RCFile header. Tajo writes the (de)serializer class name into
+the metadata header of each RCFile when the RCFile is created in Tajo.
+
+.. note::
+
+  ``org.apache.tajo.storage.BinarySerializerDeserializer`` is the default (de)serializer for RCFile.
+
+
+=========================================
+Compatibility Issues with Apache Hive™
+=========================================
+
+Regardless of whether the RCFiles are written by Apache Hive™ or Apache Tajo™, the files are compatible in both systems.
+In other words, Tajo can process RCFiles written by Apache Hive and vice versa.
+
+Since RCFiles written by Hive contain no metadata, you need to manually specify the (de)serializer class name
+by setting a physical property.
+
+Hive has two SerDes, and they correspond to the following (de)serializers in Tajo.
+
+* ``org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe``: corresponds to ``TextSerializerDeserializer`` in Tajo.
+* ``org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe``: corresponds to ``BinarySerializerDeserializer`` in Tajo.
+
+The compatibility issue mostly occurs when a user creates an external table pointing to data of an existing table.
+The following section explains two cases: 1) the case where Tajo reads RCFile written by Hive, and
+2) the case where Hive reads RCFile written by Tajo.
+
+-----------------------------------------
+When Tajo reads RCFile generated in Hive
+-----------------------------------------
+
+To create an external RCFile table generated with ``ColumnarSerDe`` in Hive,
+you should set the physical property ``rcfile.serde`` in Tajo as follows:
+
+.. code-block:: sql
+
+  CREATE EXTERNAL TABLE table1 (
+    id int,
+    name text,
+    score float,
+    type text
+  ) USING RCFILE with ( 'rcfile.serde'='org.apache.tajo.storage.TextSerializerDeserializer', 'rcfile.null'='\\N' )
+  LOCATION '....';
+
+To create an external RCFile table generated with ``LazyBinaryColumnarSerDe`` in Hive,
+you should set the physical property ``rcfile.serde`` in Tajo as follows:
+
+.. code-block:: sql
+
+  CREATE EXTERNAL TABLE table1 (
+    id int,
+    name text,
+    score float,
+    type text
+  ) USING RCFILE WITH ('rcfile.serde' = 'org.apache.tajo.storage.BinarySerializerDeserializer')
+  LOCATION '....';
+
+.. note::
+
+  As mentioned above, ``BinarySerializerDeserializer`` is the default (de)serializer for RCFile.
+  So, you can omit ``rcfile.serde`` only when using ``org.apache.tajo.storage.BinarySerializerDeserializer``.
+
+-----------------------------------------
+When Hive reads RCFile generated in Tajo
+-----------------------------------------
+
+To create an external table in Hive over RCFiles written by Tajo with ``TextSerializerDeserializer``,
+you should set the ``SERDE`` as follows:
+
+.. code-block:: sql
+
+  CREATE EXTERNAL TABLE table1 (
+    id int,
+    name string,
+    score float,
+    type string
+  ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe' STORED AS RCFILE
+  LOCATION '<hdfs_location>';
+
+To create an external table in Hive over RCFiles written by Tajo with ``BinarySerializerDeserializer``,
+you should set the ``SERDE`` as follows:
+
+.. code-block:: sql
+
+  CREATE EXTERNAL TABLE table1 (
+    id int,
+    name string,
+    score float,
+    type string
+  ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.columnar.LazyBinaryColumnarSerDe' STORED AS RCFILE
+  LOCATION '<hdfs_location>';
\ No newline at end of file

Added: tajo/site/docs/0.11.1/_sources/table_management/sequencefile.txt
URL: http://svn.apache.org/viewvc/tajo/site/docs/0.11.1/_sources/table_management/sequencefile.txt?rev=1728394&view=auto
==============================================================================
--- tajo/site/docs/0.11.1/_sources/table_management/sequencefile.txt (added)
+++ tajo/site/docs/0.11.1/_sources/table_management/sequencefile.txt Thu Feb  4 00:29:05 2016
@@ -0,0 +1,111 @@
+*************************************
+SequenceFile
+*************************************
+
+-----------------------------------------
+Introduction
+-----------------------------------------
+
+SequenceFiles are flat files consisting of binary key/value pairs.
+SequenceFile is a basic file format provided by Hadoop, and Hive also supports it as a table format.
+
+The ``USING sequencefile`` clause lets you create a SequenceFile table. Here is an example statement that creates a table using ``SequenceFile``:
+
+.. code-block:: sql
+
+ CREATE TABLE table1 (id int, name text, score float, type text)
+ USING sequencefile;
+
+Tajo also provides Hive compatibility for SequenceFile. The above statement can be written in Hive as follows:
+
+.. code-block:: sql
+
+ CREATE TABLE table1 (id int, name string, score float, type string)
+ STORED AS sequencefile;
+
+-----------------------------------------
+SerializerDeserializer (SerDe)
+-----------------------------------------
+
+There are two SerDes for SequenceFile:
+
+ + TextSerializerDeserializer: reads and writes data in plain-text format.
+ + BinarySerializerDeserializer: reads and writes data in binary format.
+
+The default in Tajo is the plain-text SerDe, so the above example statement creates the table using TextSerializerDeserializer. If you want to use BinarySerializerDeserializer, specify it with the ``sequencefile.serde`` property:
+
+.. code-block:: sql
+
+ CREATE TABLE table1 (id int, name text, score float, type text)
+ USING sequencefile with ('sequencefile.serde'='org.apache.tajo.storage.BinarySerializerDeserializer');
+
+In Hive, the above statement can be written as follows:
+
+.. code-block:: sql
+
+ CREATE TABLE table1 (id int, name string, score float, type string)
+ ROW FORMAT SERDE
+  'org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe'
+ STORED AS sequencefile;
+
+-----------------------------------------
+Writer
+-----------------------------------------
+
+There are three SequenceFile Writers based on the SequenceFile.CompressionType used to compress key/value pairs:
+
+ + Writer : Uncompressed records.
+ + RecordCompressWriter : Record-compressed files, in which only values are compressed.
+ + BlockCompressWriter : Block-compressed files, in which keys and values are collected in 'blocks' separately and compressed. The size of the 'block' is configurable.
+
+The default in Tajo is the uncompressed Writer. If you want to use RecordCompressWriter, specify it with the ``compression.type`` and ``compression.codec`` properties:
+
+.. code-block:: sql
+
+ CREATE TABLE table1 (id int, name text, score float, type text)
+ USING sequencefile with ('compression.type'='RECORD','compression.codec'='org.apache.hadoop.io.compress.SnappyCodec');
+
+In Hive, you need to specify the settings as follows:
+
+.. code-block:: sql
+
+ hive> SET hive.exec.compress.output = true;
+ hive> SET mapred.output.compression.type = RECORD;
+ hive> SET mapred.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec;
+ hive> CREATE TABLE table1 (id int, name string, score float, type string) STORED AS sequencefile;
+
+If you want to use BlockCompressWriter, specify it with the ``compression.type`` and ``compression.codec`` properties:
+
+.. code-block:: sql
+
+ CREATE TABLE table1 (id int, name text, score float, type text)
+ USING sequencefile with ('compression.type'='BLOCK','compression.codec'='org.apache.hadoop.io.compress.SnappyCodec');
+
+In Hive, you need to specify the settings as follows:
+
+.. code-block:: sql
+
+ hive> SET hive.exec.compress.output = true;
+ hive> SET mapred.output.compression.type = BLOCK;
+ hive> SET mapred.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec;
+ hive> CREATE TABLE table1 (id int, name string, score float, type string) STORED AS sequencefile;
+
+You can combine either TextSerializerDeserializer or BinarySerializerDeserializer with the compression properties.
+Here is an example statement for this case.
+
+.. code-block:: sql
+
+ CREATE TABLE table1 (id int, name text, score float, type text)
+ USING sequencefile with ('sequencefile.serde'='org.apache.tajo.storage.BinarySerializerDeserializer', 'compression.type'='BLOCK','compression.codec'='org.apache.hadoop.io.compress.SnappyCodec');
+
+In Hive, you need to specify the settings as follows:
+
+.. code-block:: sql
+
+ hive> SET hive.exec.compress.output = true;
+ hive> SET mapred.output.compression.type = BLOCK;
+ hive> SET mapred.output.compression.codec = org.apache.hadoop.io.compress.SnappyCodec;
+ hive> CREATE TABLE table1 (id int, name string, score float, type string)
+       ROW FORMAT SERDE
+         'org.apache.hadoop.hive.serde2.lazybinary.LazyBinarySerDe'
+       STORED AS sequencefile;
\ No newline at end of file



