tajo-commits mailing list archives

From jihoon...@apache.org
Subject tajo git commit: TAJO-1462: Replace CSV examples into TEXT examples in docs.
Date Sat, 04 Apr 2015 09:59:14 GMT
Repository: tajo
Updated Branches:
  refs/heads/master 70d5fdf86 -> b0abff8e8


TAJO-1462: Replace CSV examples into TEXT examples in docs.

Closes #475

Signed-off-by: Jihoon Son <jihoonson@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/tajo/repo
Commit: http://git-wip-us.apache.org/repos/asf/tajo/commit/b0abff8e
Tree: http://git-wip-us.apache.org/repos/asf/tajo/tree/b0abff8e
Diff: http://git-wip-us.apache.org/repos/asf/tajo/diff/b0abff8e

Branch: refs/heads/master
Commit: b0abff8e896d7985eaf1aa48d9c2ab3a45618f01
Parents: 70d5fdf
Author: Dongjoon Hyun <dongjoon@apache.org>
Authored: Sat Apr 4 18:58:38 2015 +0900
Committer: Jihoon Son <jihoonson@apache.org>
Committed: Sat Apr 4 18:58:38 2015 +0900

----------------------------------------------------------------------
 CHANGES                                         |   3 +
 .../main/sphinx/backup_and_restore/catalog.rst  |   2 +-
 tajo-docs/src/main/sphinx/getting_started.rst   |   2 +-
 tajo-docs/src/main/sphinx/sql_language/ddl.rst  |   2 +-
 .../src/main/sphinx/table_management/csv.rst    | 115 -------------------
 .../sphinx/table_management/file_formats.rst    |   2 +-
 .../sphinx/table_management/table_overview.rst  |   6 +-
 .../src/main/sphinx/table_management/text.rst   | 115 +++++++++++++++++++
 8 files changed, 125 insertions(+), 122 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/CHANGES
----------------------------------------------------------------------
diff --git a/CHANGES b/CHANGES
index 5ee7715..3def16c 100644
--- a/CHANGES
+++ b/CHANGES
@@ -124,6 +124,9 @@ Release 0.11.0 - unreleased
   
   TASKS
 
+    TAJO-1462: Replace CSV examples into TEXT examples in docs. 
+    (Contributed by Dongjoon Hyun, Committed by jihoon)
+
     TAJO-1424: Investigate the problem of too many "Try to connect" messeges 
     during Travic CI build. (Contributed by navis, Committed by jihoon)
 

http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/tajo-docs/src/main/sphinx/backup_and_restore/catalog.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/backup_and_restore/catalog.rst b/tajo-docs/src/main/sphinx/backup_and_restore/catalog.rst
index 200aa85..1c2b709 100644
--- a/tajo-docs/src/main/sphinx/backup_and_restore/catalog.rst
+++ b/tajo-docs/src/main/sphinx/backup_and_restore/catalog.rst
@@ -28,7 +28,7 @@ For example, if you want to backup a table customer, you should type a command a
   -- Name: customer; Type: TABLE; Storage: CSV
   -- Path: file:/home/hyunsik/tpch/customer
   --
-  CREATE EXTERNAL TABLE customer (c_custkey INT8, c_name TEXT, c_address TEXT, c_nationkey INT8, c_phone TEXT, c_acctbal FLOAT8, c_mktsegment TEXT, c_comment TEXT) USING CSV LOCATION 'file:/home/hyunsik/tpch/customer';
+  CREATE EXTERNAL TABLE customer (c_custkey INT8, c_name TEXT, c_address TEXT, c_nationkey INT8, c_phone TEXT, c_acctbal FLOAT8, c_mktsegment TEXT, c_comment TEXT) USING TEXT LOCATION 'file:/home/hyunsik/tpch/customer';
   
 
 If you want to restore the catalog from the SQL dump file, please type the below command:
::

http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/tajo-docs/src/main/sphinx/getting_started.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/getting_started.rst b/tajo-docs/src/main/sphinx/getting_started.rst
index eaf6973..e30c3fe 100644
--- a/tajo-docs/src/main/sphinx/getting_started.rst
+++ b/tajo-docs/src/main/sphinx/getting_started.rst
@@ -135,7 +135,7 @@ Here, we assume the schema as (int, text, float, text). ::
         name text, 
         score float, 
         type text) 
-        using csv with ('text.delimiter'='|') location 'file:/home/x/table1';
+        using text with ('text.delimiter'='|') location 'file:/home/x/table1';
 
 To load an external table, you need to use ‘create external table’ statement. 
 In the location clause, you should use the absolute directory path with an appropriate scheme.

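For readers new to the TEXT format, here is a minimal Python sketch of how one line of the getting-started table1 data could map onto the schema (int, text, float, text) when 'text.delimiter' is '|'. It is an illustration under stated assumptions, not Tajo's reader: the names parse_line and SCHEMA are hypothetical, and the column names are borrowed from the table1 examples in this commit.

```python
# Hypothetical illustration, not Tajo's TEXT reader: it splits one
# '|'-delimited line and casts each field to the (int, text, float, text)
# schema used by the getting-started example.
SCHEMA = [("id", int), ("name", str), ("score", float), ("type", str)]


def parse_line(line: str, delimiter: str = "|") -> dict:
    """Return a column-name -> value dict for one line of text data."""
    # '\n' is the default line delimiter, so drop it before splitting fields.
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
    return {name: cast(value) for (name, cast), value in zip(SCHEMA, fields)}


if __name__ == "__main__":
    print(parse_line("1|alice|3.5|student\n"))
```

Passing a different delimiter (for example ',') shows why the WITH clause matters: the same bytes split into different fields depending on 'text.delimiter'.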

http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/tajo-docs/src/main/sphinx/sql_language/ddl.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/sql_language/ddl.rst b/tajo-docs/src/main/sphinx/sql_language/ddl.rst
index 60b7190..662ccff 100644
--- a/tajo-docs/src/main/sphinx/sql_language/ddl.rst
+++ b/tajo-docs/src/main/sphinx/sql_language/ddl.rst
@@ -56,7 +56,7 @@ If you want to add an external table that contains compressed data, you should g
   ...
   L_COMMENT text) 
 
-  USING csv WITH ('text.delimiter'='|','compression.codec'='org.apache.hadoop.io.compress.DeflateCodec')
+  USING TEXT WITH ('text.delimiter'='|','compression.codec'='org.apache.hadoop.io.compress.DeflateCodec')
   LOCATION 'hdfs://localhost:9010/tajo/warehouse/lineitem_100_snappy';
 
 `compression.codec` parameter can have one of the following compression codecs:

http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/tajo-docs/src/main/sphinx/table_management/csv.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/table_management/csv.rst b/tajo-docs/src/main/sphinx/table_management/csv.rst
deleted file mode 100644
index 53c6e1d..0000000
--- a/tajo-docs/src/main/sphinx/table_management/csv.rst
+++ /dev/null
@@ -1,115 +0,0 @@
-*************************************
-CSV (TextFile)
-*************************************
-
-A character-separated values (CSV) file represents a tabular data set consisting of rows and columns.
-Each row is a plan-text line. A line is usually broken by a character line feed ``\n`` or carriage-return ``\r``.
-The line feed ``\n`` is the default delimiter in Tajo. Each record consists of multiple fields, separated by
-some other character or string, most commonly a literal vertical bar ``|``, comma ``,`` or tab ``\t``.
-The vertical bar is used as the default field delimiter in Tajo.
-
-=========================================
-How to Create a CSV Table ?
-=========================================
-
-If you are not familiar with the ``CREATE TABLE`` statement, please refer to the Data Definition Language :doc:`/sql_language/ddl`.
-
-In order to specify a certain file format for your table, you need to use the ``USING`` clause in your ``CREATE TABLE``
-statement. The below is an example statement for creating a table using CSV files.
-
-.. code-block:: sql
-
- CREATE TABLE
-  table1 (
-    id int,
-    name text,
-    score float,
-    type text
-  ) USING CSV;
-
-=========================================
-Physical Properties
-=========================================
-
-Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters.
-The ``WITH`` clause in the CREATE TABLE statement allows users to set those parameters.
-
-Now, the CSV storage format provides the following physical properties.
-
-* ``text.delimiter``: delimiter character. ``|`` or ``\u0001`` is usually used, and the default field delimiter is ``|``.
-* ``text.null``: NULL character. The default NULL character is an empty string ``''``. Hive's default NULL character is ``'\\N'``.
-* ``compression.codec``: Compression codec. You can enable compression feature and set specified compression algorithm. The compression algorithm used to compress files. The compression codec name should be the fully qualified class name inherited from `org.apache.hadoop.io.compress.CompressionCodec <https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html>`_. By default, compression is disabled.
-* ``csvfile.serde`` (deprecated): custom (De)serializer class. ``org.apache.tajo.storage.TextSerializerDeserializer`` is the default (De)serializer class.
-* ``timezone``: the time zone that the table uses for writting. When table rows are read or written, ```timestamp``` and ```time``` column values are adjusted by this timezone if it is set. Time zone can be an abbreviation form like 'PST' or 'DST'. Also, it accepts an offset-based form like 'UTC+9' or a location-based form like 'Asia/Seoul'.
-* ``text.error-tolerance.max-num``: the maximum number of permissible parsing errors. This value should be an integer value. By default, ``text.error-tolerance.max-num`` is ``0``. According to the value, parsing errors will be handled in different ways.
-  * If ``text.error-tolerance.max-num < 0``, all parsing errors are ignored.
-  * If ``text.error-tolerance.max-num == 0``, any parsing error is not allowed. If any error occurs, the query will be failed. (default)
-  * If ``text.error-tolerance.max-num > 0``, the given number of parsing errors in each task will be pemissible.
-
-The following example is to set a custom field delimiter, NULL character, and compression codec:
-
-.. code-block:: sql
-
- CREATE TABLE table1 (
-  id int,
-  name text,
-  score float,
-  type text
- ) USING CSV WITH('text.delimiter'='\u0001',
-                  'text.null'='\\N',
-                  'compression.codec'='org.apache.hadoop.io.compress.SnappyCodec');
-
-.. warning::
-
-  Be careful when using ``\n`` as the field delimiter because CSV uses ``\n`` as the line delimiter.
-  At the moment, Tajo does not provide a way to specify the line delimiter.
-
-=========================================
-Custom (De)serializer
-=========================================
-
-The CSV storage format not only provides reading and writing interfaces for CSV data but also allows users to process custom
-plan-text file formats with user-defined (De)serializer classes.
-For example, with custom (de)serializers, Tajo can process JSON file formats or any specialized plan-text file formats.
-
-In order to specify a custom (De)serializer, set a physical property ``csvfile.serde``.
-The property value should be a fully qualified class name.
-
-For example:
-
-.. code-block:: sql
-
- CREATE TABLE table1 (
-  id int,
-  name text,
-  score float,
-  type text
- ) USING CSV WITH ('csvfile.serde'='org.my.storage.CustomSerializerDeserializer')
-
-
-=========================================
-Null Value Handling Issues
-=========================================
-In default, NULL character in CSV files is an empty string ``''``.
-In other words, an empty field is basically recognized as a NULL value in Tajo.
-If a field domain is ``TEXT``, an empty field is recognized as a string value ``''`` instead of NULL value.
-Besides, You can also use your own NULL character by specifying a physical property ``text.null``.
-
-=========================================
-Compatibility Issues with Apache Hive™
-=========================================
-
-CSV files generated in Tajo can be processed directly by Apache Hive™ without further processing.
-In this section, we explain some compatibility issue for users who use both Hive and Tajo.
-
-If you set a custom field delimiter, the CSV tables cannot be directly used in Hive.
-In order to specify the custom field delimiter in Hive, you need to use ``ROW FORMAT DELIMITED FIELDS TERMINATED BY``
-clause in a Hive's ``CREATE TABLE`` statement as follows:
-
-.. code-block:: sql
-
- CREATE TABLE table1 (id int, name string, score float, type string)
- ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
- STORED AS TEXT
-
-To the best of our knowledge, there is not way to specify a custom NULL character in Hive.

http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/tajo-docs/src/main/sphinx/table_management/file_formats.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/table_management/file_formats.rst b/tajo-docs/src/main/sphinx/table_management/file_formats.rst
index c15dd3f..0579497 100644
--- a/tajo-docs/src/main/sphinx/table_management/file_formats.rst
+++ b/tajo-docs/src/main/sphinx/table_management/file_formats.rst
@@ -7,7 +7,7 @@ Currently, Tajo provides four file formats as follows:
 .. toctree::
     :maxdepth: 1
 
-    csv
+    text
     rcfile
     parquet
     sequencefile
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/tajo-docs/src/main/sphinx/table_management/table_overview.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/table_management/table_overview.rst b/tajo-docs/src/main/sphinx/table_management/table_overview.rst
index 3d933c2..3985e19 100644
--- a/tajo-docs/src/main/sphinx/table_management/table_overview.rst
+++ b/tajo-docs/src/main/sphinx/table_management/table_overview.rst
@@ -29,9 +29,9 @@ The following example is to set a custom field delimiter, NULL character, and co
   name text,
   score float,
   type text
- ) USING CSV WITH('text.delimiter'='\u0001',
-                  'text.null'='\\N',
-                  'compression.codec'='org.apache.hadoop.io.compress.SnappyCodec');
+ ) USING TEXT WITH('text.delimiter'='\u0001',
+                   'text.null'='\\N',
+                   'compression.codec'='org.apache.hadoop.io.compress.SnappyCodec');
 
 Each physical table layout has its own specialized properties. They will be addressed in :doc:`/table_management/file_formats`.
 

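The table_overview example sets '\u0001' as the field delimiter and '\\N' as the NULL character. The Python sketch below is a hypothetical model, not Tajo's (De)serializer, of the NULL rules described in the new text.rst: a field equal to the NULL character becomes None, and with the default empty NULL character an empty field in a TEXT column stays ''. The names parse_fields and SCHEMA are illustrative assumptions.

```python
# Hypothetical illustration, not Tajo's (De)serializer: it mimics the NULL
# handling rules documented for the TEXT format.
SCHEMA = [("id", int), ("name", str), ("score", float), ("type", str)]


def parse_fields(line, delimiter="\u0001", null_marker="\\N"):
    """Convert one delimited line into a list of Python values.

    A field equal to null_marker becomes None. When null_marker is the
    default empty string, an empty field in a text (str) column is kept
    as '' instead of None, as the TEXT docs describe.
    """
    fields = line.rstrip("\n").split(delimiter)
    if len(fields) != len(SCHEMA):
        raise ValueError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
    values = []
    for (_name, cast), raw in zip(SCHEMA, fields):
        if raw == null_marker and not (null_marker == "" and cast is str):
            values.append(None)
        else:
            values.append(cast(raw))
    return values


if __name__ == "__main__":
    # "\\N" is the two characters backslash + N, i.e. Hive's NULL marker.
    print(parse_fields("1\u0001\\N\u00012.5\u0001a"))
    # Default empty NULL character with a '|' delimiter.
    print(parse_fields("|bob||", delimiter="|", null_marker=""))
```

The second call prints [None, 'bob', None, '']: the empty id and score fields become None, while the empty type field, being a text column, stays an empty string.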
http://git-wip-us.apache.org/repos/asf/tajo/blob/b0abff8e/tajo-docs/src/main/sphinx/table_management/text.rst
----------------------------------------------------------------------
diff --git a/tajo-docs/src/main/sphinx/table_management/text.rst b/tajo-docs/src/main/sphinx/table_management/text.rst
new file mode 100644
index 0000000..3727b03
--- /dev/null
+++ b/tajo-docs/src/main/sphinx/table_management/text.rst
@@ -0,0 +1,115 @@
+*************************************
+TEXT
+*************************************
+
+A character-separated values plain-text file represents a tabular data set consisting of rows and columns.
+Each row is a plain-text line. A line is usually terminated by a line feed ``\n`` or a carriage return ``\r``.
+The line feed ``\n`` is the default line delimiter in Tajo. Each record consists of multiple fields, separated by
+some other character or string, most commonly a literal vertical bar ``|``, comma ``,`` or tab ``\t``.
+The vertical bar is used as the default field delimiter in Tajo.
+
+=========================================
+How to Create a TEXT Table?
+=========================================
+
+If you are not familiar with the ``CREATE TABLE`` statement, please refer to the Data Definition Language :doc:`/sql_language/ddl`.
+
+To specify a file format for your table, use the ``USING`` clause in your ``CREATE TABLE``
+statement. Below is an example statement that creates a table in the *TEXT* format.
+
+.. code-block:: sql
+
+ CREATE TABLE
+  table1 (
+    id int,
+    name text,
+    score float,
+    type text
+  ) USING TEXT;
+
+=========================================
+Physical Properties
+=========================================
+
+Some table storage formats provide parameters for enabling or disabling features and adjusting physical parameters.
+The ``WITH`` clause in the ``CREATE TABLE`` statement allows users to set those parameters.
+
+The *TEXT* format provides the following physical properties.
+
+* ``text.delimiter``: field delimiter character. ``|`` or ``\u0001`` is commonly used, and the default field delimiter is ``|``.
+* ``text.null``: ``NULL`` character. The default ``NULL`` character is an empty string ``''``. Hive's default ``NULL`` character is ``'\\N'``.
+* ``compression.codec``: compression codec. Setting it enables compression with the specified algorithm. The value should be the fully qualified name of a class that implements `org.apache.hadoop.io.compress.CompressionCodec <https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html>`_. By default, compression is disabled.
+* ``text.serde``: custom (De)serializer class. ``org.apache.tajo.storage.text.CSVLineSerDe`` is the default (De)serializer class.
+* ``timezone``: the time zone that the table uses for writing. When table rows are read or written, ``timestamp`` and ``time`` column values are adjusted by this time zone if it is set. The time zone can be given in an abbreviated form such as 'PST' or 'DST', an offset-based form such as 'UTC+9', or a location-based form such as 'Asia/Seoul'.
+* ``text.error-tolerance.max-num``: the maximum number of permissible parsing errors. The value must be an integer. By default, ``text.error-tolerance.max-num`` is ``0``. Parsing errors are handled according to the value:
+  * If ``text.error-tolerance.max-num < 0``, all parsing errors are ignored.
+  * If ``text.error-tolerance.max-num == 0``, no parsing error is allowed; if any error occurs, the query fails. (default)
+  * If ``text.error-tolerance.max-num > 0``, up to the given number of parsing errors is permitted in each task.
+
+The following example sets a custom field delimiter, ``NULL`` character, and compression codec:
+
+.. code-block:: sql
+
+ CREATE TABLE table1 (
+  id int,
+  name text,
+  score float,
+  type text
+ ) USING TEXT WITH('text.delimiter'='\u0001',
+                   'text.null'='\\N',
+                   'compression.codec'='org.apache.hadoop.io.compress.SnappyCodec');
+
+.. warning::
+
+  Be careful when using ``\n`` as the field delimiter because *TEXT* format tables use ``\n`` as the line delimiter.
+  At the moment, Tajo does not provide a way to specify the line delimiter.
+
+=========================================
+Custom (De)serializer
+=========================================
+
+The *TEXT* format not only provides reading and writing interfaces for text data but also allows users to process custom
+plain-text file formats with user-defined (De)serializer classes.
+For example, with custom (De)serializers, Tajo can process JSON files or any other specialized plain-text file formats.
+
+To specify a custom (De)serializer, set the physical property ``text.serde``.
+The property value should be a fully qualified class name.
+
+For example:
+
+.. code-block:: sql
+
+ CREATE TABLE table1 (
+  id int,
+  name text,
+  score float,
+  type text
+ ) USING TEXT WITH ('text.serde'='org.my.storage.CustomSerializerDeserializer')
+
+
+=========================================
+Null Value Handling Issues
+=========================================
+By default, the ``NULL`` character in the *TEXT* format is an empty string ``''``.
+In other words, an empty field is recognized as a ``NULL`` value in Tajo.
+However, if a field's domain is ``TEXT``, an empty field is recognized as the string value ``''`` instead of a ``NULL`` value.
+You can also use your own ``NULL`` character by specifying the physical property ``text.null``.
+
+=========================================
+Compatibility Issues with Apache Hive™
+=========================================
+
+*TEXT* tables generated in Tajo can be processed directly by Apache Hive™ without further processing.
+In this section, we explain some compatibility issues for users who use both Hive and Tajo.
+
+If you set a custom field delimiter, the *TEXT* tables cannot be used directly in Hive.
+To specify the custom field delimiter in Hive, you need to use the ``ROW FORMAT DELIMITED FIELDS TERMINATED BY``
+clause in Hive's ``CREATE TABLE`` statement as follows:
+
+.. code-block:: sql
+
+ CREATE TABLE table1 (id int, name string, score float, type string)
+ ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
+ STORED AS TEXTFILE
+
+To the best of our knowledge, there is no way to specify a custom ``NULL`` character in Hive.

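The text.error-tolerance.max-num rules in the new text.rst can be modelled per task roughly as in the Python sketch below. It is a hypothetical illustration only: read_task, TooManyParseErrors, and the choice to skip a row that fails to parse are assumptions, not Tajo's actual API or behavior.

```python
# Hypothetical model of the documented 'text.error-tolerance.max-num' rules
# for a single task; not Tajo's implementation. Skipping the bad row is an
# assumption made for this sketch.
class TooManyParseErrors(Exception):
    """Raised when a task exceeds the permitted number of parsing errors."""


def read_task(lines, parse, max_num=0):
    """Parse the lines of one task according to max_num.

    max_num < 0  -> every parsing error is ignored
    max_num == 0 -> any parsing error fails the query (the default)
    max_num > 0  -> up to max_num parsing errors are tolerated per task
    """
    rows, errors = [], 0
    for line in lines:
        try:
            rows.append(parse(line))
        except ValueError:
            errors += 1
            if max_num >= 0 and errors > max_num:
                raise TooManyParseErrors(f"{errors} parsing error(s) in this task")
    return rows


if __name__ == "__main__":
    data = ["1", "x", "3"]
    print(read_task(data, int, max_num=1))   # one bad line is tolerated
    print(read_task(data, int, max_num=-1))  # all bad lines are ignored
```

The error counter lives inside read_task, so each call starts from zero; that mirrors the "in each task" wording of the property description rather than a query-wide limit.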
