This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/20060/

tajo-docs/src/main/sphinx/table_management/parquet.rst (Diff revision 1)
14
If you are not familiar with ``CREATE TABLE`` statement, please refer to Data Definition Language :doc:`/sql_language/ddl`.
Add a "the" before ``CREATE TABLE`` and before "Data Definition Language".

tajo-docs/src/main/sphinx/table_management/parquet.rst (Diff revision 1)
16
In order to specify a certain file format for your table, you need to use the ``USING`` clause in ``CREATE TABLE``
Add a "your" before ``CREATE TABLE``.

tajo-docs/src/main/sphinx/table_management/parquet.rst (Diff revision 1)
17
statement. The below is an example statement for creating a table using parquet files.
Remove "The" before "below".

tajo-docs/src/main/sphinx/table_management/parquet.rst (Diff revision 1)
47
So, Parquet file format implementation in Tajo cannot recognize nested schemas.
"As a result, Tajo's Parquet storage type does not support nested schemas."

tajo-docs/src/main/sphinx/table_management/parquet.rst (Diff revision 1)
48
Currently, nested schemas and non-scalar type support (`TAJO-710 <https://issues.apache.org/jira/browse/TAJO-710>`_) is working in progress.
"However, we are currently working on adding support for nested schemas and non-scalar types."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
2
RCFIle
The 'i' should be lower case.

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
5
RCFiles, short of Record Columnar File, are flat files consisting of binary key/value pairs, which shares much
I don't think RCFile should be plural.

"...shares many similarities..."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
12
If you are not familiar with ``CREATE TABLE`` statement, please refer to Data Definition Language :doc:`/sql_language/ddl`.
Add "the" before ``CREATE TABLE`` and before "Data Definition Language."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
14
In order to specify a certain file format for your table, you need to use the ``USING`` clause in ``CREATE TABLE``
Add a "your" before ``CREATE TABLE``.

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
15
statement. The below is an example statement for creating a table using rcfiles.
Use "RCFile" instead of "rcfiles".

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
33
Now, RCFile file provides the following physical properties.
"...the RCFile storage type..."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
35
* ``rcfile.serde`` : custom (De)serializer class. ``org.apache.tajo.storage.BinarySerializerDeserializer`` is a default (De)serializer class.
"...is the default (de)serializer class."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
37
* ``compression.codec`` : Compression codec. You can enable compression feature and set specified compression algorithm. A property value should be a full qualified class name inherited from `org.apache.hadoop.io.compress.CompressionCodec <https://hadoop.apache.org/docs/current/api/org/apache/hadoop/io/compress/CompressionCodec.html>`_. In default, compression is disabled.
"The compression algorithm used to compress files. The compression codec name should be the fully qualified class name..."

"By default, ..."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
39
The following is an example to set compression to a table using RCFile.
"...is an example for creating a table using RCFile that uses compression."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
57
* ``org.apache.tajo.storage.BinarySerializerDeserializer``: store column values in a binary data.
"stores column values in a binary file format."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
59
RCFile format can have some metadata stored in RCFile header. Tajo writes the (de)serializer class name into
"The RCFile format can store some metadata in the RCFile header."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
64
  ``org.apache.tajo.storage.BinarySerializerDeserializer`` is a default (de) serializer for RCFile.
"... is the default..."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
71
In regardless of what RCFile is generated in either Apache Hive™ or Apache Tajo™, RCFiles are compatible in both systems.
"Regardless of whether the RCFiles are written by Apache..., the files are compatible..."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
72
In other words, Apache Tajo can process directly RCFiles generated in Apache Hive and vice versa.
"...Tajo can process RCFiles written by Apache Hive and vice versa."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
74
Since there are no metadata in RCFiles generated in Hive, we need to specify manually (de)serializer class name
"...RCFiles written by Hive, we need to manually specify the (de)serializer class name..."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
77
In Hive, there are two (de)serializers, and they correspond to the following (de)serializer of Tajo.
"...in Tajo."

FYI, I think Hive uses the term "SerDe" instead of "(de)serializer." Perhaps we should use "SerDe" when referring to the Hive class but stay with "(de)serializer" when referring to the Tajo class?

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
82
The compatibility issue mostly occurs when a user creates an external table with already existing tables.
"...when a user creates an external table pointing to data of an existing table."

I understand what you're trying to say here. It's not an easy sentence to make clear. :)

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
83
The following section will explains two cases: 1) the case where Tajo reads RCFile generated in Hive, and
Remove "will".

I think it is better to use "written by" rather than "generated in."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
91
you should set a physical property ``rcfile.serde`` in Tajo as follows:
"...set the physical property..."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
104
you should set a physical property ``rcfile.serde`` in Tajo as follows:
"...set the physical property..."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
118
  As we mentioned above, ``BinarySerializerDeserializer`` is a default (de) serializer for RCFile.
"...the default..."

tajo-docs/src/main/sphinx/table_management/rcfile.rst (Diff revision 1)
119
  So, you can omit to set ``rcfile.serde`` only for ``org.apache.tajo.storage.BinarySerializerDeserializer``.
"...you can omit the ``rcfile.serde``..."

- David Chen


On April 5th, 2014, 5:28 p.m. UTC, Hyunsik Choi wrote:

Review request for Tajo.
By Hyunsik Choi.

Updated April 5, 2014, 5:28 p.m.

Bugs: TAJO-736
Repository: tajo

Description

Jinho and I wrote some user documentation for file formats. This patch contains documentation for CSV, RCFile, and Parquet files.

Diffs

  • tajo-docs/src/main/sphinx/partitioning/column_partitioning.rst (e88d23ff110e0326a667b51c45d5897d37eb04bc)
  • tajo-docs/src/main/sphinx/partitioning/intro_to_partitioning.rst (bfb555f6a49b91adf89f7c7a94ad9226d641e0d0)
  • tajo-docs/src/main/sphinx/table_management/csv.rst (c11a34315196c276c35ae5eb94c2b66653383b8a)
  • tajo-docs/src/main/sphinx/table_management/parquet.rst (a994b7e4008017d45622c845a19f2884ed15bc8f)
  • tajo-docs/src/main/sphinx/table_management/rcfile.rst (21f825313a84cffa95e615206108f455817836e8)
