tajo-issues mailing list archives

From "David Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TAJO-736) Add table management documentation
Date Fri, 04 Apr 2014 15:40:26 GMT

    [ https://issues.apache.org/jira/browse/TAJO-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960061#comment-13960061 ]

David Chen commented on TAJO-736:

Not a problem! The documentation looks great so far. I have some minor comments:

For more details, please refer Parquet File Format.
Should be "please refer to the Parquet File Format."

If you are not familiar with CREATE TABLE statement, please refer Data Definition Language
Data Definition Language.
Similar to above, add a "to" after "refer." Also, "Data Definition Language" appears to be duplicated.

In order to specify a certain file format for your table, you need to use USING PARQUET clause

Add a "the" after "use."

The below is an example statement for creating a table using PARQUET files. WITH clause allows
users to specify a set of physical properties.

I'm not sure whether PARQUET needs to be all-caps here. Also, add a "the" before "WITH."

Some table file formats provide special enable/disable features and the ways to adjust some
physical parameters. WITH clause in CREATE TABLE statement allows users to set those physical

I think it might be better to rephrase this as: "Some table storage formats provide parameters
for enabling or disabling features and adjusting physical parameters. The WITH clause in the
CREATE TABLE statement allows users to set those parameters."
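To make the WITH clause wording concrete, here is a rough sketch of how a CREATE TABLE ... USING PARQUET statement with physical properties could be put together. The helper name `build_create_table` and the property key `parquet.compression` are my own assumptions for illustration, not names confirmed by the patch, so please check them against the actual docs.

```python
# Hypothetical helper: composes a Tajo-style CREATE TABLE statement string.
# The property keys that Tajo actually accepts are an assumption here.
def build_create_table(table, columns, fmt, properties=None):
    """Return 'CREATE TABLE ... USING <fmt> [WITH (...)];' as a string.

    columns: list of (name, type) pairs.
    properties: optional dict of physical properties for the WITH clause.
    """
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    stmt = f"CREATE TABLE {table} ({cols}) USING {fmt}"
    if properties:
        # Each physical property becomes a quoted 'key'='value' pair.
        pairs = ", ".join(f"'{k}'='{v}'" for k, v in properties.items())
        stmt += f" WITH ({pairs})"
    return stmt + ";"


print(build_create_table(
    "table1",
    [("id", "INT"), ("name", "TEXT")],
    "PARQUET",
    {"parquet.compression": "snappy"},  # assumed key name
))
```

Without a properties dict the WITH clause is simply omitted, which matches the idea that the clause is optional and only needed to override defaults.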

Now, Parquet file provides the following physical properties.

"The Parquet storage format provides the following physical properties:"

Larger values will improve the IO when reading but consume more memory when writing.

IO should be I/O.

The compression algorithm used to compress pages. It should be one of UNCOMPRESSED, SNAPPY,

I believe the convention for the compression codec names from the Parquet documentation is
to use all lowercase (even though code-wise, ALL CAPS still works anyway :)).
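For what it's worth, the case-insensitive behavior could be illustrated like this: a minimal sketch that accepts a codec name in any case and returns the lowercase form the docs would show. The function name `normalize_codec` is hypothetical, and the set of valid codecs is passed in rather than hard-coded, since the full list is cut off in the quoted text above.

```python
def normalize_codec(name, allowed):
    """Return the lowercase codec name, or raise ValueError if unknown.

    name: codec name as typed by the user (any case, e.g. "SNAPPY").
    allowed: iterable of valid codec names, compared case-insensitively.
    """
    lowered = name.strip().lower()
    valid = {codec.lower() for codec in allowed}
    if lowered not in valid:
        raise ValueError(f"unknown compression codec: {name!r}")
    return lowered


# Only the two codecs named in the quoted docs are used here.
allowed = ["UNCOMPRESSED", "SNAPPY"]
print(normalize_codec("SNAPPY", allowed))        # snappy
print(normalize_codec("uncompressed", allowed))  # uncompressed
```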

Compatibility Issues

Users might try to use Parquet files with nested schemas and non-scalar types, which is
currently a compatibility issue. Should we add a note that we are currently working on adding
support for nested schemas and non-scalar types?


If you would like, I would be glad to review the documentation for the other storage formats
too. Could you create an RB review for this patch? It might be easier to review on RB.


> Add table management documentation
> ----------------------------------
>                 Key: TAJO-736
>                 URL: https://issues.apache.org/jira/browse/TAJO-736
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: documentation
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.8-incubating, 1.0-incubating
>         Attachments: TAJO-736.patch
> Jinho and I wrote some user documentation for file formats. This patch contains documentation
> for the CSV, RCFile, and Parquet file formats.

This message was sent by Atlassian JIRA
