tajo-issues mailing list archives

From "David Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (TAJO-736) Add table management documentation
Date Fri, 04 Apr 2014 15:40:26 GMT

    [ https://issues.apache.org/jira/browse/TAJO-736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13960061#comment-13960061 ]

David Chen commented on TAJO-736:
---------------------------------

Not a problem! The documentation looks great so far. I have some minor comments:

{quote}
For more details, please refer Parquet File Format.
{quote}
Should be "please refer to the Parquet File Format."

{quote}
If you are not familiar with CREATE TABLE statement, please refer Data Definition Language
Data Definition Language.
{quote}
Similar to above, add a "to" after "refer." Also, "Data Definition Language" appears to be
repeated.

{quote}
In order to specify a certain file format for your table, you need to use USING PARQUET clause
{quote}

Add a "the" after "use."

{quote}
The below is an example statement for creating a table using PARQUET files. WITH clause allows
users to specify a set of physical properties.
{quote}

I'm not sure whether PARQUET needs to be all-caps here. Also, add a "the" before "WITH."

{quote}
Some table file formats provide special enable/disable features and the ways to adjust some
physical parameters. WITH clause in CREATE TABLE statement allows users to set those physical
parameters.
{quote}

I think it might be better to rephrase this as: "Some table storage formats provide parameters
for enabling or disabling features and adjusting physical parameters. The WITH clause in the
CREATE TABLE statement allows users to set those parameters."

{quote}
Now, Parquet file provides the following physical properties.
{quote}

"The Parquet storage format provides the following physical properties:"

{quote}
Larger values will improve the IO when reading but consume more memory when writing.
{quote}

IO should be I/O.

{quote}
The compression algorithm used to compress pages. It should be one of UNCOMPRESSED, SNAPPY,
GZIP, LZO. Default is UNCOMPRESSED.
{quote}

I believe the convention for the compression codec names from the Parquet documentation is
to use all lowercase (even though code-wise, ALL CAPS still works anyway :)).

{quote}
Compatibility Issues
{quote}

Users might try to use Parquet files with nested schemas and non-scalar types, which are
currently a compatibility issue. Should we add a note that we are currently working on adding
support for nested schemas and non-scalar types?

--------

If you would like, I would be glad to review the documentation for the other storage formats
too. Could you create a Review Board (RB) request for this patch? It might be easier to review there.

Thanks,
David

> Add table management documentation
> ----------------------------------
>
>                 Key: TAJO-736
>                 URL: https://issues.apache.org/jira/browse/TAJO-736
>             Project: Tajo
>          Issue Type: Sub-task
>          Components: documentation
>            Reporter: Hyunsik Choi
>            Assignee: Hyunsik Choi
>             Fix For: 0.8-incubating, 1.0-incubating
>
>         Attachments: TAJO-736.patch
>
>
> Jinho and I wrote some user documentation for file formats. This patch contains documentation
> for CSV files, RCFile, and Parquet files.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
