hawq-dev mailing list archives

From dyozie <...@git.apache.org>
Subject [GitHub] incubator-hawq-docs pull request #33: HAWQ-1107 - enhance PXF HDFS plugin do...
Date Tue, 25 Oct 2016 21:20:04 GMT
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/33#discussion_r84999127
  
    --- Diff: pxf/HDFSFileDataPXF.html.md.erb ---
    @@ -2,388 +2,282 @@
     title: Accessing HDFS File Data
     ---
     
    -## <a id="installingthepxfhdfsplugin"></a>Prerequisites
    +HDFS is the primary distributed storage mechanism used by Apache Hadoop applications.
The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in supports plain delimited
and comma-separated-value format text files.  The HDFS plug-in also supports the Avro binary
format.
     
    -Before working with HDFS file data using HAWQ and PXF, you should perform the following
operations:
    +This section describes how to use PXF to access HDFS data, including how to create and
query an external table from files in the HDFS data store.
     
    --   Test PXF on HDFS before connecting to Hive or HBase.
    --   Ensure that all HDFS users have read permissions to HDFS services and that write
permissions have been limited to specific users.
    +## <a id="hdfsplugin_prereq"></a>Prerequisites
     
    -## <a id="syntax1"></a>Syntax
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
     
    -The syntax for creating an external HDFS file is as follows: 
    +-   The HDFS plug-in is installed on all cluster nodes.
    +-   All HDFS users have read permissions to HDFS services and that write permissions
have been restricted to specific users.
     
    -``` sql
    -CREATE [READABLE|WRITABLE] EXTERNAL TABLE table_name 
    -    ( column_name data_type [, ...] | LIKE other_table )
    -LOCATION ('pxf://host[:port]/path-to-data?<pxf parameters>[&custom-option=value...]')
    -      FORMAT '[TEXT | CSV | CUSTOM]' (<formatting_properties>);
    -```
    +## <a id="hdfsplugin_fileformats"></a>HDFS File Formats
     
    -where `<pxf parameters>` is:
    +The PXF HDFS plug-in supports reading the following file formats:
     
    -``` pre
    -   FRAGMENTER=fragmenter_class&ACCESSOR=accessor_class&RESOLVER=resolver_class]
    - | PROFILE=profile-name
    -```
    +- Text File - comma-separated value (.csv) or delimited format plain text file
    +- Avro - JSON-defined, schema-based data serialization format
     
    -**Note:** Omit the `FRAGMENTER` parameter for `READABLE` external tables.
    +The PXF HDFS plug-in includes the following profiles to support the file formats listed
above:
     
    -Use an SQL `SELECT` statement to read from an HDFS READABLE table:
    +- `HdfsTextSimple` - text files
    +- `HdfsTextMulti` - text files with embedded line feeds
    +- `Avro` - Avro files
     
    -``` sql
    -SELECT ... FROM table_name;
    -```
     
    -Use an SQL `INSERT` statement to add data to an HDFS WRITABLE table:
    +## <a id="hdfsplugin_cmdline"></a>HDFS Shell Commands
    +Hadoop includes command-line tools that interact directly with HDFS.  These tools support
typical file system operations including copying and listing files, changing file permissions,
etc. 
     
    -``` sql
    -INSERT INTO table_name ...;
    -```
    +The HDFS file system command is `hdfs dfs <options> [<file>]`. Invoked with
no options, `hdfs dfs` lists the file system options supported by the tool.
    +
    +`hdfs dfs` options used in this section are identified in the table below:
    +
    +| Option  | Description |
    +|-------|-------------------------------------|
    +| `-cat`    | Display file contents. |
    +| `-mkdir`    | Create directory in HDFS. |
    +| `-put`    | Copy file from local file system to HDFS. |
    +
    +### <a id="hdfsplugin_cmdline_create"></a>Create Data Files
    +
    +Perform the following steps to create data files used in subsequent exercises:
    --- End diff --
    
    I think this procedure needs a bit more explanation about what it's trying to accomplish.
It seems like this should be optional in the context of the larger topic, as readers might
already have files in HDFS that they want to reference.  Just add a note saying that readers
can optionally follow these steps to create sample files in HDFS for use in later examples.
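
As a rough illustration of what such an optional step might look like, here is a minimal
sketch that uses only the `hdfs dfs` options listed in the diff (`-mkdir`, `-put`, `-cat`).
The file name, the HDFS directory `/data/pxf_examples`, and the sample rows are made-up
placeholders, not taken from the pull request, and the upload is skipped when no `hdfs`
client is on the PATH.

```shell
# Optional: create a small sample file for use in later examples.
# All names, paths, and rows below are hypothetical placeholders.
workdir="$(mktemp -d)"
sample="$workdir/pxf_hdfs_simple.txt"

# Write a few comma-delimited rows to a local file.
printf '%s\n' \
  'Prague,Jan,101,4875.33' \
  'Rome,Mar,87,1557.39' \
  'Bangalore,May,317,8936.99' > "$sample"

# Copy the file into HDFS only when the hdfs client is available.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p /data/pxf_examples               # create the target directory
  hdfs dfs -put "$sample" /data/pxf_examples/         # copy the local file to HDFS
  hdfs dfs -cat /data/pxf_examples/pxf_hdfs_simple.txt  # display the uploaded file
else
  echo "hdfs client not found; skipping upload of $sample"
fi
```

Readers who already have suitable files in HDFS could skip this entirely and point the
later examples at their own paths.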


