hawq-dev mailing list archives

From dyozie <...@git.apache.org>
Subject [GitHub] incubator-hawq-docs pull request #94: HAWQ-1304 - multiple doc changes for P...
Date Fri, 03 Feb 2017 17:51:17 GMT
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/94#discussion_r99384586
  
    --- Diff: markdown/pxf/PXFExternalTableandAPIReference.html.md.erb ---
    @@ -232,23 +250,23 @@ public class InputData {
     
     ### <a id="fragmenter"></a>Fragmenter
     
    -**Note:** The Fragmenter Plugin reads data into HAWQ readable external tables. The Fragmenter
Plugin cannot write data out of HAWQ into writable external tables.
    +**Note:** The Fragmenter class reads data into HAWQ readable external tables. The Fragmenter
class cannot write data out of HAWQ into writable external tables.
     
    -The Fragmenter is responsible for passing datasource metadata back to HAWQ. It also returns
a list of data fragments to the Accessor or Resolver. Each data fragment describes some part
of the requested data set. It contains the datasource name, such as the file or table name,
including the hostname where it is located. For example, if the source is a HDFS file, the
Fragmenter returns a list of data fragments containing a HDFS file block. Each fragment includes
the location of the block. If the source data is an HBase table, the Fragmenter returns information
about table regions, including their locations.
    +The Fragmenter is responsible for passing datasource metadata back to HAWQ. It also returns
a list of data fragments to the Accessor or Resolver. Each data fragment describes some part
of the requested data set. It contains the datasource name, such as the file or table name,
including the hostname where it is located. For example, if the source is an HDFS file, the
Fragmenter returns a list of data fragments containing an HDFS file block. Each fragment
includes the location of the block. If the source data is an HBase table, the Fragmenter returns
information about table regions, including their locations.
     
     The `ANALYZE` command now retrieves advanced statistics for PXF readable tables by estimating
the number of tuples in a table, creating a sample table from the external table, and running
advanced statistics queries on the sample table in the same way statistics are collected for
native HAWQ tables.
     
     The configuration parameter `pxf_enable_stat_collection` controls collection of advanced
statistics. If `pxf_enable_stat_collection` is set to false, no analysis is performed on PXF
tables. An additional parameter, `pxf_stat_max_fragments`, controls the number of fragments
sampled to build a sample table. By default `pxf_stat_max_fragments` is set to 100, which
means that even if there are more than 100 fragments, only this number of fragments will be
used in `ANALYZE` to sample the data. Increasing this number will result in better sampling,
but can also impact performance.
     
    -When a PXF table is analyzed and `pxf_enable_stat_collection` is set to off, or an error
occurs because the table is not defined correctly, the PXF service is down, or `getFragmentsStats`
is not implemented, a warning message is shown and no statistics are gathered for that table.
If `ANALYZE` is running over all tables in the database, the next table will be processed
– a failure processing one table does not stop the command.
    +When a PXF table is analyzed and `pxf_enable_stat_collection` is set to off, or an error
occurs because the table is not defined correctly, the PXF service is down, or `getFragmentsStats()`
is not implemented, a warning message is shown and no statistics are gathered for that table.
If `ANALYZE` is running over all tables in the database, the next table will be processed
– a failure processing one table does not stop the command.
    --- End diff --
    
    This sentence really needs to be unpacked. My best take at it is:
    
    When a PXF table is analyzed, any of the following conditions might result in a warning message with no statistics gathered for the table:
    - `pxf_enable_stat_collection` is set to off, or
    - an error occurs because the table is not defined correctly, or
    - the PXF service is down, or
    - `getFragmentsStats()` is not implemented
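
    To sanity-check the unpacked list against the behaviour the diff describes, here is a rough Java model. It is a sketch, not PXF or HAWQ code: `PxfAnalyzeSketch`, `SkipReason`, `TableState`, `skipReason`, `fragmentsToSample` and `analyzeAll` are hypothetical names, and the order in which the conditions are checked is an arbitrary choice. It encodes three claims from the text: any one listed condition yields a warning and no statistics, sampling uses at most `pxf_stat_max_fragments` fragments (default 100), and a skipped table does not stop an `ANALYZE` over all tables.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the ANALYZE behaviour described in the diff.
// None of these names are PXF or HAWQ APIs. Requires Java 16+ (records).
public class PxfAnalyzeSketch {

    /** The conditions from the list above that suppress statistics collection. */
    enum SkipReason { STAT_COLLECTION_OFF, TABLE_DEFINITION_ERROR, PXF_SERVICE_DOWN, STATS_NOT_IMPLEMENTED }

    /** Minimal stand-in for a PXF table as ANALYZE sees it. */
    record TableState(String name, boolean definedCorrectly, boolean serviceUp,
                      boolean implementsFragmentsStats) {}

    /** Returns the first condition that blocks statistics, or null if none applies. */
    static SkipReason skipReason(boolean pxfEnableStatCollection, TableState t) {
        if (!pxfEnableStatCollection) return SkipReason.STAT_COLLECTION_OFF;
        if (!t.definedCorrectly()) return SkipReason.TABLE_DEFINITION_ERROR;
        if (!t.serviceUp()) return SkipReason.PXF_SERVICE_DOWN;
        if (!t.implementsFragmentsStats()) return SkipReason.STATS_NOT_IMPLEMENTED;
        return null;
    }

    /** Fragments used for the sample table: never more than pxfStatMaxFragments. */
    static int fragmentsToSample(int totalFragments, int pxfStatMaxFragments) {
        return Math.min(totalFragments, pxfStatMaxFragments);
    }

    /** ANALYZE over every table: a skipped table yields a warning and the loop continues. */
    static List<String> analyzeAll(boolean pxfEnableStatCollection, List<TableState> tables) {
        List<String> messages = new ArrayList<>();
        for (TableState t : tables) {
            SkipReason reason = skipReason(pxfEnableStatCollection, t);
            if (reason != null) {
                messages.add("WARNING: skipping " + t.name() + " (" + reason + ")");
            } else {
                messages.add("analyzed " + t.name());
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        List<TableState> tables = List.of(
                new TableState("no_stats_table", true, true, false),
                new TableState("good_table", true, true, true));
        // The first table is skipped with a warning; the second is still analyzed.
        analyzeAll(true, tables).forEach(System.out::println);
        // 250 fragments available, default cap of 100: only 100 are sampled.
        System.out.println(fragmentsToSample(250, 100));
    }
}
```

    Modelling the failure path as "warn and move on" inside the loop, rather than throwing, mirrors the diff's statement that a failure on one table does not stop the command.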


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
