hawq-dev mailing list archives

From hornn <...@git.apache.org>
Subject [GitHub] incubator-hawq pull request: Analyze HAWQ-44
Date Tue, 10 Nov 2015 18:55:44 GMT
GitHub user hornn opened a pull request:


    Analyze HAWQ-44

    Advanced statistics for PXF tables.
    PXF sample rows are collected into a temporary table, and statistics are derived from
them in the same way that ANALYZE works for HAWQ tables.
    Statistics are gathered in three stages:
    1. Getting general statistics: number of fragments, size of data source, size of first fragment
    2. Count of first fragment tuples
    HAWQ uses these numbers to determine how many tuples are needed, and these parameters
are translated into a sampling ratio and a number of fragments to sample.
    3. Sampling the PXF table based on the sampling ratio and number of fragments to be sampled.
The returned tuples are saved in a temporary table.
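
As a rough illustration of how the numbers from the first two stages could be turned into a sampling ratio and a fragment count, here is a minimal Java sketch. It is not taken from the pull request: the class `SamplingPlan`, its parameters (including the `maxFragments` cap and the `targetTuples` goal), and the extrapolation formula are all assumptions made for the example, and the actual HAWQ logic may differ.

```java
// Hypothetical sketch: names and formulas are assumptions, not HAWQ's actual code.
final class SamplingPlan {
    final int fragmentsToSample;
    final double samplingRatio;

    SamplingPlan(int fragmentsToSample, double samplingRatio) {
        this.fragmentsToSample = fragmentsToSample;
        this.samplingRatio = samplingRatio;
    }

    /**
     * Derives a plan from stage 1 stats (fragment count, total size, first
     * fragment size) and the stage 2 stat (tuple count of the first fragment).
     */
    static SamplingPlan compute(long fragmentCount, long totalBytes,
                                long firstFragmentBytes, long firstFragmentTuples,
                                long targetTuples, int maxFragments) {
        if (fragmentCount <= 0 || totalBytes <= 0 || firstFragmentBytes <= 0
                || firstFragmentTuples <= 0 || targetTuples <= 0 || maxFragments <= 0) {
            throw new IllegalArgumentException("all inputs must be positive");
        }
        // Extrapolate the total tuple count from the density of the first fragment.
        double estimatedTotalTuples =
                (double) firstFragmentTuples * totalBytes / firstFragmentBytes;
        // Assumed cap on how many fragments are read during sampling.
        int fragments = (int) Math.min(fragmentCount, (long) maxFragments);
        // Tuples expected in the fragments that will actually be read.
        double tuplesInSampledFragments =
                estimatedTotalTuples * fragments / fragmentCount;
        // Fraction of rows to keep so that roughly targetTuples come back.
        double ratio = Math.min(1.0, targetTuples / tuplesInSampledFragments);
        return new SamplingPlan(fragments, ratio);
    }

    public static void main(String[] args) {
        SamplingPlan plan = compute(100, 1_000_000L, 10_000L, 1_000L, 5_000L, 10);
        System.out.println(plan.fragmentsToSample + " fragments, ratio " + plan.samplingRatio);
    }
}
```

In this sketch a small data source simply ends up with a ratio of 1.0, meaning every row of the sampled fragments is kept.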
    On the PXF side, a function has been added to the Fragmenter API to allow gathering the
stats of the first stage. In addition, a mechanism to sample rows on the fly was added to
the Bridge.
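
To make the idea of sampling rows on the fly concrete, the following Java sketch wraps a row iterator and keeps only a given fraction of the rows as they stream through. This is an illustration only: `SamplingIterator` is a hypothetical name, and the deterministic "running quota" rule used here is an assumption chosen for simplicity, not necessarily the mechanism the Bridge uses.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: not the PXF Bridge implementation.
final class SamplingIterator<T> implements Iterator<T> {
    private final Iterator<T> source;
    private final double ratio;   // fraction of rows to keep, in [0, 1]
    private long seen = 0;        // rows consumed from the source so far
    private T pending;            // next kept row, if one is buffered
    private boolean hasPending = false;

    SamplingIterator(Iterator<T> source, double ratio) {
        if (ratio < 0.0 || ratio > 1.0) {
            throw new IllegalArgumentException("ratio must be between 0 and 1");
        }
        this.source = source;
        this.ratio = ratio;
    }

    @Override
    public boolean hasNext() {
        // Skip rows until one is selected or the source is exhausted.
        while (!hasPending && source.hasNext()) {
            T candidate = source.next();
            // Keep a row whenever the running quota floor(n * ratio) increases.
            boolean keep = Math.floor((seen + 1) * ratio) > Math.floor(seen * ratio);
            seen++;
            if (keep) {
                pending = candidate;
                hasPending = true;
            }
        }
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        hasPending = false;
        T result = pending;
        pending = null;
        return result;
    }

    public static void main(String[] args) {
        Iterator<String> rows = Arrays.asList("a", "b", "c", "d").iterator();
        SamplingIterator<String> sampled = new SamplingIterator<>(rows, 0.5);
        while (sampled.hasNext()) {
            System.out.println(sampled.next());
        }
    }
}
```

Because rows are filtered while they are read, the full data set never has to be materialized before sampling; a randomized rule could be substituted for the quota rule if unbiased selection matters more than reproducibility.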

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hornn/incubator-hawq analyze_HAWQ-44

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #92

commit 4237b106c6b3787b7770162fec9e23cd53e20d5e
Author: Noa Horn <nhorn@pivotal.io>
Date:   2015-10-12T19:20:01Z

    HAWQ-44. PXF Advanced Statistics: hawq side changes

commit 4eeb40f2a5aaded9e241b19580b7f42875853f4b
Author: Noa Horn <nhorn@pivotal.io>
Date:   2015-10-18T10:34:13Z

    HAWQ-44. PXF Advanced Statistics: java side

commit 7c9c64584c7dd4d8b9e4525ef1fa347805b94699
Author: Noa Horn <nhorn@pivotal.io>
Date:   2015-10-25T09:34:19Z

    HAWQ-44. documentation

commit ca7ebb118047147c95d4c998eb7650a65bc73045
Author: Noa Horn <nhorn@pivotal.io>
Date:   2015-11-02T15:14:19Z

    HAWQ-44. Add function to Fragmenter API to retrieve fragments stats, with default implementation.
    Add specific implementation to HdfsDataFragmenter.
    Add code in HAWQ to call new API, and clean up code that called analyzer.

commit 38ab2e601908f755859fb65329aaf4e3be26ca8c
Author: Noa Horn <nhorn@pivotal.io>
Date:   2015-11-04T21:02:21Z

    HAWQ-44. Update package name in new files

commit 8c7955c82a9d5c2e4a2db9f240d271c46ecb9bf9
Author: Noa Horn <nhorn@pivotal.io>
Date:   2015-11-06T19:03:09Z

    HAWQ-44. Disable getFragmentsStats for HBase and Hive fragmenters

commit 4a8183a707c21cf242d454a280570c45bcd2d880
Author: Noa Horn <nhorn@pivotal.io>
Date:   2015-11-10T02:29:07Z

    HAWQ-44. Change stats to include unit together with size of resource to avoid overflow.

commit 49b3e448436c32561af2fc16749ec484216770d6
Author: Noa Horn <nhorn@pivotal.io>
Date:   2015-11-10T18:48:42Z

    HAWQ-44. Remove extra lines


If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
