hawq-dev mailing list archives

From dyozie <...@git.apache.org>
Subject [GitHub] incubator-hawq-docs pull request #45: Revise section on work_mem
Date Mon, 31 Oct 2016 16:38:59 GMT
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/45#discussion_r85779030
  
    --- Diff: bestpractices/querying_data_bestpractices.html.md.erb ---
    @@ -16,14 +16,14 @@ If a query performs poorly, examine its query plan and ask the following questions:
         If the plan is not choosing the optimal join order, set `join_collapse_limit=1` and use explicit `JOIN` syntax in your SQL statement to force the legacy query optimizer (planner) to the specified join order. You can also collect more statistics on the relevant join columns.
     
     -   **Does the optimizer selectively scan partitioned tables?** If you use table partitioning, is the optimizer selectively scanning only the child tables required to satisfy the query predicates? Scans of the parent tables should return 0 rows since the parent tables do not contain any data. See [Verifying Your Partition Strategy](../ddl/ddl-partition.html#topic74) for an example of a query plan that shows a selective partition scan.
    --   **Does the optimizer choose hash aggregate and hash join operations where applicable?** Hash operations are typically much faster than other types of joins or aggregations. Row comparison and sorting is done in memory rather than reading/writing from disk. To enable the query optimizer to choose hash operations, there must be sufficient memory available to hold the estimated number of rows. Try increasing work memory to improve performance for a query. If possible, run an `EXPLAIN  ANALYZE` for the query to show which plan operations spilled to disk, how much work memory they used, and how much memory was required to avoid spilling to disk. For example:
    +-   **Does the optimizer choose hash aggregate and hash join operations where applicable?** Hash operations are typically much faster than other types of joins or aggregations. Row comparison and sorting is done in memory rather than reading/writing from disk. To enable the query optimizer to choose hash operations, there must be sufficient memory available to hold the estimated number of rows. You may wish to  run an `EXPLAIN  ANALYZE` for the query to show which plan operations spilled to disk, how much work memory they used, and how much memory was required to avoid spilling to disk. For example:
    --- End diff --
    
    Let's change "You may wish to  run" to just "Run" here.
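    For context, the guidance under review could be illustrated with a sketch like the following. The `sales` table, column names, and memory figures are hypothetical; the `Work_mem used`/`Work_mem wanted` lines are the style of per-operator output that Greenplum-derived planners such as HAWQ's print for operators that spill to disk.

    ```sql
    -- Hypothetical query: run EXPLAIN ANALYZE to see which plan operators
    -- spilled to disk, how much work memory they used, and how much they
    -- would have needed to avoid spilling.
    EXPLAIN ANALYZE
    SELECT customer_id, sum(amount)
    FROM sales
    GROUP BY customer_id;

    -- In the plan output, look for lines resembling (figures illustrative):
    --   Work_mem used:  2430K bytes avg, 2430K bytes max (seg0).
    --   Work_mem wanted: 33649K bytes avg, 33649K bytes max (seg0)
    --     to lessen workfile I/O.
    -- A "wanted" value larger than "used" indicates the operator spilled
    -- and would benefit from more work memory.
    ```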


