Mailing-List: contact commits-help@phoenix.apache.org; run by ezmlm
Reply-To: dev@phoenix.apache.org
Subject: svn commit: r1734488 [5/6] - in /phoenix/site: publish/ publish/language/ source/src/site/ source/src/site/markdown/
Date: Fri, 11 Mar 2016 01:34:30 -0000
To: commits@phoenix.apache.org
From: jamestaylor@apache.org
Message-Id: <20160311013431.8F2C63A0EE4@svn01-us-west.apache.org>

Modified: phoenix/site/publish/source.html
URL: http://svn.apache.org/viewvc/phoenix/site/publish/source.html?rev=1734488&r1=1734487&r2=1734488&view=diff
==============================================================================
--- phoenix/site/publish/source.html (original)
+++ phoenix/site/publish/source.html Fri Mar 11 01:34:29 2016
@@ -69,9 +69,6 @@
  • How to Release
  • License
  • -
  • Sponsorship
  • -
  • Thanks
  • -
  • Security
  • @@ -121,6 +116,12 @@
  • Grammar
  • Functions
  • Datatypes
  • +
  • ARRAY type
  • +
  • +
  • Row Value Constructors
  • +
  • Sequences
  • +
  • Joins
  • +
  • Subqueries
  • @@ -235,15 +236,6 @@
  • License
  • -
  • - Sponsorship -
  • -
  • - Thanks -
  • -
  • - Security -
  • Modified: phoenix/site/publish/subqueries.html URL: http://svn.apache.org/viewvc/phoenix/site/publish/subqueries.html?rev=1734488&r1=1734487&r2=1734488&view=diff ============================================================================== --- phoenix/site/publish/subqueries.html (original) +++ phoenix/site/publish/subqueries.html Fri Mar 11 01:34:29 2016 @@ -1,7 +1,7 @@ @@ -69,9 +69,6 @@
  • How to Release
  • License
  • -
  • Sponsorship
  • -
  • Thanks
  • -
  • Security
  • @@ -121,6 +116,12 @@
  • Grammar
  • Functions
  • Datatypes
  • +
  • ARRAY type
  • +
  • +
  • Row Value Constructors
  • +
  • Sequences
  • +
  • Joins
  • +
  • Subqueries
  • @@ -373,15 +374,6 @@ ORDER BY count(*) DESC;
  • License
  • -
  • - Sponsorship -
  • -
  • - Thanks -
  • -
  • - Security -
  • Modified: phoenix/site/publish/team.html URL: http://svn.apache.org/viewvc/phoenix/site/publish/team.html?rev=1734488&r1=1734487&r2=1734488&view=diff ============================================================================== --- phoenix/site/publish/team.html (original) +++ phoenix/site/publish/team.html Fri Mar 11 01:34:29 2016 @@ -1,7 +1,7 @@ @@ -69,9 +69,6 @@
  • How to Release
  • License
  • -
  • Sponsorship
  • -
  • Thanks
  • -
  • Security
  • @@ -121,6 +116,12 @@
  • Grammar
  • Functions
  • Datatypes
  • +
  • ARRAY type
  • +
  • +
  • Row Value Constructors
  • +
  • Sequences
  • +
  • Joins
  • +
  • Subqueries
  • @@ -386,15 +387,6 @@
  • License
  • -
  • - Sponsorship -
  • -
  • - Thanks -
  • -
  • - Security -
  • Modified: phoenix/site/publish/tracing.html URL: http://svn.apache.org/viewvc/phoenix/site/publish/tracing.html?rev=1734488&r1=1734487&r2=1734488&view=diff ============================================================================== --- phoenix/site/publish/tracing.html (original) +++ phoenix/site/publish/tracing.html Fri Mar 11 01:34:29 2016 @@ -1,7 +1,7 @@ @@ -69,9 +69,6 @@
  • How to Release
  • License
  • -
  • Sponsorship
  • -
  • Thanks
  • -
  • Security
  • @@ -121,6 +116,12 @@
  • Grammar
  • Functions
  • Datatypes
  • +
  • ARRAY type
  • +
  • +
  • Row Value Constructors
  • +
  • Sequences
  • +
  • Joins
  • +
  • Subqueries
  • @@ -411,15 +412,6 @@ Connection conn = DriverManager.getConne
  • License
  • -
  • - Sponsorship -
  • -
  • - Thanks -
  • -
  • - Security -
  • @@ -470,34 +462,28 @@ Connection conn = DriverManager.getConne Transactions
  • - Secondary Indexes -
  • -
  • User-defined Functions
  • - Bulk Loading +
  • - Query Server -
  • -
  • - Tracing + Secondary Indexes
  • - ARRAY type + Statistics Collection
  • - Sequences + Row Timestamp Column
  • - Statistics Collection + Salted Tables
  • - Joins + Skip Scan
  • - Subqueries +
  • Views @@ -506,19 +492,19 @@ Connection conn = DriverManager.getConne Multi tenancy
  • - Paged Queries -
  • -
  • Dynamic Columns
  • - Skip Scan +
  • - Salted Tables + Bulk Loading
  • - Row Timestamp Column + Query Server +
  • +
  • + Tracing
  • Metrics @@ -537,6 +523,24 @@ Connection conn = DriverManager.getConne
  • Datatypes
  • +
  • + ARRAY type +
  • +
  • + +
  • +
  • + Row Value Constructors +
  • +
  • + Sequences +
  • +
  • + Joins +
  • +
  • + Subqueries +
  • Modified: phoenix/site/publish/transactions.html URL: http://svn.apache.org/viewvc/phoenix/site/publish/transactions.html?rev=1734488&r1=1734487&r2=1734488&view=diff ============================================================================== --- phoenix/site/publish/transactions.html (original) +++ phoenix/site/publish/transactions.html Fri Mar 11 01:34:29 2016 @@ -1,7 +1,7 @@ @@ -69,9 +69,6 @@
  • How to Release
  • License
  • -
  • Sponsorship
  • -
  • Thanks
  • -
  • Security
  • @@ -121,6 +116,12 @@
  • Grammar
  • Functions
  • Datatypes
  • +
  • ARRAY type
  • +
  • +
  • Row Value Constructors
  • +
  • Sequences
  • +
  • Joins
  • +
  • Subqueries
  • @@ -308,15 +309,6 @@ DELETE FROM my_other_table WHERE k=2;
  • License
  • -
  • - Sponsorship -
  • -
  • - Thanks -
  • -
  • - Security -
  • @@ -367,34 +359,28 @@ DELETE FROM my_other_table WHERE k=2; Transactions
  • - Secondary Indexes -
  • -
  • User-defined Functions
  • - Bulk Loading -
  • -
  • - Query Server +
  • - Tracing + Secondary Indexes
  • - ARRAY type + Statistics Collection
  • - Sequences + Row Timestamp Column
  • - Statistics Collection + Salted Tables
  • - Joins + Skip Scan
  • - Subqueries +
  • Views @@ -403,19 +389,19 @@ DELETE FROM my_other_table WHERE k=2; Multi tenancy
  • - Paged Queries + Dynamic Columns
  • - Dynamic Columns +
  • - Skip Scan + Bulk Loading
  • - Salted Tables + Query Server
  • - Row Timestamp Column + Tracing
  • Metrics @@ -434,6 +420,24 @@ DELETE FROM my_other_table WHERE k=2;
  • Datatypes
  • +
  • + ARRAY type +
  • +
  • + +
  • +
  • + Row Value Constructors +
  • +
  • + Sequences +
  • +
  • + Joins +
  • +
  • + Subqueries +
  • Modified: phoenix/site/publish/tuning.html URL: http://svn.apache.org/viewvc/phoenix/site/publish/tuning.html?rev=1734488&r1=1734487&r2=1734488&view=diff ============================================================================== --- phoenix/site/publish/tuning.html (original) +++ phoenix/site/publish/tuning.html Fri Mar 11 01:34:29 2016 @@ -1,7 +1,7 @@ @@ -69,9 +69,6 @@
  • How to Release
  • License
  • -
  • Sponsorship
  • -
  • Thanks
  • -
  • Security
  • @@ -121,6 +116,12 @@
  • Grammar
  • Functions
  • Datatypes
  • +
  • ARRAY type
  • +
  • +
  • Row Value Constructors
  • +
  • Sequences
  • +
  • Joins
  • +
  • Subqueries
  • @@ -144,7 +145,7 @@ -

    Phoenix provides many different knobs and dials to configure and tune the system to run more optimally on your cluster. The configuration is done through a series of Phoenix-specific properties specified both on client and server-side hbase-site.xml files. In addition to these properties, there are of course all the HBase configuration properties with the most important ones documented here.

    The table below outlines the full set of Phoenix-specific configuration properties and their defaults. Of these, we’ll talk in depth about some of the most important ones below.

    +

    Phoenix provides many different knobs and dials to configure and tune the system to run more optimally on your cluster. The configuration is done through a series of Phoenix-specific properties specified both on client and server-side hbase-site.xml files. In addition to these properties, there are of course all the HBase configuration properties with the most important ones documented here.

    The table below outlines the full set of Phoenix-specific configuration properties and their defaults.
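
    As a rough illustration of the client-side properties in the table, the sketch below collects a few of them, with their listed defaults, in a java.util.Properties object. The class and method names are hypothetical. Passing such an object to DriverManager.getConnection(url, props) instead of editing the client-side hbase-site.xml is an assumption about the Phoenix JDBC driver and is not stated in this page, so the sketch only builds and prints the properties.

```java
import java.util.Properties;

// Hypothetical helper class: not part of Phoenix.
public class PhoenixClientProps {

    // Collects client-side properties using the names and defaults from the table.
    static Properties clientOverrides() {
        Properties props = new Properties();
        props.setProperty("phoenix.query.timeoutMs", "600000");  // 10 min
        props.setProperty("phoenix.query.keepAliveMs", "60000"); // 60 sec
        props.setProperty("phoenix.query.useIndexes", "true");
        return props;
    }

    public static void main(String[] args) {
        Properties props = clientOverrides();
        // Assumption: with the Phoenix driver on the classpath, props could be
        // handed to DriverManager.getConnection(url, props). Not done here so
        // that the sketch runs without a cluster.
        for (String name : props.stringPropertyNames()) {
            System.out.println(name + " = " + props.getProperty(name));
        }
    }
}
```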

    @@ -153,13 +154,23 @@ + + + + + + + + + + - + - + @@ -174,27 +185,27 @@ - + - + - + - + - + @@ -264,9 +275,29 @@ - + + + + + + + + + + + + + + + + + + + + + @@ -372,53 +403,9 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
    Default
    data.tx.snapshot.dir Server-side property specifying the HDFS directory used to store snapshots of the transaction state. No default value. None
    data.tx.timeout Server-side property specifying the timeout in seconds for a transaction to complete. Default is 30 seconds. 30
    phoenix.query.timeoutMs Number of milliseconds after which a query will time out on the client. Default is 10 min. Client-side property specifying the number of milliseconds after which a query will time out on the client. Default is 10 min. 600000
    phoenix.query.keepAliveMs When the number of threads is greater than the core in the client side thread pool executor, this is the maximum time in milliseconds that excess idle threads will wait for new tasks before terminating. Default is 60 sec. Maximum time in milliseconds that excess idle threads will wait for new tasks before terminating when the number of threads is greater than the cores in the client side thread pool executor. Default is 60 sec. 60000
    phoenix.stats.guidepost.width A server-side parameter that specifies the number of bytes between guideposts. A smaller amount increases parallelization, but also increases the number of chunks which must be merged on the client side. The default value is 100 MB. Server-side parameter that specifies the number of bytes between guideposts. A smaller amount increases parallelization, but also increases the number of chunks which must be merged on the client side. The default value is 100 MB. 104857600
    phoenix.stats.guidepost.per.region A server-side parameter that specifies the number of guideposts per region. If set to a value greater than zero, then the guidepost width is determined by MAX_FILE_SIZE of table / phoenix.stats.guidepost.per.region. Otherwise, if not set, then the phoenix.stats.guidepost.width parameter is used. No default value. Server-side parameter that specifies the number of guideposts per region. If set to a value greater than zero, then the guidepost width is determined by MAX_FILE_SIZE of table / phoenix.stats.guidepost.per.region. Otherwise, if not set, then the phoenix.stats.guidepost.width parameter is used. No default value. None
    phoenix.stats.updateFrequency A server-side parameter that determines the frequency in milliseconds at which statistics will be refreshed from the statistics table and subsequently used by the client. The default value is 15 min. Server-side parameter that determines the frequency in milliseconds at which statistics will be refreshed from the statistics table and subsequently used by the client. The default value is 15 min. 900000
    phoenix.stats.minUpdateFrequency A client-side parameter that determines the minimum amount of time in milliseconds that must pass before statistics may again be manually collected through another UPDATE STATISTICS call. The default value is phoenix.stats.updateFrequency / 2. Client-side parameter that determines the minimum amount of time in milliseconds that must pass before statistics may again be manually collected through another UPDATE STATISTICS call. The default value is phoenix.stats.updateFrequency / 2. 450000
    phoenix.stats.useCurrentTime An advanced server-side parameter that if true causes the current time on the server-side to be used as the timestamp of rows in the statistics table when background tasks such as compactions or splits occur. If false, then the max timestamp found while traversing the table over which statistics are being collected is used as the timestamp. Unless your client is controlling the timestamps while reading and writing data, this parameter should be left alone. The default value is true. Server-side parameter that if true causes the current time on the server-side to be used as the timestamp of rows in the statistics table when background tasks such as compactions or splits occur. If false, then the max timestamp found while traversing the table over which statistics are being collected is used as the timestamp. Unless your client is controlling the timestamps while reading and writing data, this parameter should be left alone. The default value is true. true
    phoenix.query.useIndexes Determines whether or not indexes are considered by the optimizer to satisfy a query. Default is true. Client-side property determining whether or not indexes are considered by the optimizer to satisfy a query. Default is true. true
    phoenix.index.failure.handling.rebuild Server-side property determining whether or not a mutable index is rebuilt in the background in the event of a commit failure. Only applicable for indexes on mutable, non-transactional tables. Default is true. true
    phoenix.index.failure.block.write Server-side property determining whether or not writes to the data table are disallowed in the event of a commit failure until the index can be caught up with the data table. Requires that phoenix.index.failure.handling.rebuild is true as well. Only applicable for indexes on mutable, non-transactional tables. Default is false. false
    phoenix.index.failure.handling.rebuild.interval Server-side property controlling the millisecond frequency at which the server checks whether or not a mutable index needs to be partially rebuilt to catch up with updates to the data table. Only applicable for indexes on mutable, non-transactional tables. Default is 10 seconds. 10000
    phoenix.index.failure.handling.rebuild.overlap.time Server-side property controlling how many milliseconds to go back from the timestamp at which the failure occurred when a partial rebuild is performed. Only applicable for indexes on mutable, non-transactional tables. Default is 1 millisecond. 1
    phoenix.index.mutableBatchSizeThreshold Number of mutations in a batch beyond which index metadata will be sent as a separate RPC to each region server as opposed to included inline with each mutation. Defaults to 5. Determines whether or not transactions are enabled in Phoenix. A table may not be declared as transactional if transactions are disabled. Default is false. This is a client side parameter. Available starting from Phoenix 4.7. false
    data.tx.snapshot.dir The HDFS directory used to store snapshots of the transaction state. No default value. This is a server side parameter. Available starting from Phoenix 4.7. None
    data.tx.timeout The timeout in seconds for a transaction to complete. Default is 30 seconds. This is a server side parameter. Available starting from Phoenix 4.7. 30
    phoenix.query.targetConcurrency
    Obsolete as of 3.2/4.2
    Target concurrent threads to use for a query. It serves as a soft limit on the number of scans into which a query may be split. The value should not exceed the hard limit imposed by phoenix.query.maxConcurrency. 32
    phoenix.query.maxConcurrency
    Obsolete as of 3.2/4.2
    Maximum concurrent threads to use for a query. It serves as a hard limit on the number of scans into which a query may be split. A soft limit is imposed by phoenix.query.targetConcurrency. 64
    phoenix.query.maxStatsAge
    Obsolete as of 3.2/4.2
    The maximum age of stats in milliseconds after which they will no longer be used (i.e. the stats were not able to be updated in this amount of time and thus are considered too old). Default is 1 day. 1
    phoenix.query.statsUpdateFrequency
    Obsolete as of 3.2/4.2
    The frequency in milliseconds at which the stats for each table will be updated. Default is 15 min. 900000
    phoenix.query.maxIntraRegionParallelization
    Obsolete as of 3.2/4.2
    The maximum number of threads that will be spawned to process data within a single region during query execution. 64
    -
    -
    -
    -
    -

    Parallelization

    -

    Phoenix breaks up queries into multiple scans and runs them in parallel through coprocessors to improve performance. Hari Kumar, from Ericsson Labs, did a good job of explaining the performance benefits of parallelization and coprocessors here.

    -

    As of 3.2/4.2, parallelization in Phoenix is driven by the guideposts as determined by the configuration parameters for statistics collection. Each chunk of data between guideposts will be run in parallel in a separate scan to improve query performance. The chunk size is determined by the server-side phoenix.stats.guidepost.width or phoenix.stats.guidepost.per.region configuration parameters. As the size of the chunks decreases, you'll want to increase phoenix.query.queueSize as more work will be queued in that case. Note that at a minimum, separate scans will be run for each table region. Beyond the statistics collection configuration parameters, the client-side phoenix.query.threadPoolSize and phoenix.query.queueSize parameters and the server-side hbase.regionserver.handler.count parameter have an impact on performance.

    -
    -
    -
    +
    @@ -491,15 +478,6 @@
  • License
  • -
  • - Sponsorship -
  • -
  • - Thanks -
  • -
  • - Security -
  • Modified: phoenix/site/publish/udf.html URL: http://svn.apache.org/viewvc/phoenix/site/publish/udf.html?rev=1734488&r1=1734487&r2=1734488&view=diff ============================================================================== --- phoenix/site/publish/udf.html (original) +++ phoenix/site/publish/udf.html Fri Mar 11 01:34:29 2016 @@ -69,9 +69,6 @@
  • How to Release
  • License
  • -
  • Sponsorship
  • -
  • Thanks
  • -
  • Security
  • @@ -121,6 +116,12 @@
  • Grammar
  • Functions
  • Datatypes
  • +
  • ARRAY type
  • +
  • +
  • Row Value Constructors
  • +
  • Sequences
  • +
  • Joins
  • +
  • Subqueries
  • @@ -361,15 +362,6 @@ Connection conn = DriverManager.getConne
  • License
  • -
  • - Sponsorship -
  • -
  • - Thanks -
  • -
  • - Security -
  • @@ -419,35 +411,29 @@ Connection conn = DriverManager.getConne
  • Transactions
  • -
  • - Secondary Indexes -
  • User-defined Functions
  • - Bulk Loading -
  • -
  • - Query Server +
  • - Tracing + Secondary Indexes
  • - ARRAY type + Statistics Collection
  • - Sequences + Row Timestamp Column
  • - Statistics Collection + Salted Tables
  • - Joins + Skip Scan
  • - Subqueries +
  • Views @@ -456,19 +442,19 @@ Connection conn = DriverManager.getConne Multi tenancy
  • - Paged Queries + Dynamic Columns
  • - Dynamic Columns +
  • - Skip Scan + Bulk Loading
  • - Salted Tables + Query Server
  • - Row Timestamp Column + Tracing
  • Metrics @@ -487,6 +473,24 @@ Connection conn = DriverManager.getConne
  • Datatypes
  • +
  • + ARRAY type +
  • +
  • + +
  • +
  • + Row Value Constructors +
  • +
  • + Sequences +
  • +
  • + Joins +
  • +
  • + Subqueries +
  • Modified: phoenix/site/publish/update_statistics.html URL: http://svn.apache.org/viewvc/phoenix/site/publish/update_statistics.html?rev=1734488&r1=1734487&r2=1734488&view=diff ============================================================================== --- phoenix/site/publish/update_statistics.html (original) +++ phoenix/site/publish/update_statistics.html Fri Mar 11 01:34:29 2016 @@ -69,9 +69,6 @@
  • How to Release
  • License
  • -
  • Sponsorship
  • -
  • Thanks
  • -
  • Security
  • @@ -121,6 +116,12 @@
  • Grammar
  • Functions
  • Datatypes
  • +
  • ARRAY type
  • +
  • +
  • Row Value Constructors
  • +
  • Sequences
  • +
  • Joins
  • +
  • Subqueries
  • @@ -147,6 +148,10 @@

    The UPDATE STATISTICS command updates the statistics collected on a table, to improve query performance. This command collects a set of keys per region per column family that are an equal byte distance from each other. These collected keys are called guideposts and they act as hints/guides to improve the parallelization of queries on a given target region.

    Statistics are also automatically collected during major compactions and region splits, so manually running this command may not be necessary.

    +

    Parallelization

    +

    Phoenix breaks up queries into multiple scans and runs them in parallel to improve performance. Parallelization in Phoenix is driven by the statistics-related configuration parameters. Each chunk of data between guideposts will be run in parallel in a separate scan to improve query performance. The chunk size is determined by the server-side phoenix.stats.guidepost.width or phoenix.stats.guidepost.per.region parameters. As the size of the chunks decreases, you’ll want to increase phoenix.query.queueSize as more work will be queued in that case. Note that at a minimum, separate scans will be run for each table region. Beyond the statistics collection configuration parameters, the client-side phoenix.query.threadPoolSize and phoenix.query.queueSize parameters and the server-side hbase.regionserver.handler.count parameter have an impact on performance.
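
    To make the relationship between guidepost width and the number of parallel scans concrete, here is a minimal back-of-the-envelope sketch. It is not how Phoenix computes its scans internally. The class and method names are hypothetical, and the estimate assumes data is spread evenly, with one scan per guidepost-sized chunk and never fewer scans than regions. The 100 MB default width and the MAX_FILE_SIZE / phoenix.stats.guidepost.per.region rule are taken from the property descriptions on the tuning page.

```java
// Hypothetical estimator: illustrative arithmetic only, not Phoenix code.
public class GuidepostEstimate {

    // Default of phoenix.stats.guidepost.width: 100 MB.
    static final long DEFAULT_GUIDEPOST_WIDTH = 104_857_600L;

    // If guideposts-per-region is set (> 0), the width is MAX_FILE_SIZE divided
    // by that count; otherwise the configured width is used.
    static long guidepostWidth(long maxFileSizeBytes, int guidepostsPerRegion, long configuredWidth) {
        return guidepostsPerRegion > 0 ? maxFileSizeBytes / guidepostsPerRegion : configuredWidth;
    }

    // Rough scan count: ceil(tableBytes / width), but at least one per region.
    static long estimateScans(long tableBytes, int regionCount, long width) {
        long chunks = (tableBytes + width - 1) / width; // integer ceiling
        return Math.max(chunks, regionCount);
    }

    public static void main(String[] args) {
        long oneGiB = 1L << 30;
        long width = guidepostWidth(10 * oneGiB, 0, DEFAULT_GUIDEPOST_WIDTH);
        System.out.println("width = " + width);
        System.out.println("estimated scans = " + estimateScans(oneGiB, 4, width));
    }
}
```

    Under these assumptions, halving the width roughly doubles the number of chunks, which is why a smaller width calls for a larger phoenix.query.queueSize.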

    +
    +

    Examples

    For a given table my_table:

    @@ -171,7 +176,7 @@
    -

    Configurations

    +

    Configuration

    The configuration parameters controlling statistics collection include:

    1. phoenix.stats.guidepost.width @@ -273,15 +278,6 @@
    2. License
    3. -
    4. - Sponsorship -
    5. -
    6. - Thanks -
    7. -
    8. - Security -