phoenix-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (PHOENIX-4009) Run UPDATE STATISTICS command by using MR integration on snapshots
Date Thu, 20 Dec 2018 07:51:00 GMT

    [ https://issues.apache.org/jira/browse/PHOENIX-4009?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16725643#comment-16725643 ]

ASF GitHub Bot commented on PHOENIX-4009:
-----------------------------------------

Github user karanmehta93 commented on the issue:

    https://github.com/apache/phoenix/pull/419
  
    _First of all, apologies for the loooong PR._ (Most of it is refactoring, but it's still hard to review.)
    
    **Here's the high level idea** 
    1. Seven classes inherited from `StatsCollectorIT`, each testing stats collection for a different type of table property, so there was a lot of redundancy in the test suite. Also, all the tests always ran with namespaces enabled, because that setting is applied once for the JVM and cannot be reverted without restarting the server. We were also setting the parameterized property on each new `PhoenixConnection`, which the documentation disallows.
    The code is now refactored into only 3 classes:
    `NamespaceMappedStatsCollectorIT` --> namespaces enabled; collects stats via snapshots as well as via the SQL statement
    `NonTxStatsCollectorIT` --> mutable/immutable tables, column encoded/non column encoded
    `TxStatsCollectorIT` --> mutable/immutable tables, column encoded/non column encoded, TEPHRA/OMID
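    A minimal sketch of the parameter matrix that the two non-namespace classes are described as covering, assuming a full cross product of the listed dimensions. `StatsItParams`, `nonTx()` and `tx()` are hypothetical names, not the classes in the PR; in a JUnit 4 `Parameterized` test, lists like these would typically be returned from the `@Parameters` method.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical parameter matrix for the refactored stats ITs (illustrative names only).
 * Assumes every combination of the dimensions listed in the PR description is tested.
 */
final class StatsItParams {
    final boolean mutable;
    final boolean columnEncoded;
    final String txProvider; // null means a non-transactional table

    StatsItParams(boolean mutable, boolean columnEncoded, String txProvider) {
        this.mutable = mutable;
        this.columnEncoded = columnEncoded;
        this.txProvider = txProvider;
    }

    /** NonTxStatsCollectorIT: mutable/immutable x column encoded/non column encoded. */
    static List<StatsItParams> nonTx() {
        List<StatsItParams> params = new ArrayList<>();
        for (boolean mutable : new boolean[] {true, false}) {
            for (boolean encoded : new boolean[] {true, false}) {
                params.add(new StatsItParams(mutable, encoded, null));
            }
        }
        return params;
    }

    /** TxStatsCollectorIT: the same matrix, repeated once per transaction provider. */
    static List<StatsItParams> tx() {
        List<StatsItParams> params = new ArrayList<>();
        for (String provider : new String[] {"TEPHRA", "OMID"}) {
            for (StatsItParams p : nonTx()) {
                params.add(new StatsItParams(p.mutable, p.columnEncoded, provider));
            }
        }
        return params;
    }

    @Override
    public String toString() {
        return (mutable ? "mutable" : "immutable")
            + (columnEncoded ? ", column encoded" : ", non column encoded")
            + (txProvider == null ? "" : ", " + txProvider);
    }
}
```

    Under that assumption, `NonTxStatsCollectorIT` would run 4 combinations and `TxStatsCollectorIT` 8, while namespace mapping stays isolated in its own class because the setting is JVM-wide.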
    
    2. `StatsCollectorIT` is renamed to `BaseStatsCollectorIT`, and its tests have been improved to cover additional scenarios. More tests are coming.
    
    3. Server-side changes:
    `DefaultStatisticsCollector` is now an abstract class with two subclasses, `RegionServerStatisticsCollector` and `MapperStatisticsCollector`. The former is used when stats are collected via the SQL statement, and the latter is used by the MapReduce job added in this JIRA. Most of the common code has moved to the base class.
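    The refactoring in item 3 follows the template-method pattern. A minimal sketch, assuming the subclasses differ only in how collected stats are persisted; `StatsCollectorBase`, `onRow`, `finish` and `write` are hypothetical names, and the real collectors gather richer statistics than the simple row and byte counters used here.

```java
/** Hypothetical base class: shared accumulation logic lives here (template method). */
abstract class StatsCollectorBase {
    private long rows;
    private long bytes;

    /** Common code: every subclass accumulates per-row stats the same way. */
    final void onRow(long rowBytes) {
        rows++;
        bytes += rowBytes;
    }

    /** Common driver: the base class decides when to write, subclasses decide how. */
    final String finish() {
        return write(rows, bytes);
    }

    /** Context-specific hook (assumed to be where the two subclasses differ). */
    protected abstract String write(long rows, long bytes);
}

/** Stand-in for RegionServerStatisticsCollector: the SQL UPDATE STATISTICS path. */
final class RegionServerCollectorSketch extends StatsCollectorBase {
    @Override
    protected String write(long rows, long bytes) {
        return "region-server: rows=" + rows + ", bytes=" + bytes;
    }
}

/** Stand-in for MapperStatisticsCollector: runs inside the MR job over a snapshot. */
final class MapperCollectorSketch extends StatsCollectorBase {
    @Override
    protected String write(long rows, long bytes) {
        return "mapper: rows=" + rows + ", bytes=" + bytes;
    }
}
```

    Making `onRow` and `finish` final keeps the shared logic in one place, so a change to how stats are accumulated applies to both the SQL and the MapReduce paths.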
    
    4. The snapshot scanner has been improved to collect statistics when the scan is configured for it. If it's a regular Phoenix MR job on snapshots, a `NoOpStatisticsCollector` instance is used instead.
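    Item 4 is an instance of the null-object pattern: the scanner always talks to a collector, and a regular MR job simply gets one that does nothing. A minimal sketch, where the `collectStats` flag stands in for however the scan is actually configured and all type names are hypothetical:

```java
import java.util.List;

/** Hypothetical collector interface for this sketch (not Phoenix's actual API). */
interface ScanStatsCollector {
    void onRow(long rowBytes);
    long rowsSeen();
    long bytesSeen();
}

/** Null object: used for a regular MR job on snapshots, so no stats work is done. */
final class NoOpScanStatsCollector implements ScanStatsCollector {
    static final NoOpScanStatsCollector INSTANCE = new NoOpScanStatsCollector();
    private NoOpScanStatsCollector() {}
    @Override public void onRow(long rowBytes) { /* intentionally ignored */ }
    @Override public long rowsSeen() { return 0; }
    @Override public long bytesSeen() { return 0; }
}

/** Real collector: used when the scan was configured for statistics collection. */
final class CountingScanStatsCollector implements ScanStatsCollector {
    private long rows;
    private long bytes;
    @Override public void onRow(long rowBytes) { rows++; bytes += rowBytes; }
    @Override public long rowsSeen() { return rows; }
    @Override public long bytesSeen() { return bytes; }
}

final class SnapshotScanSketch {
    /** The boolean stands in for the (assumed) scan configuration check. */
    static ScanStatsCollector collectorFor(boolean collectStats) {
        return collectStats ? new CountingScanStatsCollector() : NoOpScanStatsCollector.INSTANCE;
    }

    /** Feeds every row to the collector without any stats-specific branching in the loop. */
    static ScanStatsCollector scan(List<Long> rowSizes, boolean collectStats) {
        ScanStatsCollector collector = collectorFor(collectStats);
        for (long size : rowSizes) {
            collector.onRow(size);
            // In a real job the row would also be handed to the mapper here.
        }
        return collector;
    }
}
```

    The design keeps the scan loop identical for both kinds of job; only the factory method decides which collector is used.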
    
    5. The configuration changes are in the `PhoenixConfigurationUtil` class.
    
    Finally, `UpdateStatisticsTool` is the tool that launches the MR job.
    
    This is a v1 for some initial feedback. Please comment wherever it's not clear.
    
    **Coming up:** 
    More tests covering other scenarios.
    Perf testing for sample tables and the results.
    Better/useful log lines
    General code cleanup for nits


> Run UPDATE STATISTICS command by using MR integration on snapshots
> ------------------------------------------------------------------
>
>                 Key: PHOENIX-4009
>                 URL: https://issues.apache.org/jira/browse/PHOENIX-4009
>             Project: Phoenix
>          Issue Type: Bug
>            Reporter: Samarth Jain
>            Priority: Major
>
> Now that we have the capability to run queries against table snapshots through our MapReduce
> integration, we can utilize this capability for stats collection too. This would make our
> stats collection more resilient, resource aware and less resource intensive. The bulk of the
> plumbing is already in place. We would need to make sure that the integration doesn't barf
> when the query is an UPDATE STATISTICS command.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
