hive-issues mailing list archives

From "Steve Yeom (JIRA)" <>
Subject [jira] [Commented] (HIVE-19416) Create single version transactional table metastore statistics for aggregation queries
Date Thu, 31 May 2018 16:17:00 GMT


Steve Yeom commented on HIVE-19416:

The current single-version stats design has:
1. Definitions and Categories
  - Valid transactional stats:
    I.e., a conjunction of the following three conditions:
    ~ a committed transaction created the stats
    ~ the COLUMN_STATS_ACCURATE (CSA) state is true
    ~ the stats are isolation-level (snapshot) compliant
  - Two kinds of stats: table and column 
  - COLUMN_STATS_ACCURATE (CSA) states for a table/partition: true or false,
     one for the table stats and one for each column
  - Categories of clients:
    ~ Stats readers:
      ^ StatsOptimizer, for aggregation queries: a transactional stats reader
      ^ All other readers, which use stats as inputs to cost computation: non-transactional stats readers
    ~ Stats updater: a transactional stats updater
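A minimal Python sketch of the validity rule above: transactional stats count as valid only when all three conditions hold at once. The class and function names are hypothetical, and the rule that non-transactional readers skip the check is an assumption drawn from the reader categories, not from Hive's code.

```python
from dataclasses import dataclass
from enum import Enum


class ReaderKind(Enum):
    # Reader categories from section 1.
    TRANSACTIONAL = "StatsOptimizer for aggregation queries"
    NON_TRANSACTIONAL = "stats used as cost-computation inputs"


@dataclass(frozen=True)
class StatsFacts:
    # Hypothetical inputs; these are not Hive metastore field names.
    created_by_committed_txn: bool  # a committed transaction wrote these stats
    csa_true: bool                  # COLUMN_STATS_ACCURATE is true for this table/column
    snapshot_compliant: bool        # the stats match the reader's snapshot


def is_valid_transactional_stats(facts: StatsFacts) -> bool:
    # Valid only when all three conditions hold (a conjunction).
    return facts.created_by_committed_txn and facts.csa_true and facts.snapshot_compliant


def may_use_stats(reader: ReaderKind, facts: StatsFacts) -> bool:
    # Assumption: non-transactional readers use stats without the validity check.
    if reader is ReaderKind.NON_TRANSACTIONAL:
        return True
    return is_valid_transactional_stats(facts)
```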

2. Transactional Stats Operations
  2.1 Stats Update
    Updates the single-version stats (both table and column) and saves a table snapshot to UPD_TXNS.
    - A client requests an update with the stats and a table snapshot [1].
    - Creates a TBLS/PARTITIONS row and adds a row with the table write snapshot to UPD_TXNS.
      ~ Updates "table stats" by updating TABLE_PARAMS/PARTITION_PARAMS.
    - Updates "column stats" by updating TAB_COL_STATS/PART_COL_STATS.
    - Commits or aborts.
      ~ abortTxn() deletes the UPD_TXNS row for the transaction.
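The update steps can be modeled with a toy in-memory metastore. This is a sketch under assumptions: plain dicts and sets stand in for TABLE_PARAMS, TAB_COL_STATS, UPD_TXNS, and the open rows of TXNS; partitions are omitted; and every name is hypothetical rather than part of the Hive metastore API.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class ToyMetastore:
    # Hypothetical in-memory stand-ins for metastore tables (partitions omitted).
    table_params: Dict[str, Dict[str, str]] = field(default_factory=dict)     # TABLE_PARAMS
    col_stats: Dict[Tuple[str, str], dict] = field(default_factory=dict)      # TAB_COL_STATS
    upd_txns: Dict[int, Tuple[str, frozenset]] = field(default_factory=dict)  # UPD_TXNS: txn -> (table, snapshot)
    stats_txn: Dict[str, int] = field(default_factory=dict)                   # TXN_ID recorded on the TBLS row
    open_txns: Set[int] = field(default_factory=set)                          # TXNS rows in the open state

    def begin(self, txn_id: int) -> None:
        self.open_txns.add(txn_id)

    def update_stats(self, txn_id: int, table: str, snapshot: frozenset,
                     table_stats: Dict[str, str], column_stats: Dict[str, dict]) -> None:
        # Record the writer's table snapshot in UPD_TXNS and tag the table row with the writer.
        self.upd_txns[txn_id] = (table, snapshot)
        self.stats_txn[table] = txn_id
        # "Table stats" are stored as table parameters.
        self.table_params.setdefault(table, {}).update(table_stats)
        # "Column stats" are stored one row per column.
        for column, stats in column_stats.items():
            self.col_stats[(table, column)] = stats

    def commit(self, txn_id: int) -> None:
        # The UPD_TXNS row survives a commit.
        self.open_txns.discard(txn_id)

    def abort(self, txn_id: int) -> None:
        # Abort deletes the transaction's UPD_TXNS row; the written stats stay behind.
        self.open_txns.discard(txn_id)
        self.upd_txns.pop(txn_id, None)
```

In this model an abort leaves the written stats in place; only the UPD_TXNS row disappears, and that missing row is what lets a reader treat those stats as invalid.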

    Note: the stats reader now determines the state of the stats updater's transaction
      by checking TXNS for the open state, and by checking for a row in UPD_TXNS to tell committed from aborted.
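The rule in the note reduces to a three-way classification. In this hypothetical sketch, the two sets stand in for the open TXNS rows and the UPD_TXNS rows; the TXNS check comes first because an in-flight updater already has its UPD_TXNS row (it is written during the update step).

```python
from enum import Enum
from typing import AbstractSet


class TxnState(Enum):
    OPEN = "open"
    COMMITTED = "committed"
    ABORTED = "aborted"


def updater_txn_state(txn_id: int,
                      open_txns: AbstractSet[int],
                      upd_txns: AbstractSet[int]) -> TxnState:
    # Still running: the transaction is in the open state in TXNS.
    if txn_id in open_txns:
        return TxnState.OPEN
    # Abort deletes the UPD_TXNS row, so a surviving row means the transaction committed.
    if txn_id in upd_txns:
        return TxnState.COMMITTED
    return TxnState.ABORTED
```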

  2.2 Stats Read
    StatsOptimizer determines whether the metastore's transactional stats are valid
    before using them to answer an aggregation query.
    2.2.1 Table stats
      The reader gets a TBLS/PARTITIONS row that includes the table stats,
      then checks the validity of the table stats:
      - A client comes in with a request that includes the client's table snapshot.
      - Read the row from TBLS/PARTITIONS.
      - Check whether the CSA for the table stats is true. If not, return after setting the CSA.
      - Check whether the stats' update transaction is committed: check whether a row exists in UPD_TXNS
        for the TXN_ID from TBLS/PARTITIONS. If not, the stats are invalid.
      - Compare the current stats' table snapshot with the client's table snapshot.
      - If the two table snapshots are equal in terms of committed writes,
        the table stats are valid.
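The table-stats checks can be sketched as a single predicate. Assumptions: each snapshot has already been reduced to the set of committed write IDs it sees (one reading of "equal in commits"), the open-transaction check from the note is folded into the committed check, and the "setting CSA" side effect is not modeled. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import AbstractSet, FrozenSet


@dataclass(frozen=True)
class TableStatsRow:
    # Hypothetical view of a TBLS/PARTITIONS row and its stats metadata.
    csa: bool                            # COLUMN_STATS_ACCURATE for the table stats
    txn_id: int                          # TXN_ID of the transaction that wrote the stats
    committed_write_ids: FrozenSet[int]  # the writer's snapshot, reduced to committed writes


def table_stats_valid(row: TableStatsRow,
                      open_txns: AbstractSet[int],
                      upd_txns: AbstractSet[int],
                      client_committed_write_ids: FrozenSet[int]) -> bool:
    # Check 1: CSA must be true.
    if not row.csa:
        return False
    # Check 2: the writer must have committed: not open, and its UPD_TXNS row still exists.
    if row.txn_id in open_txns or row.txn_id not in upd_txns:
        return False
    # Check 3: both snapshots must see the same committed writes.
    return row.committed_write_ids == client_committed_write_ids
```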
    2.2.2 Column stats
      The reader gets a row from TAB_COL_STATS/PART_COL_STATS
      and applies the same steps as for table stats.
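Column stats repeat the same checks on each TAB_COL_STATS row, each with its own CSA entry. This hypothetical sketch also assumes that an aggregation touching several columns may use stats only if every referenced column passes, and it treats missing column stats as invalid.

```python
from dataclasses import dataclass
from typing import AbstractSet, Dict, FrozenSet, Iterable


@dataclass(frozen=True)
class ColumnStatsRow:
    # Hypothetical view of one TAB_COL_STATS/PART_COL_STATS row.
    csa: bool                            # this column's COLUMN_STATS_ACCURATE entry
    txn_id: int                          # TXN_ID of the transaction that wrote the stats
    committed_write_ids: FrozenSet[int]  # the writer's snapshot, reduced to committed writes


def column_stats_valid(row: ColumnStatsRow, open_txns: AbstractSet[int],
                       upd_txns: AbstractSet[int], client_snapshot: FrozenSet[int]) -> bool:
    # Same three checks as for table stats, applied to a single column's row.
    return (row.csa
            and row.txn_id not in open_txns
            and row.txn_id in upd_txns
            and row.committed_write_ids == client_snapshot)


def all_columns_valid(rows: Dict[str, ColumnStatsRow], columns: Iterable[str],
                      open_txns: AbstractSet[int], upd_txns: AbstractSet[int],
                      client_snapshot: FrozenSet[int]) -> bool:
    # Assumption: every referenced column needs valid stats; a missing row counts as invalid.
    return all(c in rows and column_stats_valid(rows[c], open_txns, upd_txns, client_snapshot)
               for c in columns)
```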

3. Current/Possible invariants
  3.1 Current
    - The metastore's TBLS/PARTITIONS rows keep the CSA up to date for committed stats, for both the table and its columns.
  3.2 Possible 
    - The metastore keeps one committed version of the stats for both the table and its columns.

[1]: A transaction ID and a valid writeId list for the table.
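One way to read "equal in commits" is to compare the sets of committed write IDs that two snapshots see. The sketch below uses a simplified, hypothetical model of a valid writeId list (a high-water mark plus the open or aborted write IDs at or below it); it is not Hive's ValidWriteIdList implementation.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class TableSnapshot:
    # Simplified, hypothetical model of a table snapshot.
    txn_id: int                        # transaction that took the snapshot
    high_watermark: int                # highest write ID allocated when the snapshot was taken
    invalid_write_ids: FrozenSet[int]  # open or aborted write IDs at or below the high-water mark

    def committed_write_ids(self) -> FrozenSet[int]:
        # Write IDs are assumed to start at 1.
        return frozenset(w for w in range(1, self.high_watermark + 1)
                         if w not in self.invalid_write_ids)


def equal_in_commits(a: TableSnapshot, b: TableSnapshot) -> bool:
    # Snapshots match when they see the same committed writes, even if taken by different transactions.
    return a.committed_write_ids() == b.committed_write_ids()
```

For example, a snapshot with high-water mark 5 and one with high-water mark 6 where write 6 is still open or aborted see the same committed writes (1 through 5), so they compare equal.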

> Create single version transactional table metastore statistics for aggregation queries
> --------------------------------------------------------------------------------------
>                 Key: HIVE-19416
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>          Components: Transactions
>            Reporter: Steve Yeom
>            Assignee: Steve Yeom
>            Priority: Major
> The system should use only statistics for aggregation queries like count on transactional

This message was sent by Atlassian JIRA
