hive-dev mailing list archives

From "Kevin Wilfong" <kevinwilf...@fb.com>
Subject Review Request: Add timestamp column with index to the partition stats table.
Date Tue, 27 Sep 2011 23:58:42 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2079/
-----------------------------------------------------------

Review request for hive, Yongqiang He and Ning Zhang.


Summary
-------

I added a timestamp column, ts, to the partition statistics table; it defaults to
current_timestamp. I also added code to create an index on that column and to verify
that the index exists when we check whether the table exists.
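
For illustration only, here is a minimal sketch of the kind of DDL this implies. It is
not the code in the diff. The ts column and the PARTITION_STATS_V2 table name come from
this request; the ID and ROW_COUNT columns and the index name are hypothetical
placeholders. The statements are meant to be plain enough for both Derby and MySQL.

```java
// Sketch only: builds the DDL strings; it does not connect to a database.
final class StatsTableDdl {
    static final String TABLE = "PARTITION_STATS_V2";
    // Hypothetical index name, chosen for this example.
    static final String INDEX = TABLE + "_TS_IDX";

    // CREATE TABLE with a timestamp column that defaults to the insert time.
    public static String createTable() {
        return "CREATE TABLE " + TABLE + " ("
            + "ID VARCHAR(255) NOT NULL PRIMARY KEY, " // hypothetical column
            + "ROW_COUNT BIGINT, "                     // hypothetical column
            + "TS TIMESTAMP DEFAULT CURRENT_TIMESTAMP)";
    }

    // Index on the timestamp column so scripts can filter rows by age.
    public static String createIndex() {
        return "CREATE INDEX " + INDEX + " ON " + TABLE + " (TS)";
    }

    public static void main(String[] args) {
        System.out.println(createTable());
        System.out.println(createIndex());
    }
}
```

Because the column has a default, publishers that insert rows without naming TS would
not need to change.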

I also took the opportunity to fix another problem.  Every time we change the schema of the
partition statistics table we give it a slightly different name, like PARTITION_STATS, PARITION_STATISTICS,
PARTITION_STAT_TBL, etc.  Instead, I want to put a version number at the end of the table
name; here I have PARTITION_STATS_V2.  Rather than trying to come up with a new variation of
the name, we can just increment the final number.  This will also make it easy to identify
old tables, which can be dropped.
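
As a rough sketch of why a numeric suffix helps (the class and method names here are
hypothetical and not part of the diff), a cleanup script could derive the current name
from a version constant and pick out older versions mechanically:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: names follow the BASE_V<n> scheme proposed in this request.
final class VersionedTableName {
    static final String BASE = "PARTITION_STATS";
    static final int CURRENT_VERSION = 2;

    // Matches BASE_V<n> exactly; up to 9 digits so parseInt cannot overflow.
    private static final Pattern VERSIONED =
        Pattern.compile(Pattern.quote(BASE) + "_V(\\d{1,9})");

    // Table name for a given schema version, e.g. version 2.
    public static String name(int version) {
        return BASE + "_V" + version;
    }

    // Returns the tables whose version number is lower than the current one.
    public static List<String> obsolete(List<String> tables) {
        List<String> old = new ArrayList<>();
        for (String table : tables) {
            Matcher m = VERSIONED.matcher(table);
            if (m.matches() && Integer.parseInt(m.group(1)) < CURRENT_VERSION) {
                old.add(table);
            }
        }
        return old;
    }
}
```

Tables created under the earlier ad hoc names do not match the pattern, so they would
still have to be listed by hand.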

Checking whether the index exists may not be worth the time it takes.  We have to check this
every time we init JDBCStatsPublisher, unless the table doesn't exist.  If the index doesn't
exist, it's not the end of the world: it just means any scripts which try to use the index
will be slower, and the index can always be added later.  Also, the chance that the program
creates the table but is interrupted before it can create the index is low.  I added the
check because tracking down why Hive slowed down, only to find that a cleanup script is
running slowly and hence holding the locks for a long time, sounded painful enough to make
the check worth it, but I am open to debate.
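
To make the cost being debated concrete, here is one way such a check could look using
the standard JDBC metadata API. This is a sketch under assumptions, not the code in the
diff: the class name is hypothetical, and a caller would pass in whatever table and index
names it actually uses.

```java
import java.sql.DatabaseMetaData;
import java.sql.ResultSet;
import java.sql.SQLException;

// Sketch only: one metadata round trip per call.
final class StatsIndexCheck {
    // Returns true if the database reports an index called indexName on tableName.
    // Index names are compared ignoring case; tableName is passed through as-is,
    // so it must match the case in which the database stores identifiers.
    public static boolean indexExists(DatabaseMetaData meta, String tableName,
                                      String indexName) throws SQLException {
        // unique=false lists every index; approximate=true accepts stale statistics.
        try (ResultSet rs = meta.getIndexInfo(null, null, tableName, false, true)) {
            while (rs.next()) {
                // INDEX_NAME can be null for table-statistics rows.
                String found = rs.getString("INDEX_NAME");
                if (indexName.equalsIgnoreCase(found)) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A caller would typically invoke it as indexExists(conn.getMetaData(), table, index)
right after confirming the table exists, so the extra cost is one metadata query per
publisher init.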


This addresses bug HIVE-2471.
    https://issues.apache.org/jira/browse/HIVE-2471


Diffs
-----

  trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsPublisher.java 1175957 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsSetupConstants.java 1175957
  trunk/ql/src/java/org/apache/hadoop/hive/ql/stats/jdbc/JDBCStatsUtils.java 1175957

Diff: https://reviews.apache.org/r/2079/diff


Testing
-------

I ran TestStatsPublisherEnhanced using both Derby and MySQL, and verified that all the tests succeeded.

I also ran a few queries and verified that the table and index were created and that the rows,
including timestamp, appeared in the table.


Thanks,

Kevin

