hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vihang Karajgaonkar <vih...@cloudera.com>
Subject Re: Review Request 56995: HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method
Date Mon, 27 Feb 2017 18:44:14 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56995/#review166900
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java (line 443)
<https://reviews.apache.org/r/56995/#comment238989>

    We can declare it as a Queue, but the implementation of the queue needs to be thread-safe
since multiple threads are going to operate on the queue at the same time. I thought of declaring
it as a concurrentQueue would make it more clear and understandable without any performance
implications. Is there a particular advantage you can think of declaring it as queue? This
also makes sure that we have a compile time type check to ensure that calling methods are
using concurrent queues.



ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java (lines 520 - 526)
<https://reviews.apache.org/r/56995/#comment238995>

    Consider you have just one queue called nextLevel. In such a scenario the worker threads
are removing elements and adding to the same queue. This is the classic producer/consumer
model operating on a single queue. Initially I tried using a single blocking queue, but the
terminating condition is non-trivial to implement because the worker threads may or may not
produce more items for the queue. In such a case in order to avoid race conditions at while(!nextLevel.isEmpty())
check we will either need some kind of marker element added to indicate that worker thread
is done, or use wait()/notify() constructs to indicate when we have reached the terminating
condition. That implementation was getting too complex if we wanted to avoid all race conditions
possible. Given that this code path will be commonly used for msck repair command, I thought
of preferring the safer yet performant way at the small cost of allocating a new queue for
each level of the directory tree. Hope that explains.



ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java (line 546)
<https://reviews.apache.org/r/56995/#comment238990>

    I just replicated what the previous code was throwing. Agreed, better to throw new HiveException(e).
Changed.


- Vihang Karajgaonkar


On Feb. 24, 2017, 7:20 p.m., Vihang Karajgaonkar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56995/
> -----------------------------------------------------------
> 
> (Updated Feb. 24, 2017, 7:20 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Ashutosh Chauhan, and Sergio Pena.
> 
> 
> Bugs: HIVE-15879
>     https://issues.apache.org/jira/browse/HIVE-15879
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java 7c94c95f00492467ba27dedc9ce513e13c85ea61

>   ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java 35f52cd522e0e48a333e30966245bec65cc2ec9c

> 
> Diff: https://reviews.apache.org/r/56995/diff/
> 
> 
> Testing
> -------
> 
> Tested using existing and newly added test cases
> 
> 
> Thanks,
> 
> Vihang Karajgaonkar
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message