hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Vihang Karajgaonkar <vih...@cloudera.com>
Subject Re: Review Request 56995: HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method
Date Mon, 27 Feb 2017 18:48:56 GMT


> On Feb. 27, 2017, 6:01 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java, line 452
> > <https://reviews.apache.org/r/56995/diff/2/?file=1647474#file1647474line452>
> >
> >     Can't the variable declartion be `Queue`, same for the other declarations below.

We can declare it as a Queue, but the implementation of the queue needs to be thread-safe
since multiple threads are going to operate on the queue at the same time. I thought of declaring
it as a concurrentQueue would make it more clear and understandable without any performance
implications. Is there a particular advantage you can think of declaring it as queue? This
also makes sure that we have a compile time type check to ensure that calling methods are
using concurrent queues.


> On Feb. 27, 2017, 6:01 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java, line 576
> > <https://reviews.apache.org/r/56995/diff/2/?file=1647474#file1647474line576>
> >
> >     Usually better to `throw new HiveException(e)` so that the full stack-trace
is included.

Agreed, fixed.


> On Feb. 27, 2017, 6:01 p.m., Sahil Takiar wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java, lines
550-556
> > <https://reviews.apache.org/r/56995/diff/2/?file=1647474#file1647474line550>
> >
> >     Not sure I follow this logic, why are two queues necessary?

Consider you have just one queue called nextLevel. In such a scenario the worker threads are
removing elements and adding to the same queue. This is the classic producer/consumer model
operating on a single queue. Initially I tried using a single blocking queue, but the terminating
condition is non-trivial to implement because the worker threads may or may not produce more
items for the queue. In such a case in order to avoid race conditions at while(!nextLevel.isEmpty())
check we will either need some kind of marker element added to indicate that worker thread
is done, or use wait()/notify() constructs to indicate when we have reached the terminating
condition. That implementation was getting too complex if we wanted to avoid all race conditions
possible. Given that this code path will be commonly used for msck repair command, I thought
of preferring the safer yet performant way at the small cost of allocating a new queue for
each level of the directory tree. Hope that explains.


- Vihang


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56995/#review166898
-----------------------------------------------------------


On Feb. 27, 2017, 6:44 p.m., Vihang Karajgaonkar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56995/
> -----------------------------------------------------------
> 
> (Updated Feb. 27, 2017, 6:44 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Ashutosh Chauhan, and Sergio Pena.
> 
> 
> Bugs: HIVE-15879
>     https://issues.apache.org/jira/browse/HIVE-15879
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java 7c94c95f00492467ba27dedc9ce513e13c85ea61

>   ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java 35f52cd522e0e48a333e30966245bec65cc2ec9c

> 
> Diff: https://reviews.apache.org/r/56995/diff/
> 
> 
> Testing
> -------
> 
> Tested using existing and newly added test cases
> 
> 
> Thanks,
> 
> Vihang Karajgaonkar
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message