hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Peter Vary <>
Subject Re: Review Request 56995: HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method
Date Fri, 24 Feb 2017 15:34:34 GMT

This is an automatically generated e-mail. To reply, visit:

Thanks for the patch Vihang!

I had alternate version in my head, but your one fared better on my quick performance test
(see below).

Another thing - as I working on HIVE-15051 to run Yetus as a precommit check I happily reporting
nits found by checkstyle/findbugs/rat etc.
Here is the list:
- Checkstyle:
./ql/src/java/org/apache/hadoop/hive/ql/metadata/ java.util.Collections;:8:
warning: Unused import - java.util.Collections.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/ java.util.concurrent.ConcurrentHashMap;:8:
warning: Unused import - java.util.concurrent.ConcurrentHashMap.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/ java.util.concurrent.ExecutorService;:8:
warning: Unused import - java.util.concurrent.ExecutorService.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/ java.util.concurrent.LinkedBlockingQueue;:8:
warning: Unused import - java.util.concurrent.LinkedBlockingQueue.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/ java.util.concurrent.TimeUnit;:8:
warning: Unused import - java.util.concurrent.TimeUnit.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/ java.util.concurrent.atomic.AtomicInteger;:8:
warning: Unused import - java.util.concurrent.atomic.AtomicInteger.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/      checkPartitionDirsSingleThreaded(basePaths,
allDirs, basePath.getFileSystem(conf), maxDepth, maxDepth);: warning: Line is longer than
100 characters (found 109).
./ql/src/java/org/apache/hadoop/hive/ql/metadata/        return
processPathDepthInfo(pd);: warning: method def child at indentation level 8 not at correct
indentation, 6
./ql/src/java/org/apache/hadoop/hive/ql/metadata/        throws
FileNotFoundException, IOException, HiveException, InterruptedException {:16: warning: Redundant
throws: 'FileNotFoundException' is subclass of 'IOException'.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/    final Path
p;:16: warning: Variable 'p' must be private and have accessor methods.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/    final int
depth;:15: warning: Variable 'depth' must be private and have accessor methods.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/          futures.add(pool.submit(new
PathDepthInfoCallable(nextLevel.poll(), maxDepth, fs, tempQueue)));: warning: Line is longer
than 100 characters (found 105).
./ql/src/java/org/apache/hadoop/hive/ql/metadata/  private void
checkPartitionDirsSingleThreaded(Queue<Path> basePaths,final Set<Path> allDirs,:71:
warning: ',' is not followed by whitespace.
- Findbugs:
Performance Warning: Should org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfo
be a _static_ inner class?
"This class is an inner class, but does not use its embedded reference
  to the object which created it.&nbsp; This reference makes the instances
  of the class larger, and may keep the reference to the creator object
  alive longer than necessary.&nbsp; If possible, the class should be
   made static."

Thanks for the patch,

ql/src/java/org/apache/hadoop/hive/ql/metadata/ (lines 526 - 532)

    Interesting, and fast solution.
    My first reaction was: Why not use a a Queue to collect the futures, and simply wait until
the every future is resolved, and the futures queue is empty? The child future would be added
to the queue before the parent finishes, so we have solution for the synchronization that
    Something like this:
    ConcurrentLinkedQueue<Future<Path>> futures = new ConcurrentLinkedQueue<Future<Path>>();
    futures.add(pool.submit(new PathDepthInfoCallable(new PathDepthInfo(basePath, 0), maxDepth,
fs, futures, pool)));
    while(!futures.isEmpty()) {
        Path p = futures.poll().get();
        if (p != null) {
    But I have tried, and your version is performed better on local fs with 10k files, 4 depth
(year/month/day/job). 2800 ms vs. 3000 ms
    I still think that my version is more readable, but speed is speed :D

- Peter Vary

On Feb. 23, 2017, 8:15 p.m., Vihang Karajgaonkar wrote:
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> -----------------------------------------------------------
> (Updated Feb. 23, 2017, 8:15 p.m.)
> Review request for hive, Aihua Xu, Ashutosh Chauhan, and Sergio Pena.
> Bugs: HIVE-15879
> Repository: hive-git
> Description
> -------
> HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method
> Diffs
> -----
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/ 7c94c95f00492467ba27dedc9ce513e13c85ea61

>   ql/src/test/org/apache/hadoop/hive/ql/metadata/ 35f52cd522e0e48a333e30966245bec65cc2ec9c

> Diff:
> Testing
> -------
> Tested using existing and newly added test cases
> Thanks,
> Vihang Karajgaonkar

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message