hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Peter Vary <pv...@cloudera.com>
Subject Re: Review Request 56995: HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method
Date Fri, 24 Feb 2017 15:34:34 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56995/#review166714
-----------------------------------------------------------



Thanks for the patch Vihang!

I had alternate version in my head, but your one fared better on my quick performance test
(see below).

Another thing - as I working on HIVE-15051 to run Yetus as a precommit check I happily reporting
nits found by checkstyle/findbugs/rat etc.
Here is the list:
- Checkstyle:
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:23:import java.util.Collections;:8:
warning: Unused import - java.util.Collections.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:31:import java.util.concurrent.ConcurrentHashMap;:8:
warning: Unused import - java.util.concurrent.ConcurrentHashMap.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:34:import java.util.concurrent.ExecutorService;:8:
warning: Unused import - java.util.concurrent.ExecutorService.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:37:import java.util.concurrent.LinkedBlockingQueue;:8:
warning: Unused import - java.util.concurrent.LinkedBlockingQueue.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:39:import java.util.concurrent.TimeUnit;:8:
warning: Unused import - java.util.concurrent.TimeUnit.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:40:import java.util.concurrent.atomic.AtomicInteger;:8:
warning: Unused import - java.util.concurrent.atomic.AtomicInteger.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:429:      checkPartitionDirsSingleThreaded(basePaths,
allDirs, basePath.getFileSystem(conf), maxDepth, maxDepth);: warning: Line is longer than
100 characters (found 109).
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:465:        return
processPathDepthInfo(pd);: warning: method def child at indentation level 8 not at correct
indentation, 6
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:469:        throws
FileNotFoundException, IOException, HiveException, InterruptedException {:16: warning: Redundant
throws: 'FileNotFoundException' is subclass of 'IOException'.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:511:    final Path
p;:16: warning: Variable 'p' must be private and have accessor methods.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:512:    final int
depth;:15: warning: Variable 'depth' must be private and have accessor methods.
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:537:          futures.add(pool.submit(new
PathDepthInfoCallable(nextLevel.poll(), maxDepth, fs, tempQueue)));: warning: Line is longer
than 100 characters (found 105).
./ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java:559:  private void
checkPartitionDirsSingleThreaded(Queue<Path> basePaths,final Set<Path> allDirs,:71:
warning: ',' is not followed by whitespace.
- Findbugs:
Performance Warning: Should org.apache.hadoop.hive.ql.metadata.HiveMetaStoreChecker$PathDepthInfo
be a _static_ inner class?
"This class is an inner class, but does not use its embedded reference
  to the object which created it.&nbsp; This reference makes the instances
  of the class larger, and may keep the reference to the creator object
  alive longer than necessary.&nbsp; If possible, the class should be
   made static."

Thanks for the patch,
Peter


ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java (lines 526 - 532)
<https://reviews.apache.org/r/56995/#comment238732>

    Interesting, and fast solution.
    
    My first reaction was: Why not use a a Queue to collect the futures, and simply wait until
the every future is resolved, and the futures queue is empty? The child future would be added
to the queue before the parent finishes, so we have solution for the synchronization that
way.
    
    Something like this:
    ConcurrentLinkedQueue<Future<Path>> futures = new ConcurrentLinkedQueue<Future<Path>>();
    futures.add(pool.submit(new PathDepthInfoCallable(new PathDepthInfo(basePath, 0), maxDepth,
fs, futures, pool)));
    while(!futures.isEmpty()) {
        Path p = futures.poll().get();
        if (p != null) {
            result.add(p);
        }
    }
    
    But I have tried, and your version is performed better on local fs with 10k files, 4 depth
(year/month/day/job). 2800 ms vs. 3000 ms
    
    I still think that my version is more readable, but speed is speed :D


- Peter Vary


On Feb. 23, 2017, 8:15 p.m., Vihang Karajgaonkar wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56995/
> -----------------------------------------------------------
> 
> (Updated Feb. 23, 2017, 8:15 p.m.)
> 
> 
> Review request for hive, Aihua Xu, Ashutosh Chauhan, and Sergio Pena.
> 
> 
> Bugs: HIVE-15879
>     https://issues.apache.org/jira/browse/HIVE-15879
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-15879 : Fix HiveMetaStoreChecker.checkPartitionDirs method
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/HiveMetaStoreChecker.java 7c94c95f00492467ba27dedc9ce513e13c85ea61

>   ql/src/test/org/apache/hadoop/hive/ql/metadata/TestHiveMetaStoreChecker.java 35f52cd522e0e48a333e30966245bec65cc2ec9c

> 
> Diff: https://reviews.apache.org/r/56995/diff/
> 
> 
> Testing
> -------
> 
> Tested using existing and newly added test cases
> 
> 
> Thanks,
> 
> Vihang Karajgaonkar
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message