hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Alexander Kolbasov <ako...@gmail.com>
Subject Re: Review Request 65634: HIVE-18264: CachedStore: Store cached partitions/col stats within the table cache
Date Fri, 02 Mar 2018 07:54:58 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/65634/#review198507
-----------------------------------------------------------



Changes are pretty big, I didn't go through all of them 0 some comments below.


standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java
Lines 62 (patched)
<https://reviews.apache.org/r/65634/#comment278651>

    Please add Javadoc comment, explaining what this function does.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java
Lines 63 (patched)
<https://reviews.apache.org/r/65634/#comment278653>

    It would be cleaner and easier to read to rewrite this as
    
    ```
      public static String buildKey(List<String> partVals) {
        if (partVals == null || partVals.isEmpty()) {
          return "";
        }
        return String.join(delimit, partVals);
      }
    ```



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java
Lines 70 (patched)
<https://reviews.apache.org/r/65634/#comment278652>

    1) Please add Javadoc comment, explaining what this function does.
    2) Is overloading really useful here?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java
Lines 71 (patched)
<https://reviews.apache.org/r/65634/#comment278654>

    why not just 
    
    `return buildKey(partVals) + delimit + colName`
    
    can colName be empty here or not?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java
Line 62 (original), 75 (patched)
<https://reviews.apache.org/r/65634/#comment278655>

    This method is never used



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Lines 128 (patched)
<https://reviews.apache.org/r/65634/#comment278656>

    1) Please add units in the name and use constant for the default value.
    2) Please document what is `cacheRefreshPeriod`.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 223 (original), 137 (patched)
<https://reviews.apache.org/r/65634/#comment278657>

    Why do you need an empty public constructor?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 226 (original), 140 (patched)
<https://reviews.apache.org/r/65634/#comment278660>

    Please document this method - in particular how does it gets cache implementaiton from
config.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 236 (original), 150 (patched)
<https://reviews.apache.org/r/65634/#comment278659>

    Please used internal formatting for LOG:
    
    LOG.debug("CachedStore is not enabled; using {}", clazzName)



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 237 (original), 151 (patched)
<https://reviews.apache.org/r/65634/#comment278658>

    This return is not needed.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 259 (original), 172 (patched)
<https://reviews.apache.org/r/65634/#comment278661>

    Can this be an else part of the prior if?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 262 (original), 175 (patched)
<https://reviews.apache.org/r/65634/#comment278663>

    This doesn't look correct:
    
    1) initBlackListWhiteList() will not update any existing whitelist or blacklist, only
add one if it wasn't there.
    2) initBlackListWhiteList() is calling `Collections.reverse(blacklistPatterns)` which
doesn't make sense when configuration is set to a new value.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Lines 176 (patched)
<https://reviews.apache.org/r/65634/#comment278662>

    Looks like every time someone calls setConf() a new thread is started - isn't it a threda
leak?
    In general it isn't a good practice to add such side-effects for config changes like setConf
- it is better to explicitly call a method which will do whatever is needed after conf update.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Lines 179 (patched)
<https://reviews.apache.org/r/65634/#comment278664>

    Please document this method.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Lines 180 (patched)
<https://reviews.apache.org/r/65634/#comment278665>

    This seem to repeat the code from setConf. is there any way to avoid code copy?
    
    The only difference seems to be the call to startCacheUpdateService() which shows that
it isn't a good idea to have it there.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 266 (original), 202 (patched)
<https://reviews.apache.org/r/65634/#comment278669>

    Please document this method. Among other things - can prewarm() be called multiple times?
If not, should it be somehow enforced?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 270 (original), 206 (patched)
<https://reviews.apache.org/r/65634/#comment278668>

    Please use built-in {} formatting for og.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 271 (original), 207 (patched)
<https://reviews.apache.org/r/65634/#comment278667>

    Please remove <Database> and add explicit size:
    
    `List<Database> databases = new ArrayList<>(dbNames.size());`



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Lines 209 (patched)
<https://reviews.apache.org/r/65634/#comment278671>

    This line isn't adding anything, why not just
    
        for (String dbName : dbNames) {
          databases.add(rawStore.getDatabase(dbName));
        }



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Lines 213 (patched)
<https://reviews.apache.org/r/65634/#comment278674>

    Why is this an info log and not debug?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 272 (original), 215 (patched)
<https://reviews.apache.org/r/65634/#comment278675>

    Why not use something like the for loop before?
    
    `for (String dbName : dbNames) {...}`
    
    in particular, dbNames.size() doesn't change in the loop so it can be cached.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 285 (original), 217 (patched)
<https://reviews.apache.org/r/65634/#comment278678>

    It is quite possible that there is another metastore instance running and someone removes
this database, so this call will fail due to missing database. I think this code should continue
prewarm for other databases in such cases.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 286 (original), 218 (patched)
<https://reviews.apache.org/r/65634/#comment278676>

    tblNames may contain thousands of entries - it isn't a good idea to log it.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 287 (original), 219 (patched)
<https://reviews.apache.org/r/65634/#comment278677>

    Again, why not just a simple iterator over tblNames?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 290 (original), 222 (patched)
<https://reviews.apache.org/r/65634/#comment278679>

    Why not just use normal dbname.tblName notation?
    
    `"LOG.info("table {}.{} is not cached", dbName, tblName);"`
    
    Also, is there value in info logging this? There is already debug logging for shouldCacheTable()



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 299 (original), 229 (patched)
<https://reviews.apache.org/r/65634/#comment278680>

    Wouldn't getTable throw execption in this case?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 302 (original), 232 (patched)
<https://reviews.apache.org/r/65634/#comment278681>

    Why set it to null and not to the actual value (which is done a few lines below)?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 303 (original), 233 (patched)
<https://reviews.apache.org/r/65634/#comment278682>

    I think you can simplify the code and move this to where they are used if you modify if
statement below.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 317 (original), 242 (patched)
<https://reviews.apache.org/r/65634/#comment278683>

    Here it would be better to do
    
        if (!table.isSetPartitionKeys())) {
           return what should be returned in this case
        }



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 321 (original), 243 (patched)
<https://reviews.apache.org/r/65634/#comment278684>

    This may fail as well in which case you want to continue with other tables.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Lines 245 (patched)
<https://reviews.apache.org/r/65634/#comment278685>

    This can fail. Also you can get a mismatch between partition names and actual partitions.
    
    I think it is better to just get all partitions and then get their names from partition
list to avoid such inconsistency.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 322 (original), 252 (patched)
<https://reviews.apache.org/r/65634/#comment278686>

    1) please use isEmpty() instead. Also shouldn't you check partitions list rather then
partition names?
    2) It is more readable if you just return whatever is needed if this condition is false.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 324 (original), 255 (patched)
<https://reviews.apache.org/r/65634/#comment278687>

    This can fail



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 329 (original), 261 (patched)
<https://reviews.apache.org/r/65634/#comment278689>

    Replace <String> with <> and add list size (partKeys.size())



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 330 (original), 262 (patched)
<https://reviews.apache.org/r/65634/#comment278691>

    This can use Collections.nCopies(partKeys.size(), defaultPartitionValue)



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Lines 274 (patched)
<https://reviews.apache.org/r/65634/#comment278692>

    There may be thousands of tables per DB - is there a value in info logging this?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 360 (original), 296 (patched)
<https://reviews.apache.org/r/65634/#comment278693>

    1) Please reorder modifiers
    2) Please add comment describing what this method does



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 368 (original), 305 (patched)
<https://reviews.apache.org/r/65634/#comment278694>

    ANy reason not to use lambda here?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 409 (original), 353 (patched)
<https://reviews.apache.org/r/65634/#comment278695>

    this can be package-private



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 425 (original), 369 (patched)
<https://reviews.apache.org/r/65634/#comment278696>

    It owuld be easier to read if you reverse conditions:
    
          if (!shouldRunPrewarm) {
            update();
            return;
          }



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 426 (original), 370 (patched)
<https://reviews.apache.org/r/65634/#comment278697>

    What protects isCachePrewarmed()?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 430 (original), 374 (patched)
<https://reviews.apache.org/r/65634/#comment278699>

    Should the retry logic live here or in prewarm() itself?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 439 (original), 382 (patched)
<https://reviews.apache.org/r/65634/#comment278698>

    Note that this is a regular static variable which is modified here. It should be either
atomic or volatile.
    Also, this is static field and here non-static method modifies static field which is a
bad practice.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 453 (original), 395 (patched)
<https://reviews.apache.org/r/65634/#comment278701>

    how can it return null without thrwing an exception?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 460 (original), 401 (patched)
<https://reviews.apache.org/r/65634/#comment278700>

    this can fail - do you want to continue with other databases?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 746 (original), 595 (patched)
<https://reviews.apache.org/r/65634/#comment278702>

    Wouldn;t dropDatabase also throw exception if it fails?



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 758 (original), 599 (patched)
<https://reviews.apache.org/r/65634/#comment278703>

    This is rather useless since in case of failure tu'll throw MetaException anyway.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 764 (original), 605 (patched)
<https://reviews.apache.org/r/65634/#comment278704>

    See comments fro dropDatabase.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
Line 902 (original), 701 (patched)
<https://reviews.apache.org/r/65634/#comment278705>

    tbl would never be null, there will be an exception if the call above fails.



standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
Lines 687 (patched)
<https://reviews.apache.org/r/65634/#comment278672>

    This should be done outside try-block


- Alexander Kolbasov


On March 1, 2018, 11:09 a.m., Vaibhav Gumashta wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/65634/
> -----------------------------------------------------------
> 
> (Updated March 1, 2018, 11:09 a.m.)
> 
> 
> Review request for hive, Daniel Dai and Thejas Nair.
> 
> 
> Bugs: HIVE-18264
>     https://issues.apache.org/jira/browse/HIVE-18264
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> https://issues.apache.org/jira/browse/HIVE-18264
> 
> 
> Diffs
> -----
> 
>   itests/hcatalog-unit/src/test/java/org/apache/hive/hcatalog/listener/DummyRawStoreFailEvent.java
a3725c5395 
>   service/src/java/org/apache/hive/service/server/HiveServer2.java 86c9c2b33c 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/HiveMetaStore.java
ac71d0882f 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/ObjectStore.java
7b44df4128 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/RawStore.java f500d63725

>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CacheUtils.java
f0f650ddcf 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/CachedStore.java
0d132f2074 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/cache/SharedCache.java
32ea17495f 
>   standalone-metastore/src/main/java/org/apache/hadoop/hive/metastore/utils/MetaStoreUtils.java
50f873a013 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreControlledCommit.java
75ea8c4a77 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/DummyRawStoreForJdoConnection.java
207d842f94 
>   standalone-metastore/src/test/java/org/apache/hadoop/hive/metastore/cache/TestCachedStore.java
ab6feb6f0b 
>   standalone-metastore/src/test/resources/log4j2.properties 365687e1c9 
> 
> 
> Diff: https://reviews.apache.org/r/65634/diff/4/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Vaibhav Gumashta
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message