hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Lars Francke" <...@lars-francke.de>
Subject Re: Review Request 24289: MetadataUpdater: provide a mechanism to edit the statistics of a column in a table (or a partition of a table)
Date Wed, 06 Aug 2014 00:13:47 GMT

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/24289/#review49661
-----------------------------------------------------------


This is as far as I've gotten today. I'll try to continue tomorrow.


ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
<https://reviews.apache.org/r/24289/#comment86941>

    redundant



ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
<https://reviews.apache.org/r/24289/#comment86942>

    should be private static final transient and no need to be wrapped



ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
<https://reviews.apache.org/r/24289/#comment86943>

    redundant constructor



ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java
<https://reviews.apache.org/r/24289/#comment86945>

    Map instead of HashMap



ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java
<https://reviews.apache.org/r/24289/#comment86940>

    long line (hive uses a maximum of 100 chars)



ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java
<https://reviews.apache.org/r/24289/#comment86939>

    unrelated



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
<https://reviews.apache.org/r/24289/#comment86936>

    missing space



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
<https://reviews.apache.org/r/24289/#comment86928>

    Indentation is wrong in this whole method (and the next one), should be two spaces



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
<https://reviews.apache.org/r/24289/#comment86929>

    Exception never thrown (same for next class)



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
<https://reviews.apache.org/r/24289/#comment86931>

    properly handle and log. Like this it'll throw a NPE later on tbl.getCols() if this goes
wrong.
    
    Alternatively maybe just ignore the exception. It's an unchecked one and there's not much
you can do about it anyway



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
<https://reviews.apache.org/r/24289/#comment86934>

    You're not checking if this actually results in anything. Will result in another NPE later
on.



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
<https://reviews.apache.org/r/24289/#comment86933>

    no need to explicitly create an array
    
    Arrays.asList(colName) etc. works as well.



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
<https://reviews.apache.org/r/24289/#comment86938>

    There's a lot of copy & paste in these two methods. They only differ in the partition
stuff which should be fairly easy to stuff into one method. Haven't checked though...either
way the comments from the previous method apply here too



ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java
<https://reviews.apache.org/r/24289/#comment86935>

    can be Map instead of HashMap



ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java
<https://reviews.apache.org/r/24289/#comment86919>

    tab instead of spaces
    
    braces around the return statement
    
    I also don't fully understand this part but from what I understand you circumvent authentication
checking here?



ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java
<https://reviews.apache.org/r/24289/#comment86908>

    Remove or expand this comment. Leaving it in brings no additional benefit and removing
it will at least flag the class as undocumented.



ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java
<https://reviews.apache.org/r/24289/#comment86909>

    should be Map instead of HashMap



ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java
<https://reviews.apache.org/r/24289/#comment86911>

    Constructor is not used anywhere and can be removed



ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java
<https://reviews.apache.org/r/24289/#comment86910>

    should be Map instead of HashMap



ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java
<https://reviews.apache.org/r/24289/#comment86915>

    No need to call setters in the constructor



ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java
<https://reviews.apache.org/r/24289/#comment86914>

    Are any of the setters actually needed? If not you could remove them and make the fields
private. They don't seem to be used anywhere...



ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java
<https://reviews.apache.org/r/24289/#comment86912>

    should be Map instead of HashMap



ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java
<https://reviews.apache.org/r/24289/#comment86913>

    should be Map instead of HashMap


- Lars Francke


On Aug. 5, 2014, 6:40 p.m., pengcheng xiong wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/24289/
> -----------------------------------------------------------
> 
> (Updated Aug. 5, 2014, 6:40 p.m.)
> 
> 
> Review request for hive.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> This patch provides ability to update certain stats without scanning any data or without
"hacking the backend db". It helps (esp for CBO work) to set up unit tests quickly and verify
both cbo and the stats subsystem. It also helps when experimenting with the system if you're
just trying out hive/hadoop on a small cluster. Finally it gives you a quick and clean way
to fix things when something went wrong wrt stats in your environment.
> Usage:
> ALTER TABLE table_name PARTITION partition_spec UPDATE STATISTICS FOR COLUMN col_name
SET col_statistics
> For example,
> ALTER TABLE src_x_int UPDATE STATISTICS FOR COLUMN key SET ('numDVs'='101','highValue'='10001.0');
> ALTER TABLE src_p PARTITION(partitionId=1) UPDATE STATISTICS FOR COLUMN key SET ('numDVs'='100','avgColLen'='1.0001');
> 
> 
> Diffs
> -----
> 
>   metastore/src/java/org/apache/hadoop/hive/metastore/StatObjectConverter.java c3e2820

>   metastore/src/model/org/apache/hadoop/hive/metastore/model/MPartitionColumnStatistics.java
89c31dc 
>   metastore/src/model/org/apache/hadoop/hive/metastore/model/MTableColumnStatistics.java
44bbab5 
>   ql/src/java/org/apache/hadoop/hive/ql/exec/ColumnStatsUpdateTask.java PRE-CREATION

>   ql/src/java/org/apache/hadoop/hive/ql/exec/TaskFactory.java 24dfed1 
>   ql/src/java/org/apache/hadoop/hive/ql/metadata/SessionHiveMetaStoreClient.java 4300145

>   ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 67a3aa7 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/HiveParser.g ab1188a 
>   ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzerFactory.java 268920a 
>   ql/src/java/org/apache/hadoop/hive/ql/plan/ColumnStatsUpdateWork.java PRE-CREATION

> 
> Diff: https://reviews.apache.org/r/24289/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> pengcheng xiong
> 
>


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message