hive-dev mailing list archives

From "jiraposter@reviews.apache.org (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-2472) Metastore statistics are not being updated for CTAS queries.
Date Thu, 10 Nov 2011 06:53:52 GMT

    [ https://issues.apache.org/jira/browse/HIVE-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13147520#comment-13147520 ]

jiraposter@reviews.apache.org commented on HIVE-2472:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2583/#review3124
-----------------------------------------------------------



trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java
<https://reviews.apache.org/r/2583/#comment6939>

    style nitpick: removing tab.



trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java
<https://reviews.apache.org/r/2583/#comment6959>

    here, rather than checking partitions.size() == 0, I think it's better to change getPartitionList()
to return null when it is CTAS (i.e., getLoadFileDesc() != null).
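
    To make the suggestion concrete, here is a minimal, self-contained sketch of the idea. It is not the StatsTask code: the Work class, its fields, and statsScope() are hypothetical stand-ins, and only the rule itself (return null for CTAS so the caller keeps a single "no partitions" check) comes from the comment above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CtasPartitionListSketch {

    /** Hypothetical stand-in for the stats work descriptor; not Hive's StatsWork. */
    static final class Work {
        // Assumption: non-null only for CTAS, mirroring getLoadFileDesc() != null.
        final String loadFileTarget;
        final List<String> partitions;

        Work(String loadFileTarget, List<String> partitions) {
            this.loadFileTarget = loadFileTarget;
            this.partitions = partitions;
        }

        boolean isCtas() {
            return loadFileTarget != null;
        }
    }

    /** Returns null for CTAS so callers never see an empty list for that case. */
    static List<String> getPartitionList(Work work) {
        if (work.isCtas()) {
            return null;
        }
        return new ArrayList<>(work.partitions);
    }

    /** The caller only needs the null check, not an extra size() == 0 check. */
    static String statsScope(Work work) {
        List<String> parts = getPartitionList(work);
        return parts == null ? "table" : "partitions:" + parts.size();
    }

    public static void main(String[] args) {
        Work ctas = new Work("db1.new_tbl", new ArrayList<>());
        Work insert = new Work(null, Arrays.asList("ds=2011-11-08"));
        System.out.println(statsScope(ctas));
        System.out.println(statsScope(insert));
    }
}
```

    The point of the design is that the special case lives in one place (the getter) instead of in every caller that currently tests for null.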



trunk/ql/src/test/results/clientpositive/database.q.out
<https://reviews.apache.org/r/2583/#comment6958>

    I suspect this is caused by the change in SemanticAnalyzer.java around line 7906. Can
you test if this is caused by that change? If so you'll need to resolve this rather than updating
the .out file. 



trunk/ql/src/test/results/clientpositive/merge3.q.out
<https://reviews.apache.org/r/2583/#comment6960>

    here stage-2 should only depend on stage-6.
    
    also please add 'desc formatted <tbl>' to verify the stats are correct.



trunk/ql/src/test/results/clientpositive/smb_mapjoin9.q.out
<https://reviews.apache.org/r/2583/#comment6961>

    same: stage-2 should only depend on stage-4.
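
    A rough sketch of the dependency shape I mean, in case it helps. The patch description quoted below says the StatsTask is moved to run after the create-table task; if that move detaches the stats task from its old parents before re-attaching it, the stats stage ends up with exactly one parent. The Task class and the method names here are hypothetical and are not Hive's Task API, and mapping stage-2 to the stats stage is my assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class StatsAfterCreateSketch {

    /** Hypothetical, minimal task node; not Hive's Task class. */
    static final class Task {
        final String name;
        final List<Task> parents = new ArrayList<>();
        final List<Task> children = new ArrayList<>();

        Task(String name) {
            this.name = name;
        }

        void addDependent(Task child) {
            if (!children.contains(child)) {
                children.add(child);
                child.parents.add(this);
            }
        }

        void removeDependent(Task child) {
            if (children.remove(child)) {
                child.parents.remove(this);
            }
        }
    }

    /** Detach stats from every current parent, then hang it under createTable only. */
    static void moveAfter(Task stats, Task createTable) {
        // Iterate over a copy, because removeDependent mutates stats.parents.
        for (Task parent : new ArrayList<>(stats.parents)) {
            parent.removeDependent(stats);
        }
        createTable.addDependent(stats);
    }

    public static void main(String[] args) {
        Task move = new Task("move");
        Task create = new Task("create-table");
        Task stats = new Task("stats");
        move.addDependent(stats);   // initial plan: stats follows the move
        move.addDependent(create);  // create-table also follows the move
        moveAfter(stats, create);
        for (Task p : stats.parents) {
            System.out.println("stats depends on " + p.name);
        }
    }
}
```

    If the old edge is left in place instead, the stats stage keeps two parents, which is the kind of extra dependency these .q.out files show.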


- Ning


On 2011-11-08 04:08:52, Robert Surówka wrote:
bq.  
bq.  -----------------------------------------------------------
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2583/
bq.  -----------------------------------------------------------
bq.  
bq.  (Updated 2011-11-08 04:08:52)
bq.  
bq.  
bq.  Review request for Ning Zhang and Kevin Wilfong.
bq.  
bq.  
bq.  Summary
bq.  -------
bq.  
bq.  Explanation of how stats for CTAS were added (line numbers may be slightly off due to
repository changes):
bq.  
bq.  
bq.  Because CTAS contains an INSERT, the approach was to reuse as much as possible of what
is already there for INSERT.
bq.  
bq.  There were two main issues: making sure that the FileSinkOperators gather stats, and
that there is a StatsTask that then aggregates them and stores them in the metastore.
bq.  
bq.  FileSinkOperator gathers stats if conf.isGatherStats (line 576) is true. It is set to
true when the StatsTask is added in GenMRFileSink1 (126), which happens if isInsertTable is
true; that flag is set at line 105 (I didn't change the comment, since it is still set because of the INSERT
OVERWRITE that is part of the CTAS). To make it true, one must record that the CTAS contains
an insert into the table and add the TableSpec, which was done in SemanticAnalyzer (1051) (BaseSemanticAnalyzer
tableSpec() had to be changed to support TOK_CREATETABLE).
bq.  
bq.  The next issue was to supply StatsWork (part of StatsTask) with information about the table
being created. To do that, the database name was added to CreateTableDesc, and it is set in SemanticAnalyzer
(7878). This CreateTableDesc is then added to LoadFileDesc (just to carry the table info) in SemanticAnalyzer (4000),
which in turn is added to StatsWork in GenMRFileSink1 (170). This StatsWork is later used
by StatsTask to get the table info.
bq.  
bq.  Another thing was that StatsTask would be called before the CreateTableTask. To remedy
that, a change in SemanticAnalyzer(7048) was made, so for CTAS the StatsTask will be moved
to be after the crtTblTask.
bq.  
bq.  Finally in StatsTask, support for the LoadFileDesc was added (which is present for CTAS).
Importantly, line 306 was changed, since for CTAS there was an empty partitionList, instead
of null (this last change took me around 3 hours to find, since this was last place I looked
at, when figuring what's wrong).
bq.  
bq.  
bq.  I noticed that "Cannot get table db1.db1.conflict_name" was added to database.q.out at line 1224,
but it wasn't present in the previous diff version, which contained exactly the same
Java code, so I assume it is due to some other work happening concurrently.
bq.  
bq.  
bq.  This addresses bug HIVE-2472.
bq.      https://issues.apache.org/jira/browse/HIVE-2472
bq.  
bq.  
bq.  Diffs
bq.  -----
bq.  
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/StatsTask.java 1199067 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMRFileSink1.java 1199067 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/BaseSemanticAnalyzer.java 1199067 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1199067 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/CreateTableDesc.java 1199067 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/LoadFileDesc.java 1199067 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/PlanUtils.java 1199067 
bq.    trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/StatsWork.java 1199067 
bq.    trunk/ql/src/test/queries/clientpositive/ctas.q 1199067 
bq.    trunk/ql/src/test/results/clientpositive/ctas.q.out 1199067 
bq.    trunk/ql/src/test/results/clientpositive/database.q.out 1199067 
bq.    trunk/ql/src/test/results/clientpositive/merge3.q.out 1199067 
bq.    trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out 1199067 
bq.    trunk/ql/src/test/results/clientpositive/smb_mapjoin9.q.out 1199067 
bq.  
bq.  Diff: https://reviews.apache.org/r/2583/diff
bq.  
bq.  
bq.  Testing
bq.  -------
bq.  
bq.  Ran the ant tests with the overwrite option; changes to the .out files are part of the diff.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Robert
bq.  
bq.


                
> Metastore statistics are not being updated for CTAS queries.
> ------------------------------------------------------------
>
>                 Key: HIVE-2472
>                 URL: https://issues.apache.org/jira/browse/HIVE-2472
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Kevin Wilfong
>            Assignee: Robert Surówka
>         Attachments: HIVE-2472.1.patch.txt, HIVE-2472.2.patch, HIVE-2472.3.patch, HIVE-2472.4.patch
>
>
> We need to add a Statistics task at the end of a CTAS query in order to update the metastore
statistics for the table being created.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       
