hive-dev mailing list archives

From " (Commented) (JIRA)" <>
Subject [jira] [Commented] (HIVE-2472) Metastore statistics are not being updated for CTAS queries.
Date Thu, 03 Nov 2011 19:51:35 GMT

] commented on HIVE-2472:

(Updated 2011-11-03 19:50:15.818785)

Review request for Ning Zhang and Kevin Wilfong.

Summary (updated)

Explanation of how stats for CTAS were added (line numbers may be slightly off due to repository

Because CTAS contains an INSERT, the approach was to reuse as much as possible of what
already exists for INSERT.

There were two main issues: making sure that the FileSinkOperators gather stats, and making
sure that there is a StatsTask that then aggregates them and stores them in the Metastore.

FileSinkOperator gathers stats if conf.isGatherStats (line 576) is true. That flag is set to
true when the StatsTask is added in GenMRFileSink1 (126), which happens only if isInsertTable
is true; isInsertTable is set at line 105 (I didn't change the comment there, since the flag
is still set because of the INSERT OVERWRITE that is part of the CTAS). To make it true, the
CTAS must be marked as containing an insert into the table and the TableSpec must be added,
which was done in SemanticAnalyzer (1051). (BaseSemanticAnalyzer tableSpec() had to be
changed to support TOK_CREATETABLE.)
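
As a rough illustration of the gating described above (not the actual Hive code), the
following sketch uses hypothetical stand-in classes: FileSinkConf and planFileSink are
invented for this example, and only the isGatherStats / isInsertTable names come from the
description. It shows the flag on the sink being flipped only when a stats task is added.

```java
/** Simplified, hypothetical stand-ins; these are NOT the real Hive classes. */
public class GatherStatsSketch {

    /** Stand-in for the file sink descriptor that carries the gatherStats flag. */
    public static class FileSinkConf {
        private boolean gatherStats;

        public boolean isGatherStats() {
            return gatherStats;
        }

        public void setGatherStats(boolean gatherStats) {
            this.gatherStats = gatherStats;
        }
    }

    /**
     * Hypothetical planning step: a stats task is added only when the query
     * inserts into a table, and adding it is what enables stats gathering
     * on the sink. Returns true if a stats task was added.
     */
    public static boolean planFileSink(FileSinkConf conf, boolean isInsertTable) {
        if (!isInsertTable) {
            return false; // no stats task, so the sink never gathers stats
        }
        conf.setGatherStats(true);
        return true;
    }

    public static void main(String[] args) {
        FileSinkConf notAnInsert = new FileSinkConf();
        FileSinkConf ctas = new FileSinkConf();
        // Before the fix, CTAS behaved like the first case; after it, like the second.
        System.out.println(planFileSink(notAnInsert, false) + " " + notAnInsert.isGatherStats());
        System.out.println(planFileSink(ctas, true) + " " + ctas.isGatherStats());
    }
}
```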

The next issue was supplying StatsWork (part of StatsTask) with information about the table
being created. To do that, the database name was added to CreateTableDesc; it is set in
SemanticAnalyzer (7878). This CreateTableDesc is then added to LoadFileDesc (just to carry
the table info) in SemanticAnalyzer (4000), which in turn is added to StatsWork in
GenMRFileSink1 (170). This StatsWork is later used by StatsTask to get the table info.
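
The chain of descriptors can be pictured with the minimal sketch below. The nested classes
borrow the names used in this description but are simplified stand-ins written for
illustration; their fields, constructors, and the resolveTargetTable helper are assumptions,
not the real Hive signatures.

```java
/** Simplified, hypothetical stand-ins; these are NOT the real Hive classes. */
public class CtasStatsWiringSketch {

    /** Carries the name of the table being created, now including its database. */
    public static class CreateTableDesc {
        private final String databaseName;
        private final String tableName;

        public CreateTableDesc(String databaseName, String tableName) {
            this.databaseName = databaseName;
            this.tableName = tableName;
        }

        public String getDatabaseName() {
            return databaseName;
        }

        public String getTableName() {
            return tableName;
        }
    }

    /** Holds the CreateTableDesc only so that the table info travels with it. */
    public static class LoadFileDesc {
        private final CreateTableDesc createTableDesc;

        public LoadFileDesc(CreateTableDesc createTableDesc) {
            this.createTableDesc = createTableDesc;
        }

        public CreateTableDesc getCreateTableDesc() {
            return createTableDesc;
        }
    }

    /** The unit of work handed to the stats task. */
    public static class StatsWork {
        private final LoadFileDesc loadFileDesc;

        public StatsWork(LoadFileDesc loadFileDesc) {
            this.loadFileDesc = loadFileDesc;
        }

        public LoadFileDesc getLoadFileDesc() {
            return loadFileDesc;
        }
    }

    /** Hypothetical helper: what the stats task needs to find the new table. */
    public static String resolveTargetTable(StatsWork work) {
        CreateTableDesc ctd = work.getLoadFileDesc().getCreateTableDesc();
        return ctd.getDatabaseName() + "." + ctd.getTableName();
    }

    public static void main(String[] args) {
        // "db1" and "new_table" are made-up names for the example.
        StatsWork work = new StatsWork(
                new LoadFileDesc(new CreateTableDesc("db1", "new_table")));
        System.out.println(resolveTargetTable(work)); // prints "db1.new_table"
    }
}
```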

Another problem was that StatsTask would run before the CreateTableTask. To remedy that,
a change was made in SemanticAnalyzer (7048) so that for CTAS the StatsTask is moved to run
after the crtTblTask.
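
The reordering can be sketched as re-parenting a node in a task dependency tree. Everything
in this example (the Task class, moveStatsAfterCreate, the "loadTask" parent) is hypothetical
and only illustrates the idea; the real change in SemanticAnalyzer may be structured
differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, hypothetical task tree; NOT the real Hive Task API. */
public class CtasTaskOrderSketch {

    /** A task runs first, then each of its children in order. */
    public static class Task {
        public final String name;
        public final List<Task> children = new ArrayList<>();

        public Task(String name) {
            this.name = name;
        }

        public void addChild(Task child) {
            if (!children.contains(child)) {
                children.add(child);
            }
        }

        public void removeChild(Task child) {
            children.remove(child);
        }
    }

    /** Detach statsTask from its parent and hang it below crtTblTask instead. */
    public static void moveStatsAfterCreate(Task parent, Task statsTask, Task crtTblTask) {
        parent.removeChild(statsTask);
        parent.addChild(crtTblTask);
        crtTblTask.addChild(statsTask);
    }

    /** Pre-order walk, which matches execution order for this simple chain. */
    public static List<String> executionOrder(Task root) {
        List<String> order = new ArrayList<>();
        visit(root, order);
        return order;
    }

    private static void visit(Task task, List<String> order) {
        order.add(task.name);
        for (Task child : task.children) {
            visit(child, order);
        }
    }

    public static void main(String[] args) {
        Task load = new Task("loadTask"); // assumed parent of the stats task
        Task stats = new Task("statsTask");
        Task create = new Task("crtTblTask");
        load.addChild(stats); // initial plan: stats hangs directly under load

        moveStatsAfterCreate(load, stats, create);
        System.out.println(executionOrder(load)); // [loadTask, crtTblTask, statsTask]
    }
}
```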

Finally, support for LoadFileDesc (which is present for CTAS) was added to StatsTask.
Importantly, line 306 was changed, because for CTAS the partitionList was empty rather than
null (finding this last change took me around 3 hours, since it was the last place I looked
when figuring out what was wrong).
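
The null-versus-empty pitfall looks roughly like the sketch below. This is a guess at the
general shape of such a check, not a copy of line 306: the method names and the use of
List<String> for partitions are assumptions made for the example.

```java
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of a null-only check missing an empty list. */
public class PartitionListCheckSketch {

    /** Null-only check: an empty list is wrongly treated as "has partitions". */
    public static boolean isUnpartitionedNullOnly(List<String> partitionList) {
        return partitionList == null;
    }

    /** Tolerant check: both null and empty mean there are no partitions. */
    public static boolean isUnpartitioned(List<String> partitionList) {
        return partitionList == null || partitionList.isEmpty();
    }

    public static void main(String[] args) {
        List<String> ctasPartitions = Collections.emptyList(); // the CTAS case
        System.out.println(isUnpartitionedNullOnly(ctasPartitions)); // false
        System.out.println(isUnpartitioned(ctasPartitions));         // true
    }
}
```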

I noticed that "Cannot get table db1.db1.conflict_name" was added to database.q.out at line
1224, but it was not present in the previous diff version, which contained exactly the same
Java code, so I assume it is due to some other work happening concurrently.

This addresses bug HIVE-2472.


  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/ 1196269 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/ 1196269 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ 1196269 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ 1196269 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ 1196269 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ 1196269 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ 1196269 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/ 1196269 
  trunk/ql/src/test/results/clientpositive/ctas.q.out 1196269 
  trunk/ql/src/test/results/clientpositive/database.q.out 1196269 
  trunk/ql/src/test/results/clientpositive/merge3.q.out 1196269 
  trunk/ql/src/test/results/clientpositive/rcfile_createas1.q.out 1196269 
  trunk/ql/src/test/results/clientpositive/smb_mapjoin9.q.out 1196269 



Ran the ant tests with the overwrite option; the changes to the .out files are part of the diff.



> Metastore statistics are not being updated for CTAS queries.
> ------------------------------------------------------------
>                 Key: HIVE-2472
>                 URL:
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Kevin Wilfong
>            Assignee: Robert Surówka
>         Attachments: HIVE-2472.1.patch.txt, HIVE-2472.2.patch
> We need to add a Statistics task at the end of a CTAS query in order to update the metastore
statistics for the table being created.
