hive-dev mailing list archives

From "Namit Jain (JIRA)"
Subject [jira] Commented: (HIVE-1648) Automatically gathering stats when reading a table/partition
Date Mon, 22 Nov 2010 22:02:15 GMT


Namit Jain commented on HIVE-1648:

I haven't looked at the code yet, but here are my comments on the tests:

Instead of:

desc extended <table_name> in the tests,
please use
show table extended like `<table_name>`;

This dumps the stats on a new line, so they can be compared easily.
The non-deterministic stats are ignored.
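
As a rough illustration of the suggestion above, a test script might swap the two statements like this. The table name `stats_tbl` is a hypothetical placeholder and is not taken from the patch.

```sql
-- Hypothetical table name; any table created by the test would do.

-- Before: the form currently used in the tests.
desc extended stats_tbl;

-- After: the form suggested above.
show table extended like `stats_tbl`;
```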

Add a test for limit in the sub-query.
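
A minimal sketch of what such a test could look like, assuming a test-local table `stats_limit_src` with columns `key` and `value` (both names are hypothetical):

```sql
-- Assumes stats_limit_src (key string, value string) was created and
-- loaded earlier in the same test; the name is illustrative only.
select t.key, t.value
from (select key, value from stats_limit_src limit 10) t;

-- Dump the stats afterwards so the effect of the LIMIT can be checked.
show table extended like `stats_limit_src`;
```

Since the issue description excludes scans with a LIMIT operator from stats gathering, this test would presumably check that the stats are not updated by the query.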

Don't select from the existing tables (src/src1) in your stats tests.
Create new tables and then set the option to true.
This way, you can be sure that the remaining tests will not be affected.
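
One possible shape for such a test setup is sketched below. The table name `stats_t1` and the data file path are assumptions for illustration, and the option name is left as a placeholder because it is not given here.

```sql
-- Hypothetical table, private to this test, so src/src1 stay untouched.
create table stats_t1 (key string, value string);

-- Assumed data file; load directly rather than selecting from src.
load data local inpath '../data/files/kv1.txt' into table stats_t1;

-- Enable stats gathering only after the new table exists.
-- The option name is not stated in this comment, so it stays a placeholder:
-- set <option>=true;

select count(1) from stats_t1;
show table extended like `stats_t1`;
```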

Add another test for a 3-way join where the join keys are not the same, something like:

select .. from A join B on A.key1 = B.key1 join C on B.key2 = C.key2 where ....
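
Spelled out as a sketch, with hypothetical table names, column names, and filter (none of them come from the patch), such a test might read:

```sql
-- Hypothetical test-local tables; names and columns are illustrative.
create table stats_a (key1 string, val string);
create table stats_b (key1 string, key2 string);
create table stats_c (key2 string, val string);

-- The two joins use different keys (key1, then key2).
select a.key1, c.val
from stats_a a
join stats_b b on (a.key1 = b.key1)
join stats_c c on (b.key2 = c.key2)
where a.key1 is not null;

-- Check the stats of every table that was scanned.
show table extended like `stats_a`;
show table extended like `stats_b`;
show table extended like `stats_c`;
```

Because the join keys differ, Hive cannot merge the two joins into a single map/reduce job, so this exercises a different plan than a join on one shared key.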

> Automatically gathering stats when reading a table/partition
> ------------------------------------------------------------
>                 Key: HIVE-1648
>                 URL:
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Ning Zhang
>            Assignee: Paul Butler
>         Attachments: HIVE-1648.2.patch, HIVE-1648.3.patch, HIVE-1648.patch
> HIVE-1361 introduces a new command 'ANALYZE TABLE T COMPUTE STATISTICS' to gather
> stats. This requires an additional scan of the data. Stats gathering can be piggy-backed on TableScanOperator
> whenever a table/partition is scanned (given no LIMIT operator).
