hive-dev mailing list archives

From "Namit Jain (JIRA)" <>
Subject [jira] Commented: (HIVE-1938) Cost Based Query optimization for Joins in Hive
Date Tue, 01 Feb 2011 21:32:29 GMT


Namit Jain commented on HIVE-1938:

Currently, Hive does not maintain statistics (such as distinct values per table/partition),
which are the basis for the cost model under discussion.

Do you want to work on collecting such statistics first, so that we can then use them for
various plan optimizations?

I can think of some advantages of the cost model right away (and I am sure there are many more):
1. Predict "progress" for a query, predict the time taken.
2. Determine the join order.
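
To illustrate point 2, here is a minimal sketch of how distinct-value statistics could feed a join-order decision. It is not Hive code. It assumes the textbook estimate |R join S| = |R| * |S| / max(V(R), V(S)), where V is the number of distinct join-key values, and it further assumes that every table joins on the same key and that plans are left-deep. All names (TableStats, join_cardinality, plan_cost, best_join_order) are hypothetical.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class TableStats:
    """Hypothetical per-table statistics; not a Hive metastore structure."""
    name: str
    rows: int           # number of rows in the table
    distinct_keys: int  # distinct values of the (shared) join key


def join_cardinality(left_rows, left_distinct, right_rows, right_distinct):
    """Textbook estimate of equi-join output size, assuming uniformly
    distributed keys: |R| * |S| / max(V(R), V(S))."""
    return left_rows * right_rows / max(left_distinct, right_distinct)


def plan_cost(order):
    """Cost of a left-deep plan = sum of estimated intermediate result sizes."""
    rows, distinct = order[0].rows, order[0].distinct_keys
    cost = 0.0
    for table in order[1:]:
        rows = join_cardinality(rows, distinct, table.rows, table.distinct_keys)
        # Assumption: the join keeps at most the smaller set of key values.
        distinct = min(distinct, table.distinct_keys)
        cost += rows
    return cost


def best_join_order(tables):
    """Exhaustively try every left-deep order; only viable for a few tables."""
    return min(permutations(tables), key=plan_cost)
```

Calling best_join_order on a list of TableStats returns the order with the smallest estimated intermediate results. Under this model the largest table tends to be joined last. A real optimizer would need per-column statistics and a search strategy that scales beyond a handful of tables.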

> Cost Based Query optimization for Joins in Hive
> -----------------------------------------------
>                 Key: HIVE-1938
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>         Environment: *nix,java
>            Reporter: bharath v
>            Assignee: bharath v
> Current optimization in Hive is purely rule-based: it applies a set of rules to the
> plan tree. It depends on hints given by the user (which may or may not be correct)
> and might result in the execution of costlier plans. This JIRA therefore aims to build a
> cost model that can estimate the cost of various plans beforehand (using metadata that
> has already been collected), so that we can choose the plan that incurs the least cost.

This message is automatically generated by JIRA.