hive-user mailing list archives

From Ning Zhang <>
Subject Re: Query Optimization in Hive
Date Tue, 01 Feb 2011 03:52:48 GMT
Hi Anja,

As you noticed, Hive has only limited support for cost-based optimization. One of the reasons
is that Hive used to have very few alternative execution plans to choose from. One
exception is mapjoin vs. common join. Liying Tang did some work during his last internship to convert
common joins to mapjoins in a rule-based fashion. One item of his future work is to automatically
convert common joins to mapjoins based on stats. There is also ongoing work on indexes in
Hive. With index support, CBO will be much needed.
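
To make the stats-based conversion concrete, here is a rough sketch of the kind of decision it implies: if table-size stats show that one side of the join is small enough to hold in memory, pick a mapjoin, otherwise fall back to a common join. The function name, the 25 MB threshold, and the fallback for missing stats are all assumptions made for illustration, not Hive's actual code or configuration.

```python
# Hypothetical sketch: choose a join strategy from table-size stats.
# The threshold and every name below are illustrative assumptions,
# not Hive's actual implementation.
SMALL_TABLE_BYTES = 25 * 1024 * 1024  # assumed in-memory limit for the small side


def choose_join(left_bytes, right_bytes, threshold=SMALL_TABLE_BYTES):
    """Return 'mapjoin' if the smaller input fits under the threshold,
    otherwise 'common join'. Missing stats (None) fall back to 'common join'."""
    if left_bytes is None or right_bytes is None:
        # Without stats we cannot justify loading a table into memory.
        return "common join"
    if min(left_bytes, right_bytes) <= threshold:
        return "mapjoin"
    return "common join"
```

A rule-based conversion would apply a fixed pattern regardless of data size; the point of using stats is that the same query can get a different plan as the tables grow.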

For a decent CBO to work, we need stats and cost models. There has been some work on stats:
table/partition-level stats are already supported, and there is a JIRA open for column-level
stats (HIVE-1362). The cost model is much more complex in a Hadoop environment and depends closely
on the mapjoin/index implementations. Once all of these are in place, we can then talk about plan
enumeration, etc.

So yes, we are interested in CBO, but it is a large area and many missing pieces need to be
filled in for Hive. If you have a particular interest in some area, you can propose your ideas on the
mailing list, or even apply for an internship at FB if you would like
to work closely with us.


On Jan 31, 2011, at 2:04 PM, Anja Gruenheid wrote:

> Hi!
> I'm a graduate student from Georgia Tech and I'm working with Hive for a research project.
> I am interested in query optimization and the Hive MetaStore in that context. Working through
> the documentation and code, I noticed that the implementation right now is using a rule-based
> optimization system. Therefore, I was wondering whether cost-based query optimization will
> be a future task in the development of Hive and if it would be possible for me to cooperate
> with the developers of Hive to advance the project in general.
> Best regards,
> Anja Gruenheid
