hive-dev mailing list archives

From Anja Gruenheid
Subject Query Optimization in Hive
Date Tue, 01 Feb 2011 16:19:28 GMT

I'm a grad student at Georgia Tech and I'm currently working with Hive 
for a university project. The project is on query optimization 
techniques and possibilities in Hive. I know that there have been a lot 
of additions to the ql and metastore components since the latest release 
and I was hoping to help advance those components even further. My 
main interest in the course of my research is the storage and use of 
metadata to run a cost-based optimizer. This involves basic 
optimizations using, for example, the table size for cost estimation, but 
also more advanced approaches using histograms. I know that table and 
partition information is already collected in Hive, but from what I 
could gather, column metadata and histograms are still open. Would it be 
possible for me to contribute to the project in that area?

