hadoop-hive-dev mailing list archives

From "Azza Abouzeid (JIRA)" <j...@apache.org>
Subject [jira] Created: (HIVE-721) Integration with HadoopDB
Date Tue, 04 Aug 2009 15:44:14 GMT
Integration with HadoopDB
-------------------------

                 Key: HIVE-721
                 URL: https://issues.apache.org/jira/browse/HIVE-721
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Query Processor
    Affects Versions: 0.4.0
            Reporter: Azza Abouzeid
            Priority: Minor
             Fix For: 0.4.0


The HadoopDB project integrates Hadoop with single-node databases, which provide a high-performance
data layer for analytical queries over structured data. HadoopDB's SMS (SQL-to-MapReduce-to-SQL)
component uses Hive's SemanticAnalyzer to convert SQL to MapReduce plans. After plan generation,
we regenerate SQL from the lower plan operators and push that SQL into the database layer. The upper
layers of the plan, which cannot be pushed into the single-node databases, are left intact.
For more information on this process, please read the HadoopDB paper (http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf)
and browse the source code (in particular the SQLQueryGenerator class)
at http://sourceforge.net/projects/hadoopdb/.
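
To make the pushdown idea concrete, here is a minimal sketch of the general technique only. It is not the actual SQLQueryGenerator code and does not use Hive's operator classes: PlanOp, PushdownSketch, and the operator kinds are hypothetical names invented for illustration, and the plan is assumed to be a simple bottom-up chain rather than a tree.

```java
import java.util.List;

/** Hypothetical, much-simplified plan operator. This is not a Hive class. */
final class PlanOp {
    enum Kind { TABLE_SCAN, FILTER, SELECT, REDUCE_SIDE }

    final Kind kind;
    final String arg; // table name, predicate, or column list, depending on kind

    PlanOp(Kind kind, String arg) {
        this.kind = kind;
        this.arg = arg;
    }
}

/** Hypothetical helper: turns the lowest operators of a plan into one SQL string. */
final class PushdownSketch {

    /** Counts the bottom-most operators that come before the first reduce-side operator. */
    static int pushablePrefix(List<PlanOp> bottomUp) {
        int n = 0;
        while (n < bottomUp.size() && bottomUp.get(n).kind != PlanOp.Kind.REDUCE_SIDE) {
            n++;
        }
        return n;
    }

    /** Builds SQL from the pushable prefix; operators above it are ignored here. */
    static String toSql(List<PlanOp> bottomUp) {
        String table = null;
        String where = null;
        String cols = "*";
        for (PlanOp op : bottomUp.subList(0, pushablePrefix(bottomUp))) {
            switch (op.kind) {
                case TABLE_SCAN:
                    table = op.arg;
                    break;
                case FILTER:
                    // Several filters in the prefix are combined with AND.
                    where = (where == null) ? op.arg : where + " AND " + op.arg;
                    break;
                case SELECT:
                    cols = op.arg;
                    break;
                default:
                    break;
            }
        }
        if (table == null) {
            throw new IllegalArgumentException("plan has no table scan to push down");
        }
        return "SELECT " + cols + " FROM " + table + (where == null ? "" : " WHERE " + where);
    }
}
```

In this sketch, everything from the first reduce-side operator upward would stay in the MapReduce plan, while the returned string is what would be sent to the single-node database.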

HadoopDB is a natural system-level extension of Hive's goal of providing a simple SQL interface
for large-scale data processing.

A simple patch that integrates Hive with HadoopDB's SMS can be found here: http://hadoopdb.svn.sourceforge.net/viewvc/hadoopdb/trunk/Patches/hive-sms.patch?view=log

In addition to the semantic analyzer post-processing, we modified certain areas so that paths
can be associated with databases, which allows the operator tree to be recreated from the
map.input.file configuration. Instead of relying on FileInputSplit, we set up an interface,
Pathable, so that any input split that implements Pathable can return a dummy path equivalent
to the map.input.file path.
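
As a rough illustration of the Pathable idea, the sketch below uses plain Java with no Hadoop dependency. The method name getPath, the DatabaseSplit class, the SplitPaths helper, and the "/hadoopdb/host/database/table" path layout are all assumptions made for this example; the real definitions are in the patch linked above and may differ.

```java
/** Hypothetical stand-in for the Pathable interface described above. */
interface Pathable {
    /** Returns a dummy path that plays the role of the map.input.file value. */
    String getPath();
}

/** Hypothetical input split that reads from a single-node database, not a file. */
final class DatabaseSplit implements Pathable {
    private final String host;
    private final String database;
    private final String table;

    DatabaseSplit(String host, String database, String table) {
        this.host = host;
        this.database = database;
        this.table = table;
    }

    @Override
    public String getPath() {
        // The path layout is an assumption; it only needs to identify the source
        // uniquely so the matching part of the operator tree can be looked up.
        return "/hadoopdb/" + host + "/" + database + "/" + table;
    }
}

/** Hypothetical helper showing how a caller could resolve a path from any split. */
final class SplitPaths {
    static String inputPathOf(Object split) {
        if (split instanceof Pathable) {
            return ((Pathable) split).getPath();
        }
        throw new IllegalArgumentException(
            "split does not expose a path: " + split.getClass().getName());
    }
}
```

With these definitions, SplitPaths.inputPathOf(new DatabaseSplit("node1", "sales", "orders")) returns "/hadoopdb/node1/sales/orders". The point of the interface is that the caller no longer needs to know whether the split is backed by a file or by a database.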

Instead of calling the SQLQueryGenerator class directly after semantic analysis, you could
also use hooks. One such suggestion, provided by a HadoopDB user, can be found here: http://sourceforge.net/tracker/index.php?func=detail&aid=2829253&group_id=269559&atid=1146689.

We would really appreciate your help in better integrating Hive and HadoopDB. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

