hadoop-hive-dev mailing list archives

From "Namit Jain (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HIVE-721) Integration with HadoopDB
Date Fri, 18 Dec 2009 00:02:19 GMT

     [ https://issues.apache.org/jira/browse/HIVE-721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-721:

    Fix Version/s:     (was: 0.5.0)

> Integration with HadoopDB
> -------------------------
>                 Key: HIVE-721
>                 URL: https://issues.apache.org/jira/browse/HIVE-721
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.4.0
>            Reporter: Azza Abouzeid
>            Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
> The HadoopDB project integrates Hadoop with single-node databases, which provide a high-performance
data layer for analytical queries over structured data. HadoopDB's SMS (SQL-to-MapReduce-to-SQL)
component uses Hive's SemanticAnalyzer to convert SQL to MapReduce plans. After plan generation,
we recreate SQL from the lower plan operators and push that SQL into the database layer, leaving
intact the upper layers of the plan, which cannot be pushed into the single-node databases.
For more information on this process, please read the HadoopDB paper (http://db.cs.yale.edu/hadoopdb/hadoopdb.pdf)
and, if you like, browse the source code (in particular the SQLQueryGenerator class)
at http://sourceforge.net/projects/hadoopdb/.
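
To make the "recreate SQL from the lower plan operators" step concrete, here is a minimal sketch in plain Java. It is not the SQLQueryGenerator code: PlanOp, SqlPushdownSketch, and the rule that everything below the first reduce sink can be pushed are all assumptions made for illustration, and a plan is reduced to a simple chain of operators starting at the table scan.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a plan operator; not Hive's Operator class.
final class PlanOp {
    enum Kind { TABLE_SCAN, FILTER, SELECT, REDUCE_SINK }

    final Kind kind;
    final String arg; // table name, predicate, or column list, depending on kind

    PlanOp(Kind kind, String arg) {
        this.kind = kind;
        this.arg = arg;
    }
}

final class SqlPushdownSketch {
    // Counts the leading operators (starting at the table scan) that a
    // single-node database could evaluate on its own. Assumption: the
    // first reduce sink marks where the MapReduce-only part begins.
    static int pushableCount(List<PlanOp> chain) {
        int n = 0;
        for (PlanOp op : chain) {
            if (op.kind == PlanOp.Kind.REDUCE_SINK) {
                break;
            }
            n++;
        }
        return n;
    }

    // Rebuilds a single SQL statement from the pushable prefix of the chain.
    static String toSql(List<PlanOp> chain) {
        String table = null;
        String columns = "*";
        List<String> predicates = new ArrayList<>();
        for (PlanOp op : chain.subList(0, pushableCount(chain))) {
            switch (op.kind) {
                case TABLE_SCAN: table = op.arg; break;
                case FILTER:     predicates.add(op.arg); break;
                case SELECT:     columns = op.arg; break;
                default:         break;
            }
        }
        if (table == null) {
            throw new IllegalArgumentException("no table scan in the pushable prefix");
        }
        StringBuilder sql = new StringBuilder("SELECT " + columns + " FROM " + table);
        if (!predicates.isEmpty()) {
            sql.append(" WHERE ").append(String.join(" AND ", predicates));
        }
        return sql.toString();
    }
}
```

For a chain of table scan, filter, select, and reduce sink, the first three operators collapse into one SELECT statement for the database, and the reduce sink and everything above it stay in the MapReduce plan.
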
> HadoopDB is a natural system-level extension of Hive's goal of providing a simple SQL
interface for large-scale data processing.
> A simple patch that integrates Hive with HadoopDB's SMS can be found here: http://hadoopdb.svn.sourceforge.net/viewvc/hadoopdb/trunk/Patches/hive-sms.patch?view=log
> In addition to the semantic analyzer post-processing, we modified certain areas so that
paths can be associated with databases, which allows the operator tree to be recreated from the
map.input.file configuration. Instead of relying on FileInputSplit, we set up an interface, Pathable,
so that any InputSplit that implements Pathable can return a dummy path equivalent to the
map.input.file path.
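
The Pathable idea can be sketched as follows. This is a hedged illustration only: the real interface lives in the patch and may differ, and DatabaseSplit, PathResolver, the String return type, and the "/db/host/table" layout are hypothetical names and choices picked so the example runs without Hadoop on the classpath.

```java
// Hypothetical sketch of the interface; the version in the patch may differ.
interface Pathable {
    // Returns a dummy path that stands in for the map.input.file value.
    String getPath();
}

// Hypothetical split that describes a database table on a node rather than a file.
final class DatabaseSplit implements Pathable {
    private final String host;
    private final String table;

    DatabaseSplit(String host, String table) {
        this.host = host;
        this.table = table;
    }

    @Override
    public String getPath() {
        // Assumed layout: one dummy path per (host, table) pair.
        return "/db/" + host + "/" + table;
    }
}

final class PathResolver {
    // Returns the path used to look up the operator tree for a split,
    // or null when the split does not expose one.
    static String pathFor(Object split) {
        return (split instanceof Pathable) ? ((Pathable) split).getPath() : null;
    }
}
```

With this shape, code that previously needed a file-based split only has to check for Pathable, so a database-backed split can take part in the same path-to-operator-tree lookup.
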
> Instead of calling the SQLQueryGenerator class directly after semantic analysis, you
could also use hooks. One such suggestion, provided by a HadoopDB user, can be found here: http://sourceforge.net/tracker/index.php?func=detail&aid=2829253&group_id=269559&atid=1146689
> We would really appreciate your help in better integrating Hive and HadoopDB.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
