hadoop-common-dev mailing list archives

From "Ashish Thusoo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3601) Hive as a contrib project
Date Wed, 23 Jul 2008 22:30:31 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616287#action_12616287 ]

Ashish Thusoo commented on HADOOP-3601:
---------------------------------------

I am not sure how much JIRA traffic this would generate initially. In the long run, if
this becomes popular, it will of course generate a lot. At this time, though, people would
mostly be curious about it and experimenting with it, so creating a sub project seems to me
like an over-optimization. I think Hive still needs to prove itself
before it can be called a sub project in its own right. There is potential, but a lot will
depend on how both the user community and the developer community adopt it.

Putting Hive in contrib also ensures that we stay focused on working within the Hadoop ecosystem,
that Hive development doesn't lag behind Hadoop development, and
that we actively move forward as Hadoop interfaces evolve. It ensures that we do not diverge
too much from Hadoop releases.

Given all that, it seems desirable to carry on with the contrib model and monitor this
closely to see if Hive earns the right to be a sub project.

> Hive as a contrib project
> -------------------------
>
>                 Key: HADOOP-3601
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3601
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.17.0
>            Reporter: Joydeep Sen Sarma
>            Priority: Minor
>         Attachments: HiveTutorial.pdf
>
>   Original Estimate: 1080h
>  Remaining Estimate: 1080h
>
> Hive is a data warehouse built on top of flat files (stored primarily in HDFS). It includes:
> - Data Organization into Tables with logical and hash partitioning
> - A Metastore to store metadata about Tables/Partitions etc
> - A SQL-like query language over object data stored in Tables
> - DDL commands to define and load external data into tables
> Hive's query language is executed using Hadoop map-reduce as the execution engine. Queries
> can use either single stage or multi-stage map-reduce. Hive has a native format for tables
> - but can handle any data set (for example json/thrift/xml) using an IO library framework.
> Hive uses Antlr for query parsing, Apache JEXL for expression evaluation and may use
> Apache Derby as an embedded database for MetaStore. Antlr has a BSD license and should be
> compatible with Apache license.
> We are currently thinking of contributing to the 0.17 branch as a contrib project (since
> that is the version under which it will get tested internally) - but looking for advice on
> the best release path.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

