hadoop-common-dev mailing list archives

From "Joydeep Sen Sarma (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3601) Hive as a contrib project
Date Wed, 23 Jul 2008 21:42:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12616271#action_12616271 ]

Joydeep Sen Sarma commented on HADOOP-3601:

synced up with a few folks working on this internally. in a nutshell - the contributors seem
to like the idea of making this a contrib project to begin with. 

the sub-project requirements (in terms of PMC involvement) are fairly rigorous and would probably
extend the timeline of releasing hive into the hadoop ecosystem. that is our primary concern
at this time. as the project matures - it's possible/likely that a sub-project designation
is more appropriate.

to address the concerns about email traffic on core-dev - we had a suggestion. if we can
put the 'component' field in the email header (Pete found this useful link: http://www.atlassian.com/software/jira/docs/latest/emailcontent.html)
- then client-side mail filtering should be able to isolate hive jira traffic from that of
hadoop (or other contrib projects). there have already been suggestions on this thread about
not letting contrib test failures block acceptance of patches - and that would probably alleviate
the other major concern around slowing core development down. would these address most of
the concerns that are motivating the sandbox/sub-project discussion?
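
as a rough illustration of the client-side filtering idea, here is a minimal sketch. it assumes the component ends up in a header named X-JIRA-Component, which is a hypothetical name (the real one would depend on how the jira email template is customized), and the sample messages and component values are made up for the example.

```python
from email import message_from_string
from email.message import Message

# Hypothetical header name: the real name depends on how the JIRA
# email template is customized. This is an assumption, not a JIRA default.
COMPONENT_HEADER = "X-JIRA-Component"


def component_of(msg: Message) -> str:
    """Return the lower-cased component header value, or '' if absent."""
    return (msg.get(COMPONENT_HEADER) or "").strip().lower()


def split_by_component(raw_messages, component):
    """Partition raw RFC 822 message strings into (matching, other) lists."""
    wanted = component.strip().lower()
    matching, other = [], []
    for raw in raw_messages:
        msg = message_from_string(raw)
        if component_of(msg) == wanted:
            matching.append(msg)
        else:
            other.append(msg)
    return matching, other


# Made-up sample messages; the component values are illustrative only.
SAMPLE = [
    "Subject: [jira] Commented: (HADOOP-3601) Hive as a contrib project\n"
    "X-JIRA-Component: contrib/hive\n\nbody\n",
    "Subject: example message about a core issue\n"
    "X-JIRA-Component: dfs\n\nbody\n",
    "Subject: example message without a component header\n\nbody\n",
]

hive, rest = split_by_component(SAMPLE, "contrib/hive")
print(len(hive), len(rest))
```

the same rule could just as well be expressed in a mail client's own filter settings; the point is only that a dedicated header makes the match a simple string comparison.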

i don't think we will see a lot of traffic on the core-users mailing list (based on the follow-up
traffic from Ashish's posting of the hive language tutorial) - but we will just have to
see how that turns out.

> Hive as a contrib project
> -------------------------
>                 Key: HADOOP-3601
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3601
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.17.0
>            Reporter: Joydeep Sen Sarma
>            Priority: Minor
>         Attachments: HiveTutorial.pdf
>   Original Estimate: 1080h
>  Remaining Estimate: 1080h
> Hive is a data warehouse built on top of flat files (stored primarily in HDFS). It includes:
> - Data Organization into Tables with logical and hash partitioning
> - A Metastore to store metadata about Tables/Partitions etc
> - A SQL-like query language over object data stored in Tables
> - DDL commands to define and load external data into tables
> Hive's query language is executed using Hadoop map-reduce as the execution engine. Queries
> can use either single stage or multi-stage map-reduce. Hive has a native format for tables
> - but can handle any data set (for example json/thrift/xml) using an IO library framework.
> Hive uses Antlr for query parsing, Apache JEXL for expression evaluation and may use
> Apache Derby as an embedded database for MetaStore. Antlr has a BSD license and should be
> compatible with Apache license.
> We are currently thinking of contributing to the 0.17 branch as a contrib project (since
> that is the version under which it will get tested internally) - but looking for advice on
> the best release path.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
