hadoop-common-dev mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3601) Hive as a contrib project
Date Tue, 22 Jul 2008 17:03:32 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12615673#action_12615673 ]

Doug Cutting commented on HADOOP-3601:

> Owen: In particular, I don't think we should run the contrib unit tests for our patches.

Hmm.  We might still run them, but not fail a core patch if a contrib test fails.  Or perhaps
run them as a separate job in Hudson.  We still want contrib to build and pass tests, and
regular Hudson tests are a good way to achieve this.

> Eric: I'd suggest a project like Hive take either the path ZooKeeper or Pig took.

As Owen pointed out, the Pig path (incubator) isn't required here, unless Hive wants to be
a TLP (as Pig did at the time).  The ZooKeeper path (new Hadoop subproject) is available.
I don't have a strong preference.  If Hive is incorporated as a contrib module and it generates
too much mailing list traffic on core lists, that's a success disaster that we can remedy
by promoting it to a subproject.  Or if folks feel confident from the start that it will sustain
a subproject and are willing to create the infrastructure for that, that's fine too.  As Owen
mentioned, a subproject takes more time to set up (a JIRA instance, mailing lists, web site,
etc.), especially if the folks involved are not already familiar with how these things are done
at Apache.  But it's not that hard.

> Hive as a contrib project
> -------------------------
>                 Key: HADOOP-3601
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3601
>             Project: Hadoop Core
>          Issue Type: New Feature
>    Affects Versions: 0.17.0
>            Reporter: Joydeep Sen Sarma
>            Priority: Minor
>         Attachments: HiveTutorial.pdf
>   Original Estimate: 1080h
>  Remaining Estimate: 1080h
> Hive is a data warehouse built on top of flat files (stored primarily in HDFS). It includes:
> - Data Organization into Tables with logical and hash partitioning
> - A Metastore to store metadata about Tables/Partitions etc
> - A SQL-like query language over object data stored in Tables
> - DDL commands to define and load external data into tables
> Hive's query language is executed using Hadoop map-reduce as the execution engine. Queries
> can use either single stage or multi-stage map-reduce. Hive has a native format for
> tables - but can handle any data set (for example json/thrift/xml) using an IO library
> framework.
> Hive uses Antlr for query parsing, Apache JEXL for expression evaluation and may use
> Apache Derby as an embedded database for MetaStore. Antlr has a BSD license and should be
> compatible with Apache license.
> We are currently thinking of contributing to the 0.17 branch as a contrib project (since
> that is the version under which it will get tested internally) - but looking for advice on
> the best release path.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
