hadoop-common-user mailing list archives

From "tim robertson" <timrobertson...@gmail.com>
Subject Re: FW: [jira] Updated: (HADOOP-3601) Hive as a contrib project
Date Wed, 09 Jul 2008 08:25:06 GMT
Hi Ashish

I am very excited to try this. I have recently been evaluating Hadoop, HBase,
Cascading etc. to process 100 million Biodiversity records (expecting
billions soon) for data mining purposes (e.g. finding species that are
critically endangered and were observed outside of protected areas within
the last 2 years).  All of this is open-access Biodiversity information.
The paper looks to offer most of what I am looking for, but it is difficult
to comment further without running it.

If you would like a tester, I would happily fill this role and offer sample
code and input files that could go into "getting started" guides on the wiki.



On Wed, Jul 9, 2008 at 9:47 AM, Ashish Thusoo <athusoo@facebook.com> wrote:

> Hi Folks,
> We recently opened up a JIRA in order to bring Hive into the open source
> fold with the aim of contributing back to hadoop - which has really made
> large scale data processing so much easier for us at Facebook. We have
> also uploaded a small tutorial as part of that JIRA that gives a flavor
> of what kind of capabilities the system has. We would love to get
> feedback on this, so please check out the described functionality and
> post any comments, criticisms, wish lists etc. on the JIRA at
> https://issues.apache.org/jira/browse/HADOOP-3601
> We are planning an initial release of Hive as a contrib project in the
> 0.19 version of Hadoop and are really excited about the open source
> possibilities that it can enable, especially in the data warehousing/ETL
> space. So please stay tuned to the JIRA for future updates on Hive.
> Thanks,
> Ashish for Hive@Facebook
> -----Original Message-----
> From: Ashish Thusoo (JIRA) [mailto:jira@apache.org]
> Sent: Tuesday, July 08, 2008 4:15 PM
> To: Ashish Thusoo
> Subject: [jira] Updated: (HADOOP-3601) Hive as a contrib project
>     [ https://issues.apache.org/jira/browse/HADOOP-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> Ashish Thusoo updated HADOOP-3601:
> ----------------------------------
>    Attachment: HiveTutorial.pdf
> Tutorial on the capabilities of Hive. This is a pdf of internal
> documentation and contains query, dml and ddl examples as well as the
> overview of the system. A formal language spec, architecture documents
> and roadmaps will follow. This document gives the initial preview of the
> system and hopefully will seed a lot of interesting discussion/questions
> etc. around this system.
> > Hive as a contrib project
> > -------------------------
> >
> >                 Key: HADOOP-3601
> >                 URL: https://issues.apache.org/jira/browse/HADOOP-3601
> >             Project: Hadoop Core
> >          Issue Type: New Feature
> >    Affects Versions: 0.17.0
> >            Reporter: Joydeep Sen Sarma
> >            Priority: Minor
> >         Attachments: HiveTutorial.pdf
> >
> >   Original Estimate: 1080h
> >  Remaining Estimate: 1080h
> >
> > Hive is a data warehouse built on top of flat files (stored primarily
> > in HDFS). It includes:
> > - Data Organization into Tables with logical and hash partitioning
> > - A Metastore to store metadata about Tables/Partitions etc
> > - A SQL like query language over object data stored in Tables
> > - DDL commands to define and load external data into tables
> > Hive's query language is executed using Hadoop map-reduce as the
> > execution engine. Queries can use either single stage or multi-stage
> > map-reduce. Hive has a native format for tables - but can handle any
> > data set (for example json/thrift/xml) using an IO library framework.
> > Hive uses Antlr for query parsing, Apache JEXL for expression
> > evaluation and may use Apache Derby as an embedded database for
> > MetaStore. Antlr has a BSD license and should be compatible with Apache
> > license.
> > We are currently thinking of contributing to the 0.17 branch as a
> > contrib project (since that is the version under which it will get
> > tested internally) - but looking for advice on the best release path.
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
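
The issue description above says that queries run as single-stage or multi-stage map-reduce jobs. As a rough illustration only, here is how a grouped count (in generic SQL terms, something like SELECT species, COUNT(1) FROM sightings GROUP BY species) could map onto one map-reduce stage. This is a plain Python sketch, not Hive or Hadoop code; the table name, the rows, and every function name are hypothetical.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical rows of a "sightings" table: (species, country).
ROWS = [
    ("Panthera tigris", "IN"),
    ("Panthera tigris", "NP"),
    ("Gorilla beringei", "RW"),
    ("Panthera tigris", "IN"),
]


def map_phase(row):
    """Emit (group key, 1) for one row, like the map side of a grouped count."""
    species, _country = row
    yield species, 1


def shuffle(pairs):
    """Collect values by key; a stand-in for the framework's shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the per-row counts for one key."""
    return key, sum(values)


def group_by_count(rows):
    """Run map, shuffle, and reduce in sequence as a single stage."""
    pairs = chain.from_iterable(map_phase(row) for row in rows)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


result = group_by_count(ROWS)
print(result)
```

A query that needs a second grouping or a join on the output would presumably feed these results into another map/shuffle/reduce pass, which is what "multi-stage" suggests here.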
