hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive" by DougCutting
Date Thu, 04 Dec 2008 21:14:22 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by DougCutting:
http://wiki.apache.org/hadoop/Hive

The comment on the change is:
update for subprojecthood

------------------------------------------------------------------------------
- #pragma section-numbers on
+ = Welcome to the Hive Wiki! =
+ 
+ For more information, please see the official [http://hadoop.apache.org/hive/ Hive website].
  
  = Information =
  Following are some useful links for users and developers interested in Hive:
@@ -13, +15 @@

   * [wiki:/PoweredBy A List of Sites and Applications Powered by Hive]
  
  = What is Hive =
- Hive is a data warehouse infrastructure built on top of Hadoop that provides tools to enable
easy data summarization, adhoc querying and analysis of large datasets data stored in Hadoop
files. It provides a mechanism to put structure on this data and it also provides a simple
query language called QL which is based on SQL and which enables users familiar with SQL to
query this data. At the same time, this language also allows traditional map/reduce programmers
to be able to plug in their custom mappers and reducers to do more sophisticated analysis
which may not be supported by the built in capabilities of the language.
+ [http://hadoop.apache.org/hive/ Hive] is a data warehouse infrastructure built on top of
Hadoop that provides tools to enable easy data summarization, adhoc querying and analysis
of large datasets data stored in Hadoop files. It provides a mechanism to put structure on
this data and it also provides a simple query language called QL which is based on SQL and
which enables users familiar with SQL to query this data. At the same time, this language
also allows traditional map/reduce programmers to be able to plug in their custom mappers
and reducers to do more sophisticated analysis which may not be supported by the built in
capabilities of the language.
  
  = What Hive is NOT =
  Hive is based on Hadoop which is a batch processing system. Accordingly, this system does
not and cannot promise low latencies on queries. The paradigm here is strictly of submitting
jobs and being notified when the jobs are completed as opposed to real time queries. As a
result it should not be compared with systems like Oracle where analysis is done on a significantly
smaller amount of data but the analysis proceeds much more iteratively with the response times
between iterations being less than a few minutes. For Hive queries response times for even
the smallest jobs can be of the order of 5-10 minutes and for larger jobs this may even run
into hours.
@@ -22, +24 @@

  
  Hive does not mandate read or written data be in "hive format" - there is no such thing;
Hive works equally well on Thrift, RecordIO, control delimited, or your data format.
  
- 
- = Status =
- Hive has been submitted as a contrib project in hadoop trunk. The details of its availability
are available at [https://issues.apache.org/jira/browse/HADOOP-3601 Hadoop JIRA]
- 
  = Get Involved =
  
- [http://publists.facebook.com/mailman/listinfo/hive-users hive-users mailing list] (at publists.facebook.com)
+ [http://hadoop.apache.org/hive/mailing_lists.html#Users hive-users mailing list]
  
