hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive" by JohnSichi
Date Sun, 26 Jun 2011 23:07:34 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive" page has been changed by JohnSichi:

+ The Apache Hive wiki has moved to [[https://cwiki.apache.org/confluence/display/Hive|Confluence]]!
- = What is Hive =
- [[http://hadoop.apache.org/hive/|Hive]] is a data warehouse infrastructure built on top
of [[.|Hadoop]]. It provides tools to enable easy data ETL, a mechanism to put structure
on the data, and the capability to query and analyze large data sets stored in Hadoop
files. Hive defines a simple SQL-like query language, called QL, that enables users familiar
with SQL to query the data. At the same time, this language allows programmers who are
familiar with the MapReduce framework to plug in their custom mappers and reducers
to perform more sophisticated analysis that may not be supported by the built-in capabilities
of the language.
- Hive does not require that the data it reads or writes be in a "Hive format"; there is no such thing.
Hive works equally well on Thrift, control-delimited, or your own specialized data formats. Please
see [[/DeveloperGuide#File_Formats|File Format]] and [[http://www.slideshare.net/ragho/hive-user-meeting-august-2009-facebook|SerDe]]
in the [[/DeveloperGuide|Developer Guide]] for details.
+ If you're looking for a particular page name, try [[https://cwiki.apache.org/confluence/pages/listpages-dirview.action?key=Hive|this
- = What Hive is NOT =
- Hadoop is a batch processing system, and Hadoop jobs tend to have high latency and incur
substantial overheads in job submission and scheduling. As a result, latency for Hive queries
is generally very high (minutes) even when the data sets involved are very small (say, a few hundred
megabytes). Hive therefore cannot be compared with systems such as Oracle, where analyses are
conducted on significantly smaller amounts of data but proceed much more iteratively,
with response times between iterations of less than a few minutes. Hive aims to provide
acceptable (but not optimal) latency for interactive data browsing, queries over small data
sets, or test queries. Hive also does not provide any sort of data or query cache to make repeated
queries over the same data set faster.
- Hive is not designed for online transaction processing and does not offer real-time queries
or row-level updates. It is best used for batch jobs over large sets of immutable data (like
web logs). What Hive values most are scalability (scale out with more machines added dynamically
to the Hadoop cluster), extensibility (with the MapReduce framework and UDF/UDAF/UDTF), fault tolerance,
and loose coupling with its input formats.
- = Information =
-  * General information about Hive
-   * [[/GettingStarted|Getting Started]]
-   * [[/Presentations|Presentations and Papers about Hive]]
-   * [[/PoweredBy|A List of Sites and Applications Powered by Hive]]
-   * [[/FAQ|FAQ]]
-   * [[http://hadoop.apache.org/hive/mailing_lists.html#Users|hive-users mailing list]]
-   * Hive IRC Channel: #hive at irc.freenode.net
-  * For users:
-   * [[/Tutorial|Hive Tutorial]]
-   * [[/LanguageManual|HiveQL Language Manual (Queries, DML, DDL, and CLI)]]
-   * [[/HivePlugins|Hive Plug-in Interfaces - User-Defined Functions and SerDes]]
-   * [[/LanguageManual/UDF|Guide to Hive Operators and Functions]]
-    * [[Hive/StatisticsAndDataMining|Functions for Statistics and Data Mining]]
-   * [[/HiveWebInterface|Hive Web Interface]]
-   * [[/HiveClient|Hive Client (JDBC, ODBC, Thrift, etc)]]
-  * For administrators:
-   * [[/AdminManual/Installation|Installing Hive]]
-   * [[/AdminManual/Configuration|Configuring Hive]]
-   * [[/AdminManual/MetastoreAdmin|Setting up Metastore]]
-   * [[/HiveWebInterface|Setting up Hive Web Interface]]
-   * [[/AdminManual/SettingUpHiveServer|Setting up Hive Server (JDBC, ODBC, Thrift, etc)]]
-   * [[/HiveAws|Hive on Amazon Web Services]]
-  * For developers:
-   * [[/HowToContribute|How to Contribute]]
-   * [[/Development/ContributorsMeetings|Hive Contributors Meetings]]
-   * [[/DeveloperGuide|Hive Developer Guide]]
-   * [[/Performance|Hive Performance]]
-   * [[/Design|Hive Architecture Overview]]
-   * [[/DesignDocs|Hive Design Docs]]
-   * [[/Roadmap|Roadmap/call to Add More Features]]
-   * [[http://search-hadoop.com/Hive|Full-text search over all Hive resources]]
-   * [[/HowToCommit|How to Commit]]
-   * [[/HowToRelease|How to Release]]
-   * [[/HudsonBuild|Build status on Jenkins (formerly Hudson)]]
-   * [[https://cwiki.apache.org/confluence/display/Hive/Bylaws|Project Bylaws]]
- For more information, please see the official [[http://hadoop.apache.org/hive/|Hive website]].
