hive-commits mailing list archives

Subject svn commit: r1203885 - /hive/trunk/README.txt
Date Fri, 18 Nov 2011 22:22:32 GMT
Author: jvs
Date: Fri Nov 18 22:22:31 2011
New Revision: 1203885

HIVE-2598. Update README.txt file to use description from wiki
(Carl Steinbach via jvs)


Modified: hive/trunk/README.txt
--- hive/trunk/README.txt (original)
+++ hive/trunk/README.txt Fri Nov 18 22:22:31 2011
@@ -1,14 +1,27 @@
-Apache Hive @VERSION@
+Apache Hive (TM) @VERSION@
-Apache Hive is a data warehouse system for Hadoop that facilitates
-easy data summarization, ad-hoc querying and analysis of large
-datasets stored in Hadoop compatible file systems. Hive provides a
-mechanism to put structure on this data and query the data using a
-SQL-like language called HiveQL. At the same time this language also
-allows traditional map/reduce programmers to plug in their custom
-mappers and reducers when it is inconvenient or inefficient to express
-this logic in HiveQL.
+The Apache Hive (TM) data warehouse software facilitates querying and
+managing large datasets residing in distributed storage. Built on top
+of Apache Hadoop (TM), it provides:
+* Tools to enable easy data extract/transform/load (ETL)
+* A mechanism to impose structure on a variety of data formats
+* Access to files stored either directly in Apache HDFS (TM) or in other
+  data storage systems such as Apache HBase (TM)
+* Query execution via MapReduce
+Hive defines a simple SQL-like query language, called QL, that enables
+users familiar with SQL to query the data. At the same time, this
+language also allows programmers who are familiar with the MapReduce
+framework to be able to plug in their custom mappers and reducers to
+perform more sophisticated analysis that may not be supported by the
+built-in capabilities of the language. QL can also be extended with
+custom scalar functions (UDF's), aggregations (UDAF's), and table
+functions (UDTF's).
 Please note that Hadoop is a batch processing system and Hadoop jobs
 tend to have high latency and incur substantial overheads in job
