hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive" by EdwardCapriolo
Date Thu, 30 Oct 2008 22:39:42 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by EdwardCapriolo:

The comment on the change is:
Saying 5-10 minutes might scare people. 

  = What Hive is NOT =
  Hive is based on Hadoop, which is a batch processing system. Accordingly, this system does
not and cannot promise low latencies on queries. The paradigm here is strictly one of submitting
jobs and being notified when the jobs are completed, as opposed to real-time queries. As a
result, it should not be compared with systems like Oracle, where analysis is done on a significantly
smaller amount of data but proceeds much more iteratively, with the response times
between iterations being less than a few minutes. For Hive queries, response times for even
the smallest jobs can be on the order of 5-10 minutes, and for larger jobs this may even run
into hours.
+ If your input data is small, you can execute a query in a short time. For example, if a table
has 100 rows, you can 'set mapred.reduce.tasks=1' and 'set mapred.map.tasks=1' and the query
will take about 15 seconds.
  Hive does not mandate that data read or written be in a "hive format"; there is no such thing.
Hive works equally well on Thrift, RecordIO, control-delimited, or your own data format.
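
As a rough illustration of the small-table case above, a Hive CLI session might look like the following sketch. The table name small_table is hypothetical, and the figure of about 15 seconds comes from the text, not from any measurement made here. Note that Hadoop treats mapred.map.tasks as a hint only; the actual number of map tasks depends on how the input is split.

```sql
-- Ask for a single map task (a hint to Hadoop) and a single reduce task.
set mapred.map.tasks=1;
set mapred.reduce.tasks=1;

-- small_table is a hypothetical table holding about 100 rows.
SELECT COUNT(1) FROM small_table;
```

Both settings apply only to the current session, so larger jobs run later from a fresh session are unaffected.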
