hive-dev mailing list archives

From "Alan Gates (Commented) (JIRA)" <>
Subject [jira] [Commented] (HIVE-2670) A cluster test utility for Hive
Date Wed, 21 Dec 2011 18:45:30 GMT


Alan Gates commented on HIVE-2670:

Attached a first patch.  This is not ready for inclusion yet; I'm just putting it up here
to start getting feedback.  The following will need to be resolved before it is checked in:
# Currently it just has the base harness code included as a tar file.  This really should
be externed from the Pig code base, as HCatalog does.
# I don't know if this is the right place in SVN or not.  I put it all in a test-e2e directory
right under trunk.  I need feedback on whether this is a good spot or somewhere else would
be preferred.
# Connect the top level build.xml to this so it is possible to invoke the tests from the top
level directory.  I was waiting to do this until I had feedback on the proper directory structure.

How to use it:

After applying the patch you will need to copy the harness.tar file (attached) to test-e2e,
since that is not done for you by the patch tool.

First you need an existing Hadoop cluster (it can be very small, just a few nodes) and a MySQL
database.  I ran my tests against Hadoop, but this should run against any 0.20.x
version of Hadoop.  Then:
# Run the script test-e2e/scripts/create_test_db.sql against your MySQL database as a user
that can create users and databases, and grant to users (root is a good choice)
# Run "ant package" in the top level Hive directory
# cd test-e2e
# ant -Dharness.hadoop.home=<path_to_hadoop_home> -Dharness.hive.home=<path_to_hive_you_want_to_test>

Usually <path_to_hive_you_want_to_test> will be $CWD/../build/dist
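As a rough illustration of the last step, the ant invocation could be assembled as below. Only the two -Dharness.* properties and the $CWD/../build/dist default come from the instructions above; the helper name build_harness_command and the example paths are made up for illustration, and the sketch only builds the argument list rather than running ant.

```python
import os


def build_harness_command(hadoop_home, hive_home=None, cwd=None):
    """Return the ant argument list for running the harness from test-e2e.

    Hypothetical helper: only the two -Dharness.* properties come from the
    instructions above; the function itself is illustrative.
    """
    cwd = cwd or os.getcwd()
    if hive_home is None:
        # Default mentioned above: $CWD/../build/dist
        hive_home = os.path.normpath(os.path.join(cwd, "..", "build", "dist"))
    return [
        "ant",
        f"-Dharness.hadoop.home={hadoop_home}",
        f"-Dharness.hive.home={hive_home}",
    ]


if __name__ == "__main__":
    # Example paths are placeholders, not real locations.
    print(" ".join(build_harness_command("/opt/hadoop", cwd="/src/hive/test-e2e")))
```

The resulting list can be handed to subprocess.run(..., cwd=<test-e2e directory>) once the database script and "ant package" steps have been done.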

The basic design of this test harness is that each test consists of three phases:  run_test, generate_benchmark,
and compare_results.  In run_test a particular test is run.  generate_benchmark runs the same
or a similar test against a known source of truth.  compare_results then compares the results
and declares the test to have succeeded, failed, or aborted.  The harness delegates each of
these three functions to drivers that are specific to different types of tests.
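To make the three-phase flow concrete, here is a minimal sketch of how a harness might delegate to a driver. It is not the harness code from the patch: the Driver, ToyDriver, Outcome, and run_one names are hypothetical, and treating any exception in the first two phases as "aborted" is an assumption made for the example.

```python
from enum import Enum


class Outcome(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    ABORTED = "aborted"


class Driver:
    """Hypothetical driver interface; a real driver is specific to a test type."""

    def run_test(self, test):
        raise NotImplementedError

    def generate_benchmark(self, test):
        raise NotImplementedError

    def compare_results(self, actual, expected):
        raise NotImplementedError


def run_one(driver, test):
    """Run the three phases for one test and report the outcome."""
    try:
        actual = driver.run_test(test)
        expected = driver.generate_benchmark(test)
    except Exception:
        # Assumption: a phase that cannot complete counts as "aborted".
        return Outcome.ABORTED
    if driver.compare_results(actual, expected):
        return Outcome.SUCCEEDED
    return Outcome.FAILED


class ToyDriver(Driver):
    """Toy driver that reads canned results from the test description."""

    def run_test(self, test):
        return test["actual"]  # KeyError here is reported as ABORTED

    def generate_benchmark(self, test):
        return test["expected"]

    def compare_results(self, actual, expected):
        return actual == expected


if __name__ == "__main__":
    print(run_one(ToyDriver(), {"actual": [1, 2], "expected": [1, 2]}))
```

A driver for a different type of test would only need to supply its own versions of the three methods; the loop in run_one stays the same.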

This patch includes two drivers: a Hive driver and a Hive command line driver.  The Hive driver
uses the MySQL database as a source of truth.  Each SQL script is run against Hive and against
MySQL, and the results are compared using the Unix cksum tool.
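The comparison step can be pictured with the sketch below. It assumes the Hive and MySQL results have each been written to a file, and it uses Python's zlib.crc32 as a stand-in for cksum (the two use different CRC variants, so the numbers will not match cksum's output, but the equal-or-not decision works the same way). The function names are hypothetical.

```python
import zlib


def checksum(path, chunk_size=1 << 16):
    """Return (crc32, byte_count) for a file, reading it in chunks."""
    crc = 0
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)  # carry the running CRC forward
            size += len(chunk)
    return crc, size


def results_match(hive_output, mysql_output):
    """True when both result files have the same checksum and length."""
    return checksum(hive_output) == checksum(mysql_output)
```

Like cksum, this reports both a checksum and a byte count, so two files only match when both values agree.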

For more information on the test harness, including how to add tests to it, see
The Hive driver does not yet support running alternate SQL for benchmarking or using an
old version of Hive for the benchmarks, though both should be added at some point.

> A cluster test utility for Hive
> -------------------------------
>                 Key: HIVE-2670
>                 URL:
>             Project: Hive
>          Issue Type: New Feature
>          Components: Testing Infrastructure
>            Reporter: Alan Gates
>         Attachments: harness.tar, hive_cluster_test.patch
> Hive has an extensive set of unit tests, but it does not have an infrastructure for testing
> in a cluster environment.  Pig and HCatalog have been using a test harness for cluster testing
> for some time.  We have written Hive drivers and tests to run in this harness.
