hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject hive_test : A unit testing environment for hive and hive-service
Date Fri, 16 Sep 2011 16:06:22 GMT
Some history:
The unit testing inside hive is good at what it does. Essentially it
runs a hive .q file and diffs the output against the results of a
previous known-good run.

This does some nice things for hive:

1) We are sure the query planner/parser evaluates the query the same way.
2) We are sure the query returns the same physical results.

The not so nice side:
1) Tests are pretty intensive to run, since tables are created beforehand.
2) Not really ideal for developing something that is not part of hive
(e.g. an in-house UDF).
3) No way to integrate with standard unit tests (asserts etc).
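
For readers unfamiliar with the .q approach described above, here is a
minimal sketch of the "run, then diff against a known result" idea. It is
not hive's actual test harness; the class and method names
(GoldenFileCheck, firstDifference) are hypothetical, and the query output
is represented as an in-memory list of lines rather than a real file.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of a golden-output comparison; not hive's own code. */
class GoldenFileCheck {

    /**
     * Returns the 1-based number of the first line where the two outputs
     * differ, or -1 if they are identical.
     */
    static int firstDifference(List<String> expected, List<String> actual) {
        int common = Math.min(expected.size(), actual.size());
        for (int i = 0; i < common; i++) {
            if (!expected.get(i).equals(actual.get(i))) {
                return i + 1;
            }
        }
        // One output is a prefix of the other: the first extra line differs.
        return expected.size() == actual.size() ? -1 : common + 1;
    }

    public static void main(String[] args) {
        List<String> golden = Arrays.asList("1", "2");      // saved known-good output
        List<String> sameRun = Arrays.asList("1", "2");     // output of an unchanged run
        List<String> changedRun = Arrays.asList("1", "3");  // output after a regression

        System.out.println(firstDifference(golden, sameRun));    // prints -1
        System.out.println(firstDifference(golden, changedRun)); // prints 2
    }
}
```

A check like this only reports that the output changed. It cannot express
an intent such as "the count must be 2", which is what point 3 is about.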

Alternative solution:

hive_test
Unit test framework for hive and hive-service
https://github.com/edwardcapriolo/hive_test

Now that all the components of hive live in maven, we can bring the
dependencies of hive into other projects. We can extend the HadoopTestCase
into several other test cases. Currently we have abstract Test Cases for
embedded hive and a hive-service.

The end result is the user can now write end to end unit tests like this:


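import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.Path;

// HiveTestService comes from the hive_test project linked above.
public class ServiceExampleTest extends HiveTestService {

  public ServiceExampleTest() throws IOException {
    super();
  }

  public void testExecute() throws Exception {
    // Write a two-row input file.
    Path p = new Path(this.ROOT_DIR, "afile");
    FSDataOutputStream o = this.getFileSystem().create(p);
    BufferedWriter bw = new BufferedWriter(new OutputStreamWriter(o));
    bw.write("1\n");
    bw.write("2\n");
    bw.close();

    // Load the file into a table and count its rows through the service.
    client.execute("create table atest (num int)");
    client.execute("load data local inpath '" + p.toString()
        + "' into table atest");
    client.execute("select count(1) as cnt from atest");
    String row = client.fetchOne();
    assertEquals("2", row);  // expected value first, then actual
    client.execute("drop table atest");
  }

}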

You can also use the embedded capability to test workflows. No more
untestable integrations like this:

doIt.sh
in=$1
out=$2
hadoop jar wordcount /$in /$out
hive -e "create table wordcount ..."
hive -e "load data inpath '/$out' into table wordcount"

Now these can be done in a single Java process!
