hive-dev mailing list archives

From Szehon Ho <sze...@cloudera.com>
Subject Re: HiveQA is getting annoyingly slow
Date Thu, 24 Sep 2015 00:10:46 GMT
Thanks for the investigation, it's very helpful.

I think it's a good idea to disable the most highly offending tests as you
suggested if they are testing features that aren't heavily used
(rcfile_merge1 and gbtoidx).  I wasn't aware that we are dropping MR support
anytime soon, though?

I think removing those two offending tests (or any others we identify) would
be low-hanging fruit.  On our end, we'll also see whether we can play around
with the build's configured batch sizes for MiniMR tests (the unit of
parallelism in PTest) and get the runtime down further.  Hope that
works?

Thanks
Szehon

On Wed, Sep 23, 2015 at 4:28 PM, Sergey Shelukhin <sergey@hortonworks.com>
wrote:

> HiveQA is taking too long to run.
> I browsed the test results a bit; the main offenders are obviously the
> various CliDrivers.
> I think there's a JIRA being worked on to speed up the Tez CLI driver,
> and Spark and HBase have tolerable runtimes.
>
> That leaves us with the base CliDriver and MR.
> Base tests generally take 0-30 seconds, 1-2 minutes at most, but there are
> some ridiculous test runtimes (these are fairly consistent between runs):
> testCliDriver_rcfile_merge1                    31 min
> testCliDriver_escape2                          13 min
> testCliDriver_escape1                          8 min 10 sec
> testCliDriver_dynpart_sort_opt_vectorization   4 min 47 sec
> testCliDriver_unionDistinct_1                  4 min 32 sec
> testCliDriver_dynpart_sort_optimization        4 min 2 sec
> testCliDriver_rcfile_merge2                    3 min 55 sec
> testCliDriver_vector_leftsemi_mapjoin          3 min 53 sec
> testCliDriver_archive_excludeHadoop20          3 min 13 sec
>
>
> If we remove or rein in the top 3 tests, the testCliDriver runtime will go
> down by almost an hour.
> Is anyone particularly attached to the rcfile tests? Testing rcfile is all
> well and good, but it's a rarely used format now that Avro, ORC and Parquet
> seem to have taken over (not to mention Text), and the test should not take
> half an hour. I suggest we disable this test (rcfile_merge1) and file a JIRA
> to investigate its perf if someone feels it's important.
> Another work item is to look at why the escape tests take so long; escaping
> should be a simple thing to test, not 21 minutes in aggregate (most tests
> finish in 0-2 minutes).
>
> Then there's MiniMR, which takes 2 hours. Some GBY-index-specific tests
> (gbtoidx) are the worst offenders, to the tune of 35 mins for 3 tests, along
> with smb_mapjoin at 15 mins.
> Since the plan was to drop MR support on master, how about we start by not
> running these long MR tests and deprecating the MR engine, while still
> keeping it around until we get to the task of removing it?
>
>
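As a rough check of the "almost an hour" estimate in the quoted message, the following Python sketch sums the three longest CliDriver runtimes from the table. The runtimes are hand-copied from the message; the dictionary and helper names (`CLI_DRIVER_RUNTIMES`, `top_n_total`, `fmt`) are hypothetical and only for illustration, not part of any Hive or PTest tooling.

```python
# Runtimes copied by hand from the quoted table, converted to seconds.
CLI_DRIVER_RUNTIMES = {
    "rcfile_merge1": 31 * 60,
    "escape2": 13 * 60,
    "escape1": 8 * 60 + 10,
    "dynpart_sort_opt_vectorization": 4 * 60 + 47,
    "unionDistinct_1": 4 * 60 + 32,
    "dynpart_sort_optimization": 4 * 60 + 2,
    "rcfile_merge2": 3 * 60 + 55,
    "vector_leftsemi_mapjoin": 3 * 60 + 53,
    "archive_excludeHadoop20": 3 * 60 + 13,
}


def top_n_total(runtimes, n):
    """Return the sum, in seconds, of the n longest runtimes."""
    return sum(sorted(runtimes.values(), reverse=True)[:n])


def fmt(seconds):
    """Format a duration in seconds as 'M min S sec'."""
    minutes, secs = divmod(seconds, 60)
    return f"{minutes} min {secs} sec"


saved = top_n_total(CLI_DRIVER_RUNTIMES, 3)
print(fmt(saved))  # 52 min 10 sec
```

The three longest tests (rcfile_merge1, escape2, escape1) add up to 52 min 10 sec, which matches "almost an hour". The same arithmetic on the MiniMR numbers gives 35 + 15 = 50 minutes for gbtoidx and smb_mapjoin, out of roughly 2 hours.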
