impala-dev mailing list archives

From Alex Behm <>
Subject Re: Fw: Impala tests and estimate
Date Fri, 13 May 2016 06:35:23 GMT
On Thu, May 12, 2016 at 11:09 PM, Nishidha Panpaliya <> wrote:

> Thanks Jim for this information.
> I'd a few more queries -
>    What is the system configuration you are using on which the estimates
>    you gave hold true? RAM, HDD, CPU or any other requirement?

Times were reported on a m2.4xlarge EC2 instance. See specs here:

>    We also wanted to know pre-requisites to run each of these tests so that
>    we start preparing for it upfront. For e.g. backend tests does not need
>    any test data, however frontend tests do need test data to be generated
>    and loaded.

All tests except the backend tests require the test data to be loaded.

>    Are there any detailed documents listing steps to prepare and execute
>    all these tests.

Probably not detailed enough for you. Tracing through and should give you a good idea.

>    Test data generation is being done by default using with
>    -testdata argument. Can we customize this step to generate different
>    data or some scaled (small scale) data? Do we even need to do so to
>    ensure Impala works with different data sets?

The tests require the test data to be set up exactly the way it is today. I
highly recommend running the functional tests for validation.

You can certainly customize, but it's well... custom. So we cannot really
help you much there. You'll need to change the scripts/flow to your liking.

>    Also, does time for each of these tests as you mentioned take test data
>    generation and loading time into consideration or is it purely test
>    execution duration?

Purely test execution.
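
For planning purposes, the core-mode figures from Jim's message quoted further down can simply be added up. Here is a minimal sketch, assuming those times are taken at face value and that "exhaustive" mode is roughly four times the core total (Jim flagged that factor as an IIRC). The dictionary keys and the helper name are illustrative only and are not part of Impala.

```python
# Core-mode execution times as quoted in this thread, in seconds.
# These exclude test data generation and loading.
CORE_SECONDS = {
    "backend": 12 * 60,
    "frontend": 10,
    "jdbc": 2 * 60,
    "custom_cluster": 35 * 60,
    "end_to_end": 3 * 60 * 60,
}


def fmt(seconds):
    """Format a duration in seconds as hours, minutes and seconds."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


core_total = sum(CORE_SECONDS.values())
# Assumption: exhaustive mode is about 4x the core total (an "IIRC" in the thread).
exhaustive_total = 4 * core_total

print("core:      ", fmt(core_total))
print("exhaustive:", fmt(exhaustive_total))
```

Data loading time is not included in these totals and has to be budgeted separately.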

>    We also observed test data loading takes more than 5 hrs at our end both
>    on x86 and power? How much time does it take for you? Also, when should
>    we really need to generate test data from scratch (-format argument to
> I hope it is not needed every time.

The test data does not need to be loaded from scratch every time. We have
the following workflow in place that you could replicate:

1. Generate test data snapshots
-  run with -testdata to generate the test data
- zip the HDFS test warehouse directory into a "data snapshot"
- dump the Hive metastore database into a "metastore snapshot"
- these two snapshots allow for a fast snapshot-based data load in
subsequent test runs

2. Use test data snapshots in a test run:
- do a with the -snapshot_file and -metastore_snapshot_file
arguments that point to the snapshots mentioned above
- data loading from these snapshots takes roughly 20-30 minutes

Of course, when you make changes to the test data, then you probably need
to regenerate these snapshots.
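
The choice between the two paths above could be sketched roughly as follows. This is only an illustration, not the script mentioned in this thread: the function name and the snapshot file names are hypothetical, and the exact form in which the arguments are passed (flag followed by a separate path) is an assumption. Only the -testdata, -snapshot_file and -metastore_snapshot_file argument names come from the description above.

```python
import os


def load_args(data_snapshot, metastore_snapshot):
    """Pick the data-loading arguments for a test run.

    If both snapshot files exist, return the arguments for the fast
    snapshot-based load; otherwise fall back to a full -testdata
    generation. The argument layout is an assumption.
    """
    if os.path.isfile(data_snapshot) and os.path.isfile(metastore_snapshot):
        return [
            "-snapshot_file", data_snapshot,
            "-metastore_snapshot_file", metastore_snapshot,
        ]
    return ["-testdata"]


if __name__ == "__main__":
    # Hypothetical snapshot locations; adjust to wherever snapshots are kept.
    print(load_args("/tmp/test-warehouse-snapshot.tar.gz",
                    "/tmp/metastore-snapshot.sql"))
```

Falling back to -testdata when either snapshot is missing keeps the slow path as the default and avoids loading a data snapshot that does not match the metastore snapshot.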

I will privately send you a script that can hopefully get you started with
this workflow, assuming you want to follow it.

>    Should we consider testing of release build and debug build separately?
>    Do you expect any differences in behavior? Also, what all dependencies
>    will need to be rebuilt in release mode?

Testing release and debug is certainly recommended.

I recommend you take a look at the CMakeLists.txt in the Impala root
directory to see what happens in a release build.
You can also look at bin/ to learn more.

> We are also open for a call if any developer/tester is interested in
> discussing these points. Actually, we need this test plan a bit urgent as
> couple of our customers are waiting for timeline.

I'm open to getting on a call next week.

Best regards,


> Thanks,
> Nishidha
> ----- Forwarded by Nishidha Panpaliya/Austin/Contr/IBM on 05/13/2016 11:11
> AM -----
> From:   Sudarshan Jagadale/Austin/Contr/IBM
> To:     Nishidha Panpaliya/Austin/Contr/IBM@IBMUS
> Date:   05/13/2016 10:54 AM
> Subject:        Fw: Impala tests and estimate
> Thanks and Regards
> Sudarshan Jagadale
> Power Open Source Solutions
> ----- Forwarded by Sudarshan Jagadale/Austin/Contr/IBM on 05/13/2016 10:53
> AM -----
> From:   Jim Apple <>
> To:
> Cc:     Manish Patil/Austin/Contr/IBM@IBMUS, Sudarshan
>             Jagadale/Austin/Contr/IBM@IBMUS, Anup
>             Halarnkar/Austin/Contr/IBM@IBMUS, Valencia
>             Serrao/Austin/Contr/IBM@IBMUS
> Date:   05/12/2016 11:56 PM
> Subject:        Re: Fw: Impala tests and estimate
> The backend tests take 12 minutes. The frontend tests take 10 seconds. The
> JDBC tests take 2 minutes. The custom cluster tests take 35 minutes. The
> end-to-end tests take 3 hours.
> That's in "core" mode. "exhaustive" mode quadruples the total time, IIRC,
> and I'd guess that's all in the end-to-end tests, but I'm not sure.
> On Thu, May 12, 2016 at 5:40 AM, Nishidha Panpaliya <>
> wrote:
>   Hi All,
>   Could you please let me know the scope of Impala unit testing? I mean
>   what all tests should be executed and ensured. I saw BE, FE, EE, JDBC,
>   Cluster tests in
>   And a guess estimate of how much time each of these take to execute?
>   Thanks,
>   Nishidha
>   ----- Forwarded by Nishidha Panpaliya/Austin/Contr/IBM on 05/12/2016
>   06:07 PM -----
>   From: Nishidha Panpaliya/Austin/Contr/IBM
>   To:
>   Cc: "Jim Apple" <>, Manish
>   Patil/Austin/Contr/IBM@IBMUS, Sudarshan Jagadale/Austin/Contr/IBM@IBMUS,
>   "Tim Armstrong" <>, Valencia
>   Serrao/Austin/Contr/IBM@IBMUS
>   Date: 03/29/2016 06:59 PM
>   Subject: Re: Impala tests and estimate
>   Just one more request.
>   We'll be thankful if we could also get to know the count of each of these
>   tests (for e.g. there are 71 backend tests).
>   Thanks,
>   Nishidha
>   From: Nishidha Panpaliya/Austin/Contr/IBM
>   To:, "Tim Armstrong" <>, "Jim Apple" <>
>   Cc: Sudarshan Jagadale/Austin/Contr/IBM@IBMUS, Manish
>   Patil/Austin/Contr/IBM@IBMUS, Valencia Serrao/Austin/Contr/IBM@IBMUS
>   Date: 03/29/2016 10:05 AM
>   Subject: Impala tests and estimate
>   Hi All,
>   I again need your help in understanding Impala tests to be run and
>   ensured and their estimates.
>   Last time, I know you had given way to run only backend tests and it was
>   helpful to us. I've also gone through which triggers
>   backend test, frontend test, end-to-end tests, etc. Could you provide me
>   individual commands to run each of them and if any setup steps are
>   required? Also, I would like to know if there are any specific system
>   requirements that I must have up-front to run all these tests.
>   Along with these commands/scripts, I'm also interested in knowing how
>   much time each of these tests take to run, if we do not run into any
>   issues. This is required to know the guess estimate of how long will this
>   activity be taking from now.
>   Thanks in advance,
>   Nishidha
