impala-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From tarmstr...@apache.org
Subject [1/4] incubator-impala git commit: IMPALA-3501: ee tests: detect build type and support different timeouts based on the same
Date Thu, 26 May 2016 02:41:51 GMT
Repository: incubator-impala
Updated Branches:
  refs/heads/master f09c6311c -> 22669e23b


IMPALA-3501: ee tests: detect build type and support different timeouts based on the same

Impala compiled with the address sanitizer, or compiled with code
coverage, runs through code paths much slower. This can cause end-to-end
tests that pass on a non-ASAN or non-code coverage build to fail. Some
examples include IMPALA-2721, IMPALA-2973, and IMPALA-3501. These
classes of failures tend always to involve some time-sensitive condition
that fails to succeed under such "slow builds".

The works-around in the past have been to simply increase the timeout.
The problem with this approach is that it relaxes conditions for tests
on builds that see the field--i.e., release builds--for builds that
never will--i.e., ASAN and code coverage.

This patch fixes that problem by allowing test authors to set timeout
values based on a *specific* build type. The author may choose timeouts
with a default value, and different timeouts for either or both
so-called "slow builds": ASAN and code coverage.

We detect the so-called "specific build type" by inspecting the binary
expected to be at the path under test. This removes the need to make
alterations to Impala itself. The inspection done is to read the DWARF
information in the binary, specifically the first compile unit's
DW_AT_producer and DW_AT_name DIE attributes. We employ a heuristic
based on these attributes' values to guess the build type. If we can't
determine the build type, we will assume it's a debug build. More
information on this is in IMPALA-3501.

A quick summary of the changes follows:

1. Move some of the logic in tests.common.skip to tests.common.environ
   and rework some skip marks to be more precise.

2. Add Pyelftools for convenient deserialization of DWARF

3. Our Pyelftools usage requires collections.OrderedDict, which isn't in
   python2.6; also add Monkeypatch to handle this.

4. Add ImpalaBuild and specific_build_type_timeout, the core of the new
   functionality

5. Fix the statestore tests that only fail under code coverage (the
   basis for IMPALA-3501)

Testing:

The tests that were previously, reliably failing under code coverage now
pass. I also ran perfunctory tests of debug, release, and ASAN builds to
ensure our detection of build type is working. This patch will *not*
turn the code coverage builds green; there are other tests that fail,
and fixing all of them here is out of the scope of this patch.

Change-Id: I2b675c04c54e36d404fd9e5a6cf085fb8d6d0e47
Reviewed-on: http://gerrit.cloudera.org:8080/3156
Reviewed-by: Michael Brown <mikeb@cloudera.com>
Tested-by: Internal Jenkins


Project: http://git-wip-us.apache.org/repos/asf/incubator-impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-impala/commit/22669e23
Tree: http://git-wip-us.apache.org/repos/asf/incubator-impala/tree/22669e23
Diff: http://git-wip-us.apache.org/repos/asf/incubator-impala/diff/22669e23

Branch: refs/heads/master
Commit: 22669e23bef947ea155156e1af85aadc2c64e64a
Parents: 6198d92
Author: Michael Brown <mikeb@cloudera.com>
Authored: Mon May 9 14:15:39 2016 -0700
Committer: Tim Armstrong <tarmstrong@cloudera.com>
Committed: Wed May 25 19:41:45 2016 -0700

----------------------------------------------------------------------
 infra/python/deps/requirements.txt           |   2 +
 tests/common/environ.py                      | 246 ++++++++++++++++++++++
 tests/common/skip.py                         |  37 +---
 tests/custom_cluster/test_alloc_fail.py      |   4 +-
 tests/custom_cluster/test_exchange_delays.py |   4 +-
 tests/statestore/test_statestore.py          |   8 +-
 6 files changed, 267 insertions(+), 34 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/22669e23/infra/python/deps/requirements.txt
----------------------------------------------------------------------
diff --git a/infra/python/deps/requirements.txt b/infra/python/deps/requirements.txt
index 5409027..9ffafc6 100644
--- a/infra/python/deps/requirements.txt
+++ b/infra/python/deps/requirements.txt
@@ -36,11 +36,13 @@ impyla == 0.11.2
   # before thirdparty is built thrift will be installed anyways.
   thrift == 0.9.0
   thrift_sasl == 0.1.0
+monkeypatch == 0.1rc3
 ordereddict == 1.1
 pexpect == 3.3
 pg8000 == 1.10.2
 prettytable == 0.7.2
 psutil == 0.7.1
+pyelftools == 0.23
 pyparsing == 2.0.3
 pytest == 2.7.2
   py == 1.4.30

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/22669e23/tests/common/environ.py
----------------------------------------------------------------------
diff --git a/tests/common/environ.py b/tests/common/environ.py
new file mode 100644
index 0000000..db487e8
--- /dev/null
+++ b/tests/common/environ.py
@@ -0,0 +1,246 @@
+# Copyright (c) 2016 Cloudera, Inc. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import logging
+import os
+import re
+
+try:
+  from elftools.elf.elffile import ELFFile
+except ImportError as e:
+  # Handle pre-python2.7s' lack of collections.OrderedDict, which we include in
+  # impala-python as ordereddict.OrderedDict.
+  if 'cannot import name OrderedDict' == str(e):
+    import monkeypatch
+    from ordereddict import OrderedDict
+    monkeypatch.patch(OrderedDict, 'collections', 'OrderedDict')
+    from elftools.elf.elffile import ELFFile
+  else:
+    raise e
+
+
+LOG = logging.getLogger('tests.common.environ')
+
+
+# See if Impala is running with legacy aggregations and/or hash joins. This is kind of a
+# hack. It would be better to poll Impala whether it is doing so.
+test_start_cluster_args = os.environ.get("TEST_START_CLUSTER_ARGS", "")
+old_agg_regex = "enable_partitioned_aggregation=false"
+old_hash_join_regex = "enable_partitioned_hash_join=false"
+USING_OLD_AGGS_JOINS = re.search(old_agg_regex, test_start_cluster_args) is not None or \
+    re.search(old_hash_join_regex, test_start_cluster_args) is not None
+
+# Find the likely BuildType of the running Impala. Assume it's found through the path
+# $IMPALA_HOME/be/build/latest as a fallback.
+impala_home = os.environ.get("IMPALA_HOME", "")
+build_type_arg_regex = re.compile(r'--build_type=(\w+)', re.I)
+build_type_arg_search_result = re.search(build_type_arg_regex, test_start_cluster_args)
+
+if build_type_arg_search_result is not None:
+  build_type_dir = build_type_search_result.groups()[0].lower()
+else:
+  build_type_dir = 'latest'
+
+# Resolve any symlinks in the path.
+impalad_basedir = \
+    os.path.realpath(os.path.join(impala_home, 'be/build', build_type_dir)).rstrip('/')
+
+IMPALAD_PATH = os.path.join(impalad_basedir, 'service', 'impalad')
+
+
+class SpecificImpaladBuildTypes:
+  """
+  Represent a specific build type. In reality, there 5 specific build types. These
+  specific build types are needed by Python test code.
+
+  The specific build types and their *most distinguishing* compiler options are:
+
+  1. ADDRESS_SANITIZER (clang -fsanitize=address)
+  2. DEBUG (gcc -ggdb)
+  3. DEBUG_CODE_COVERAGE (gcc -ggdb -ftest-coverage)
+  4. RELEASE (gcc)
+  5. RELEASE_CODE_COVERAGE (gcc -ftest-coverage)
+  """
+  # ./buildall.sh -asan
+  ADDRESS_SANITIZER = 'address_sanitizer'
+  # ./buildall.sh
+  DEBUG = 'debug'
+  # ./buildall.sh -codecoverage
+  DEBUG_CODE_COVERAGE = 'debug_code_coverage'
+  # ./buildall.sh -release
+  RELEASE = 'release'
+  # ./buildall.sh -release -codecoverage
+  RELEASE_CODE_COVERAGE = 'release_code_coverage'
+
+
+class ImpaladBuild(object):
+  """
+  Acquires and provides characteristics about the way the Impala under test was compiled
+  and its likely effects on its responsiveness to automated test timings.
+  """
+  def __init__(self, impalad_path):
+    self.impalad_path = impalad_path
+    die_name, die_producer = self._get_impalad_dwarf_info()
+    self._set_impalad_build_type(die_name, die_producer)
+
+  @property
+  def specific_build_type(self):
+    """
+    Return the correct SpecificImpaladBuildTypes for the Impala under test.
+    """
+    return self._specific_build_type
+
+  def has_code_coverage(self):
+    """
+    Return whether the Impala under test was compiled with code coverage enabled.
+    """
+    return self.specific_build_type in (SpecificImpaladBuildTypes.DEBUG_CODE_COVERAGE,
+                                        SpecificImpaladBuildTypes.RELEASE_CODE_COVERAGE)
+
+  def is_asan(self):
+    """
+    Return whether the Impala under test was compiled with ASAN.
+    """
+    return self.specific_build_type == SpecificImpaladBuildTypes.ADDRESS_SANITIZER
+
+  def is_dev(self):
+    """
+    Return whether the Impala under test is a development build (i.e., any debug or ASAN
+    build).
+    """
+    return self.specific_build_type in (
+        SpecificImpaladBuildTypes.ADDRESS_SANITIZER, SpecificImpaladBuildTypes.DEBUG,
+        SpecificImpaladBuildTypes.DEBUG_CODE_COVERAGE)
+
+  def runs_slowly(self):
+    """
+    Return whether the Impala under test "runs slowly". For our purposes this means
+    either compiled with code coverage enabled or ASAN.
+    """
+    return self.has_code_coverage() or self.is_asan()
+
+  def _get_impalad_dwarf_info(self):
+    """
+    Read the impalad_path ELF binary, which is supposed to contain DWARF, and read the
+    DWARF to understand the compiler options. Return a 2-tuple of the two useful DIE
+    attributes of the first compile unit: the DW_AT_name and DW_AT_producer. If
+    something goes wrong doing this, log a warning and return nothing.
+    """
+    # Some useful references:
+    # - be/CMakeLists.txt
+    # - gcc(1), especially -grecord-gcc-switches, -g, -ggdb, -gdwarf-2
+    # - readelf(1)
+    # - general reading about DWARF
+    # A useful command for exploration without having to wade through many bytes is:
+    # readelf --debug-dump=info --dwarf-depth=1 impalad
+    # The DWARF lines are long, raw, and nasty; I'm hesitant to paste them here, so
+    # curious readers are highly encouraged to try the above, or read IMPALA-3501.
+    die_name = None
+    die_producer = None
+    try:
+      with open(self.impalad_path, 'rb') as fh:
+        impalad_elf = ELFFile(fh)
+        if impalad_elf.has_dwarf_info():
+          dwarf_info = impalad_elf.get_dwarf_info()
+          # We only need the first CU, hence the unconventional use of the iterator
+          # protocol.
+          cu_iterator = dwarf_info.iter_CUs()
+          first_cu = next(cu_iterator)
+          top_die = first_cu.get_top_DIE()
+          die_name = top_die.attributes['DW_AT_name'].value
+          die_producer = top_die.attributes['DW_AT_producer'].value
+    except Exception as e:
+      LOG.warn('Failure to read DWARF info from {0}: {1}'.format(self.impalad_path,
+                                                                 str(e)))
+    return die_name, die_producer
+
+  def _set_impalad_build_type(self, die_name, die_producer):
+    """
+    Use a heuristic based on the DW_AT_producer and DW_AT_name of the first compile
+    unit, as returned by _get_impalad_dwarf_info(), to figure out which of 5 supported
+    builds of impalad we're dealing with. If the heuristic can't determine, fall back to
+    assuming a debug build and log a warning.
+    """
+    ASAN_CU_NAME = 'asan_preinit.cc'
+    NON_ASAN_CU_NAME = 'daemon-main.cc'
+    GDB_FLAG = '-ggdb'
+    CODE_COVERAGE_FLAG = '-ftest-coverage'
+
+    if die_name is None or die_producer is None:
+      LOG.warn('Not enough DWARF info in {0} to determine build type; choosing '
+               'DEBUG'.format(self.impalad_path))
+      self._specific_build_type = SpecificImpaladBuildTypes.DEBUG
+      return
+
+    is_debug = GDB_FLAG in die_producer
+    specific_build_type = SpecificImpaladBuildTypes.DEBUG
+
+    if die_name.endswith(ASAN_CU_NAME):
+      specific_build_type = SpecificImpaladBuildTypes.ADDRESS_SANITIZER
+    elif not die_name.endswith(NON_ASAN_CU_NAME):
+      LOG.warn('Unexpected DW_AT_name in first CU: {0}; choosing '
+               'DEBUG'.format(die_name))
+      specific_build_type = SpecificImpaladBuildTypes.DEBUG
+    elif CODE_COVERAGE_FLAG in die_producer:
+      if is_debug:
+        specific_build_type = SpecificImpaladBuildTypes.DEBUG_CODE_COVERAGE
+      else:
+        specific_build_type = SpecificImpaladBuildTypes.RELEASE_CODE_COVERAGE
+    else:
+      if is_debug:
+        specific_build_type = SpecificImpaladBuildTypes.DEBUG
+      else:
+        specific_build_type = SpecificImpaladBuildTypes.RELEASE
+
+    self._specific_build_type = specific_build_type
+
+
+IMPALAD_BUILD = ImpaladBuild(IMPALAD_PATH)
+
+
+def specific_build_type_timeout(
+    default_timeout, slow_build_timeout=None, asan_build_timeout=None,
+    code_coverage_build_timeout=None):
+  """
+  Return a test environment-specific timeout based on the sort of
+  SpecificImpalaBuildType under test.
+
+  Required parameter: default_timeout - default timeout value. This applies when Impala is
+  a standard release or debug build, or if no other timeouts are specified.
+
+  Optional parameters:
+  slow_build_timeout - timeout to use if we're running against *any* build known to be
+  slow. If specified, this will preempt default_timeout if Impala is expected to be
+  "slow". You can use this as a shorthand in lieu of specifying all of the following
+  parameters.
+
+  The parameters below correspond to specific build types. These preempt both
+  slow_build_timeout and default_timeout, if the Impala under test is a build of the
+  applicable type:
+
+  asan_build_timeout - timeout to use if Impala with ASAN is running
+
+  code_coverage_build_timeout - timeout to use if Impala with code coverage is running
+  (both debug and release code coverage)
+  """
+
+  if IMPALAD_BUILD.is_asan() and asan_build_timeout is not None:
+    timeout_val = asan_build_timeout
+  elif IMPALAD_BUILD.has_code_coverage() and code_coverage_build_timeout is not None:
+    timeout_val = code_coverage_build_timeout
+  elif IMPALAD_BUILD.runs_slowly() and slow_build_timeout is not None:
+    timeout_val = slow_build_timeout
+  else:
+    timeout_val = default_timeout
+  return timeout_val

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/22669e23/tests/common/skip.py
----------------------------------------------------------------------
diff --git a/tests/common/skip.py b/tests/common/skip.py
index b2f52ba..e260eae 100644
--- a/tests/common/skip.py
+++ b/tests/common/skip.py
@@ -21,6 +21,8 @@ import re
 import os
 import pytest
 from functools import partial
+
+from tests.common.environ import IMPALAD_BUILD, USING_OLD_AGGS_JOINS
 from tests.util.filesystem_utils import IS_DEFAULT_FS, IS_S3, IS_ISILON, IS_LOCAL
 
 
@@ -66,22 +68,14 @@ class SkipIfIsilon:
       reason="This Isilon issue has yet to be triaged.")
   jira = partial(pytest.mark.skipif, IS_ISILON)
 
-# TODO: looking at TEST_START_CLUSTER_ARGS is a hack. It would be better to add an option
-# to pytest.
-test_start_cluster_args = os.environ.get("TEST_START_CLUSTER_ARGS","")
-old_agg_regex = "enable_partitioned_aggregation=false"
-old_hash_join_regex = "enable_partitioned_hash_join=false"
-using_old_aggs_joins = re.search(old_agg_regex, test_start_cluster_args) is not None or \
-    re.search(old_hash_join_regex, test_start_cluster_args) is not None
-
 class SkipIfOldAggsJoins:
-  nested_types = pytest.mark.skipif(using_old_aggs_joins,
+  nested_types = pytest.mark.skipif(USING_OLD_AGGS_JOINS,
       reason="Nested types not supported with old aggs and joins")
-  passthrough_preagg = pytest.mark.skipif(using_old_aggs_joins,
+  passthrough_preagg = pytest.mark.skipif(USING_OLD_AGGS_JOINS,
       reason="Passthrough optimization not implemented by old agg")
-  unsupported = pytest.mark.skipif(using_old_aggs_joins,
+  unsupported = pytest.mark.skipif(USING_OLD_AGGS_JOINS,
       reason="Query unsupported with old aggs and joins")
-  requires_spilling = pytest.mark.skipif(using_old_aggs_joins,
+  requires_spilling = pytest.mark.skipif(USING_OLD_AGGS_JOINS,
       reason="Test case requires spilling to pass")
 
 class SkipIfLocal:
@@ -111,21 +105,6 @@ class SkipIfLocal:
   root_path = pytest.mark.skipif(IS_LOCAL,
       reason="Tests rely on the root directory")
 
-# Try to derive the build type. Assume it's 'latest' by default.
-impala_home = os.environ.get("IMPALA_HOME", "")
-build_type_regex = re.compile(r'--build_type=(\w+)', re.I)
-build_type_search_result = re.search(build_type_regex, test_start_cluster_args)
-
-if build_type_search_result is not None:
-  build_type = build_type_search_result.groups()[0].lower()
-else:
-  build_type = 'latest'
-
-# Resolve any symlinks in the path.
-impalad_basedir = \
-    os.path.realpath(os.path.join(impala_home, 'be/build', build_type)).rstrip('/')
-debug_build = os.path.basename(impalad_basedir).lower() == 'debug'
-
-class SkipIfNotDebugBuild:
-  debug_only = pytest.mark.skipif(not debug_build,
+class SkipIfBuildType:
+  not_dev_build = pytest.mark.skipif(not IMPALAD_BUILD.is_dev(),
       reason="Tests depends on debug build startup option.")

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/22669e23/tests/custom_cluster/test_alloc_fail.py
----------------------------------------------------------------------
diff --git a/tests/custom_cluster/test_alloc_fail.py b/tests/custom_cluster/test_alloc_fail.py
index d9b9f78..9021089 100644
--- a/tests/custom_cluster/test_alloc_fail.py
+++ b/tests/custom_cluster/test_alloc_fail.py
@@ -16,9 +16,9 @@ import logging
 import pytest
 from copy import deepcopy
 from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
-from tests.common.skip import SkipIfNotDebugBuild
+from tests.common.skip import SkipIfBuildType
 
-@SkipIfNotDebugBuild.debug_only
+@SkipIfBuildType.not_dev_build
 class TestAllocFail(CustomClusterTestSuite):
   """Tests for handling malloc() failure for UDF/UDA"""
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/22669e23/tests/custom_cluster/test_exchange_delays.py
----------------------------------------------------------------------
diff --git a/tests/custom_cluster/test_exchange_delays.py b/tests/custom_cluster/test_exchange_delays.py
index 9c96801..147c956 100644
--- a/tests/custom_cluster/test_exchange_delays.py
+++ b/tests/custom_cluster/test_exchange_delays.py
@@ -14,9 +14,9 @@
 
 import pytest
 from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
-from tests.common.skip import SkipIfNotDebugBuild
+from tests.common.skip import SkipIfBuildType
 
-@SkipIfNotDebugBuild.debug_only
+@SkipIfBuildType.not_dev_build
 class TestExchangeDelays(CustomClusterTestSuite):
   """Tests for handling delays in finding data stream receivers"""
 

http://git-wip-us.apache.org/repos/asf/incubator-impala/blob/22669e23/tests/statestore/test_statestore.py
----------------------------------------------------------------------
diff --git a/tests/statestore/test_statestore.py b/tests/statestore/test_statestore.py
index c1724b0..49a8bc1 100644
--- a/tests/statestore/test_statestore.py
+++ b/tests/statestore/test_statestore.py
@@ -21,6 +21,8 @@ import uuid
 import urllib2
 import json
 
+from tests.common.environ import specific_build_type_timeout
+
 from thrift.transport import TSocket
 from thrift.transport import TTransport
 from thrift.protocol import TBinaryProtocol
@@ -59,6 +61,9 @@ STATUS_OK = TStatus(TErrorCode.OK)
 DEFAULT_UPDATE_STATE_RESPONSE = TUpdateStateResponse(status=STATUS_OK, topic_updates=[],
                                                      skipped=False)
 
+# IMPALA-3501: the timeout needs to be higher in code coverage builds
+WAIT_FOR_FAILURE_TIMEOUT = specific_build_type_timeout(40, code_coverage_build_timeout=60)
+
 class WildcardServerSocket(TSocket.TSocketBase, TTransport.TServerTransportBase):
   """Specialised server socket that binds to a random port at construction"""
   def __init__(self, host=None, port=0):
@@ -292,7 +297,8 @@ class StatestoreSubscriber(object):
     finally:
       self.update_event.release()
 
-  def wait_for_failure(self, timeout=40):
+
+  def wait_for_failure(self, timeout=WAIT_FOR_FAILURE_TIMEOUT):
     """Waits until this subscriber no longer appears in the statestore's subscriber
     list. If 'timeout' seconds pass, throws an exception."""
     start = time.time()


Mime
View raw message