impala-reviews mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Impala Public Jenkins (Code Review)" <>
Subject [Impala-ASF-CR] IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.
Date Sat, 18 Apr 2020 05:13:50 GMT
Impala Public Jenkins has submitted this change and it was merged. (

Change subject: IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

IMPALA-3343, IMPALA-9489: Make impala-shell compatible with python 3.

This is the main patch for making the the impala-shell cross-compatible with
python 2 and python 3. The goal is wind up with a version of the shell that will
pass python e2e tests irrepsective of the version of python used to launch the
shell, under the assumption that the test framework itself will continue to run
with python 2.7.x for the time being.

Notable changes for reviewers to consider:

- With regard to validating the patch, my assumption is that simply passing
  the existing set of e2e shell tests is sufficient to confirm that the shell
  is functioning properly. No new tests were added.

- A new pytest command line option was added in to enable a user
  to specify a path to an alternate impala-shell executable to test. It's
  possible to use this to point to an instance of the impala-shell that was
  installed as a standalone python package in a separate virtualenv.

  Example usage:
  USE_THRIFT11_GEN_PY=true impala-py.test --shell_executable=/<path to virtualenv>/bin/impala-shell
-sv shell/

  The target virtualenv may be based on either python3 or python2. However,
  this has no effect on the version of python used to run the test framework,
  which remains tied to python 2.7.x for the foreseeable future.

- The $IMPALA_HOME/bin/ now sets up the impala-shell python
  environment independenty from bin/ The default version
  of thrift is thrift-0.11.0 (See IMPALA-9489).

- The wording of the header changed a bit to include the python version
  used to run the shell.

    Starting Impala Shell with no authentication using Python 3.7.5
    Opened TCP connection to localhost:21000


    Starting Impala Shell with LDAP-based authentication using Python 2.7.12
    Opened TCP connection to localhost:21000

- By far, the biggest hassle has been juggling str versus unicode versus
  bytes data types. Python 2.x was fairly loose and inconsistent in
  how it dealt with strings. As a quick demo of what I mean:

  Python 2.7.12 (default, Nov 12 2018, 14:36:49)
  [GCC 5.4.0 20160609] on linux2
  Type "help", "copyright", "credits" or "license" for more information.
  >>> d = 'like a duck'
  >>> d == str(d) == bytes(d) == unicode(d) == d.encode('utf-8') == d.decode('utf-8')

  ...and yet there are weird unexpected gotchas.

  >>> d.decode('utf-8') == d.encode('utf-8')
  >>> d.encode('utf-8') == bytearray(d, 'utf-8')
  >>> d.decode('utf-8') == bytearray(d, 'utf-8')   # fails the eq property?

  As a result, this was inconsistency was reflected in the way we handled
  strings in the impala-shell code, but things still just worked.

  In python3, there's a much clearer distinction between strings and bytes, and
  as such, much tighter type consistency is expected by standard libs like
  subprocess, re, sqlparse, prettytable, etc., which are used throughout the
  shell. Even simple calls that worked in python 2.x:

  >>> import re
  >>> re.findall('foo', b'foobar')

  ...can throw exceptions in python 3.x:

  >>> import re
  >>> re.findall('foo', b'foobar')
  Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/data0/systest/venvs/py3/lib/python3.7/", line 223, in findall
      return _compile(pattern, flags).findall(string)
  TypeError: cannot use a string pattern on a bytes-like object

  Exceptions like this resulted in a many, if not most shell tests failing
  under python 3.

  What ultimately seemed like a better approach was to try to weed out as many
  existing spurious str.encode() and str.decode() calls as I could, and try to
  implement what is has colloquially been called a "unicode sandwich" -- namely,
  "bytes on the outside, unicode on the inside, encode/decode at the edges."

  The primary spot in the shell where we call decode() now is when sanitising

  args = self.sanitise_input(args.decode('utf-8'))

  ...and also whenever a library like re required it. Similarly, str.encode()
  is primarily used where a library like readline or csv requires is.

- PYTHONIOENCODING needs to be set to utf-8 to override the default setting for
  python 2. Without this, piping or redirecting stdout results in unicode errors.

- from __future__ import unicode_literals was added throughout


  To test the changes, I ran the e2e shell tests the way we always do (against
  the normal build tarball), and then I set up a python 3 virtual env with the
  shell installed as a package, and manually ran the tests against that.

  No effort has been made at this point to come up with a way to integrate
  testing of the shell in a python3 environment into our automated test

Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615
Reviewed-by: Impala Public Jenkins <>
Tested-by: Impala Public Jenkins <>
M bin/
M bin/
M shell/
M shell/
M shell/impala-shell
M shell/
M shell/
M shell/
M shell/
M shell/
M shell/packaging/
M shell/packaging/requirements.txt
M shell/packaging/
M shell/
M shell/
M tests/
M tests/shell/
M tests/shell/
M tests/shell/
19 files changed, 352 insertions(+), 149 deletions(-)

  Impala Public Jenkins: Looks good to me, approved; Verified

To view, visit
To unsubscribe, visit

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: Idb004d352fe230a890a6b6356496ba76c2fab615
Gerrit-Change-Number: 15524
Gerrit-PatchSet: 11
Gerrit-Owner: David Knupp <>
Gerrit-Reviewer: Abhishek Rawat <>
Gerrit-Reviewer: David Knupp <>
Gerrit-Reviewer: Impala Public Jenkins <>
Gerrit-Reviewer: Sahil Takiar <>
Gerrit-Reviewer: Tim Armstrong <>

  • Unnamed multipart/alternative (inline, 8-Bit, 0 bytes)
View raw message