impala-reviews mailing list archives

From "Joe McDonnell (Code Review)" <ger...@cloudera.org>
Subject [Impala-ASF-CR] PREVIEW: IMPALA-6052: Change HDFS layout for test tables
Date Fri, 29 Dec 2017 23:24:10 GMT
Joe McDonnell has posted comments on this change. ( http://gerrit.cloudera.org:8080/8260 )

Change subject: PREVIEW: IMPALA-6052: Change HDFS layout for test tables
......................................................................


Patch Set 3:

(1 comment)

> (2 comments)
 > 
 > The direction here seems fine to me. This is a case where I think
 > you'll need to run both exhaustive and S3 tests before commit,
 > since this is so very cross-cutting.
 > 
 > What's the bigger picture of what brought you here?

I ran the exhaustive tests. Dataload on S3 relies on snapshots, so this change is not easy to test there.

Partly this is just a cleanup for cleanup's sake. I don't like listing HDFS and finding hundreds
of directories with no structure. I think developers will be better off whenever they need
to look at the HDFS files.

A separate motivation is that the inconsistency has created problems when loading data on
a remote cluster. The current way of loading data on a remote cluster is to copy the HDFS
data over and then recreate the tables and metadata. However, if dataload can't tell that
a table is already populated (by looking at the disk usage of directories on HDFS), then it
will try to do inserts or loads as well as the create table statements. IMPALA-6068 had to
be reverted because dataload couldn't tell that a table was already populated, and it tried
to run a LOAD DATA LOCAL statement, which can't work on a remote cluster. This change creates the
consistency needed to accurately detect whether a table is populated.
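
As a rough illustration of the check described above, here is a minimal sketch of deciding
whether to skip the load step based on directory sizes. The function names, the dict of
path sizes, and the example paths are all hypothetical; this is not the actual dataload
code, and it assumes the sizes were already collected (for example by parsing the output
of `hdfs dfs -du`).

```python
# Hypothetical sketch only: these names and paths are assumptions for
# illustration and do not come from Impala's dataload scripts.


def table_is_populated(dir_sizes, table_dir):
    """Return True if any path at or under table_dir has a non-zero size.

    dir_sizes maps an HDFS path to its size in bytes (assumed to have been
    gathered beforehand, e.g. from `hdfs dfs -du`).
    """
    base = table_dir.rstrip("/")
    prefix = base + "/"
    return any(
        size > 0 and (path == base or path.startswith(prefix))
        for path, size in dir_sizes.items()
    )


def statements_to_run(dir_sizes, table_dir, create_stmt, load_stmt):
    """Always recreate the table metadata; only load when no data is present."""
    if table_is_populated(dir_sizes, table_dir):
        return [create_stmt]
    return [create_stmt, load_stmt]
```

With a predictable one-directory-per-table layout, the lookup is a simple prefix match.
When table data lands in inconsistent locations, a check like this can report an
already-copied table as empty and fall through to the load statement.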

http://gerrit.cloudera.org:8080/#/c/8260/3/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/8260/3/testdata/bin/generate-schema-statements.py@473
PS3, Line 473:   if p.returncode != 0:
             :     print "eval_section command failed: {0}".format(cmd)
             :     assert(False)
> nit: I think this is equivalent to:
Done



-- 
To view, visit http://gerrit.cloudera.org:8080/8260
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3ba27ba6d3c7e445795e750281070963bbe1bb51
Gerrit-Change-Number: 8260
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Joe McDonnell <joemcdonnell@cloudera.com>
Gerrit-Reviewer: Philip Zeyliger <philip@cloudera.com>
Gerrit-Comment-Date: Fri, 29 Dec 2017 23:24:10 +0000
Gerrit-HasComments: Yes
