Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id D5341200B82 for ; Fri, 16 Sep 2016 18:27:29 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id D3C07160AC4; Fri, 16 Sep 2016 16:27:29 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id F39FF160AB7 for ; Fri, 16 Sep 2016 18:27:28 +0200 (CEST) Received: (qmail 75960 invoked by uid 500); 16 Sep 2016 16:27:28 -0000 Mailing-List: contact reviews-help@impala.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@impala.incubator.apache.org Received: (qmail 75932 invoked by uid 99); 16 Sep 2016 16:27:27 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 16 Sep 2016 16:27:27 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 85A8DC1538 for ; Fri, 16 Sep 2016 16:27:27 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 0.362 X-Spam-Level: X-Spam-Status: No, score=0.362 tagged_above=-999 required=6.31 tests=[RDNS_DYNAMIC=0.363, SPF_PASS=-0.001] autolearn=disabled Received: from mx2-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id mHEIofyP2M_2 for ; Fri, 16 Sep 2016 16:27:24 +0000 (UTC) Received: from ip-10-146-233-104.ec2.internal (ec2-75-101-130-251.compute-1.amazonaws.com [75.101.130.251]) by mx2-lw-eu.apache.org (ASF Mail Server at mx2-lw-eu.apache.org) with ESMTPS id 43A695F22E for ; Fri, 16 Sep 2016 16:27:24 +0000 (UTC) Received: from localhost (localhost [127.0.0.1]) by ip-10-146-233-104.ec2.internal (8.14.4/8.14.4) with ESMTP id u8GGRNNf024364; Fri, 16 Sep 2016 16:27:23 GMT Message-Id: <201609161627.u8GGRNNf024364@ip-10-146-233-104.ec2.internal> Date: Fri, 16 Sep 2016 16:27:23 +0000 From: "Matthew Jacobs (Code Review)" To: Dimitris Tsirogiannis , impala-cr@cloudera.com, reviews@impala.incubator.apache.org CC: Alex Behm Reply-To: mj@cloudera.com X-Gerrit-MessageType: comment Subject: =?UTF-8?Q?=5BImpala-ASF-CR=5D_IMPALA-3719=3A_Simplify_CREATE_TABLE_statements_with_Kudu_tables=0A?= X-Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 X-Gerrit-ChangeURL: X-Gerrit-Commit: fed2b7662ccfd9d34d3fc6c7ad1f6a660711e8c2 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Content-Disposition: inline User-Agent: Gerrit/2.12.2 archived-at: Fri, 16 Sep 2016 16:27:30 -0000 Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables ...................................................................... Patch Set 1: (17 comments) another batch of comments, still a lot of files i haven't touched yet... http://gerrit.cloudera.org:8080/#/c/4414/1/be/src/catalog/catalog.cc File be/src/catalog/catalog.cc: PS1, Line 47: {"", "(ZILjava/lang/String;IIZLjava/lang/String;)V", : &catalog_ctor_}, 1 line http://gerrit.cloudera.org:8080/#/c/4414/1/be/src/service/frontend.cc File be/src/service/frontend.cc: PS1, Line 37: // XXX: This flag doesn't seem to be used anywhere. Maybe remove it? : DEFINE_bool(load_catalog_at_startup, false, "if true, load all catalog data at startup"); agreed. doesn't look like it's used by JniFrontend. Let's remove it. PS1, Line 63: IPs IP addresses. PS1, Line 63: no space http://gerrit.cloudera.org:8080/#/c/4414/1/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: PS1, Line 52: THdfsFileFormat Maybe not for this patch since it's already huge, but it'd be great to generalize this if we can. I can think of two improvements: 1) Maybe we should model the storage layer, e.g. have a TStorageEngine, then make this TFileFormat (perhaps). This is probably a big change. 2) Rename this to be TStorageFormat, which kind of addresses #1 but doesn't separate out storage engines and file formats. PS1, Line 61: THdfsCompression similarly, this seems unnecessarily specific to Hdfs. Not necessarily something to change now but maybe we can create a follow-up JIRA to clean this up. http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS1, Line 31: ColumnDefOptions this class doesn't exist, please add the missing file http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java File fe/src/main/java/com/cloudera/impala/analysis/DistributeParam.java: PS1, Line 84: BigDecimal why BigDecimal? Ultimately this has to resolve to some int for kudu's API. (We can check if it's 32 or 64bit). PS1, Line 110: <= 1) is 1 bucket actually not valid? PS1, Line 138: colType.isStringType() && !exprType.isStringType() : || colType.isIntegerType() && (!exprType.isIntegerType() : || exprType.getPrecision() > colType.getPrecision()) 1. I don't see anything in the Kudu client that explicitly says you can't partition on any particular types. This code will exclude boolean and floating pt types, which is maybe unnecessary. 2. Esp if we can address #1, is there a cleaner way to make sure the types are valid rather than enumerating the kinds of types to consider? I'm not sure, but maybe one of the frontend gurus can think of something. http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/HdfsFileFormat.java File fe/src/main/java/com/cloudera/impala/catalog/HdfsFileFormat.java: PS1, Line 39: HdfsFileFormat Can you open a JIRA (and leave it in the comment) to refactor this later? http://gerrit.cloudera.org:8080/#/c/4414/1/fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java File fe/src/main/java/com/cloudera/impala/catalog/KuduTable.java: PS1, Line 79: : // TODO we should have something like KuduConfig.getDefaultConfig() do you know what this means / can it be removed now that we're adding default master addrs? Line 92: public static final String KEY_DISTRIBUTE_BY = "kudu.distribute_by"; comment Line 153: * Load the columns from the schema list can you add a comment about error handling in this function? PS1, Line 158: LOG.error(String.format("Kudu tables must have at least one" : + "key column (had %d), and no more key columns than there are table columns " : + "(had %d).", keyColumns.size(), schema.size())); shouldn't this still fail? if not, can you add a comment why this continues? PS1, Line 184: LOG.error(String.format("Some key columns were not found in" : + " the set of columns. List of column names: %s, List of key column names:" : + " %s", Iterables.toString(columnNames), Iterables.toString(keyColumns))); why do we continue? PS1, Line 199: // Get the table metadata from Kudu : if (reuseMetadata) { I'm confused about this. It's not clear to me from the name 'reuseMetadata' why this means we should populate the metadata from Kudu. If anything, it sounds like it would be the opposite. The base class comment just says "If 'reuseMetadata' is true, reuse valid existing metadata.". -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes