Message-Id: <201706160030.v5G0USR2011221@ip-10-146-233-104.ec2.internal>
Date: Fri, 16 Jun 2017 00:30:28 +0000
From: "Taras Bobrovytsky (Code Review)"
To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org
Reply-To: tbobrovytsky@cloudera.com
Subject: [Impala-ASF-CR] Add nested testdata flattener
X-Gerrit-MessageType: newpatchset
X-Gerrit-Change-Id: I7e7a8e53ada9274759a3e2128b97bec292c129c6
X-Gerrit-Commit: 38f2a434c1a5b67d1f656e60e8308e06a948f0a3
User-Agent: Gerrit/2.12.7

Taras Bobrovytsky has uploaded a new patch set (#2).

Change subject: Add nested testdata flattener
......................................................................

Add nested testdata flattener

The TableFlattener takes a nested dataset and creates an equivalent
unnested dataset. The unnested dataset is saved as Parquet.

When an array or map is encountered in the original table, the flattener
creates a new table and adds an id column to it that references the row
in the parent table. Joining on the id column should reproduce the
original dataset.

The flattened dataset should be loaded into Postgres in order to run the
query generator (in nested types mode) on it. There is a script that
automates generating, flattening and loading random data into Postgres
and Impala:
  testdata/bin/generate-load-nested.sh -f

Testing:
- Ran ./testdata/bin/generate-load-nested.sh -f; random nested data was
  generated and flattened as expected.

Change-Id: I7e7a8e53ada9274759a3e2128b97bec292c129c6
---
A testdata/TableFlattener/.gitignore
A testdata/TableFlattener/README
A testdata/TableFlattener/pom.xml
A testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/FileMigrator.java
A testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/FlattenedSchema.java
A testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/Main.java
A testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/SchemaFlattener.java
A testdata/TableFlattener/src/main/java/org/apache/impala/infra/tableflattener/SchemaUtil.java
M testdata/bin/generate-load-nested.sh
9 files changed, 863 insertions(+), 0 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/87/5787/2
--
To view, visit http://gerrit.cloudera.org:8080/5787
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7e7a8e53ada9274759a3e2128b97bec292c129c6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Taras Bobrovytsky
Gerrit-Reviewer: Michael Brown
Gerrit-Reviewer: Taras Bobrovytsky
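
For readers unfamiliar with the flattening idea described in the commit message, here is a minimal Python sketch of the general technique. It is not the TableFlattener code from this change (which is Java and writes Parquet). The function names `flatten` and `rejoin`, the column names `id`, `parent_id` and `item`, and the sample rows are all hypothetical, chosen only for illustration. It handles a single array field and shows that joining the child table back on the id column recovers the nested rows.

```python
from collections import defaultdict


def flatten(rows, array_field, id_field="id", parent_ref="parent_id"):
    """Split one array field out of each row into a separate child table.

    Column names are illustrative assumptions, not the real flattener's schema.
    """
    parent_table, child_table = [], []
    for row_id, row in enumerate(rows):
        # Parent row keeps every scalar field plus a generated id.
        parent = {k: v for k, v in row.items() if k != array_field}
        parent[id_field] = row_id
        parent_table.append(parent)
        # Each array element becomes a child row that references its parent.
        for item in row[array_field]:
            child_table.append({parent_ref: row_id, "item": item})
    return parent_table, child_table


def rejoin(parent_table, child_table, array_field,
           id_field="id", parent_ref="parent_id"):
    """Join the child table back onto the parent table by id."""
    children = defaultdict(list)
    for child in child_table:
        children[child[parent_ref]].append(child["item"])
    rows = []
    for parent in parent_table:
        row = {k: v for k, v in parent.items() if k != id_field}
        row[array_field] = children[parent[id_field]]
        rows.append(row)
    return rows


# Hypothetical sample data: one row with a non-empty array, one with an empty array.
nested = [
    {"name": "alice", "scores": [1, 2]},
    {"name": "bob", "scores": []},
]
parents, children = flatten(nested, "scores")
```

In this sketch `rejoin(parents, children, "scores")` returns rows equal to `nested`, which mirrors the commit message's statement that joining on the id column should reproduce the original dataset. Maps and deeper nesting levels, which the real flattener also covers, are left out here.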