Message-Id: <201701051659.v05GxrOJ007952@ip-10-146-233-104.ec2.internal>
Date: Thu, 5 Jan 2017 16:59:53 +0000
From: "Attila Jeges (Code Review)"
To: impala-cr@cloudera.com, reviews@impala.incubator.apache.org
Reply-To: attilaj@cloudera.com
Subject: [Impala-ASF-CR] IMPALA-3989: Display skew warning for poorly formatted Parquet files
X-Gerrit-MessageType: newpatchset
X-Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
X-Gerrit-Commit: c5a121c24d15de4767cbff8adbda56104d3a15de

Attila Jeges has uploaded a new patch set (#8).

Change subject: IMPALA-3989: Display skew warning for poorly formatted Parquet files
......................................................................

IMPALA-3989: Display skew warning for poorly formatted Parquet files

Parquet files are scanned at the granularity of row groups. Each row
group belongs to one or more splits, and each split is scanned by a
separate Parquet scanner. If some row groups span multiple splits, we
will most likely see some scanners doing remote reads and other
scanners not scanning anything at all. This contributes to skew across
the cluster, because scan work is distributed unevenly.

This change adds a counter (NumScannersWithNoReads) to the scan node's
runtime profile to track the number of Parquet scanners that end up
doing no reads because of poor formatting. It also displays a warning
message when a misaligned row group is found.

Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
---
M be/src/exec/hdfs-parquet-scanner.cc
M be/src/exec/hdfs-parquet-scanner.h
M common/thrift/generate_error_codes.py
M tests/query_test/test_scanners.py
4 files changed, 133 insertions(+), 9 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/5400/8
--
To view, visit http://gerrit.cloudera.org:8080/5400
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ibf48d978383d73efdade733a892e795ebd53c76a
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Attila Jeges
Gerrit-Reviewer: Attila Jeges
Gerrit-Reviewer: Michael Ho
Gerrit-Reviewer: Sailesh Mukil
Gerrit-Reviewer: Thomas Tauber-Marshall
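
The skew described in the commit message can be illustrated with a small standalone model. This is only a sketch, not the code in hdfs-parquet-scanner.cc: the `ByteRange` type and the `analyze` function are hypothetical names, and the rule that a row group is read by the scanner whose split contains the row group's midpoint is an assumption made for the example. It counts scanners that end up with no row groups (the quantity NumScannersWithNoReads is meant to track) and lists the row groups that span more than one split.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ByteRange:
    """Hypothetical half-open byte range [start, end) within one file."""
    start: int
    end: int

    def contains(self, pos: int) -> bool:
        return self.start <= pos < self.end

    def overlaps(self, other: "ByteRange") -> bool:
        return self.start < other.end and other.start < self.end


def analyze(splits, row_groups):
    """Return (scanners_with_no_reads, misaligned_row_groups).

    Assumption for this sketch: one scanner per split, and a row group is
    read by the scanner whose split contains the row group's midpoint.
    """
    groups_per_split = [0] * len(splits)
    misaligned = []
    for rg in row_groups:
        mid = rg.start + (rg.end - rg.start) // 2
        overlapping = [i for i, s in enumerate(splits) if s.overlaps(rg)]
        if len(overlapping) > 1:
            # The row group crosses a split boundary, i.e. it is misaligned.
            misaligned.append(rg)
        for i in overlapping:
            if splits[i].contains(mid):
                groups_per_split[i] += 1
    no_reads = sum(1 for n in groups_per_split if n == 0)
    return no_reads, misaligned


# Three 128-unit splits and a single row group covering almost the whole file.
splits = [ByteRange(0, 128), ByteRange(128, 256), ByteRange(256, 384)]
no_reads, misaligned = analyze(splits, [ByteRange(4, 380)])
print(no_reads, len(misaligned))  # prints "2 1"
```

In this model only the middle scanner reads the oversized row group, while the other two scanners do nothing. Passing row groups that line up with the splits, for example `[ByteRange(0, 128), ByteRange(128, 256), ByteRange(256, 384)]`, gives zero idle scanners and no misaligned row groups.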