Date: Tue, 1 Nov 2016 16:37:58 +0000 (UTC)
From: "Padma Penumarthy (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Commented] (DRILL-4706) Fragment planning causes Drillbits to read remote chunks when local copies are available

    [ https://issues.apache.org/jira/browse/DRILL-4706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15625907#comment-15625907 ]

Padma Penumarthy commented on DRILL-4706:
-----------------------------------------

For the data mentioned in the description of
the problem, 4 nodes have 16 files each, 3 nodes have 17 files each, and the other 3 nodes have 15 files each, i.e. the data is not distributed equally among the nodes. With the soft affinity parallelizer, we allocate 16 fragments on each node. So each node that has only 15 parquet files locally does a remote read for one of its fragments. 3 remote reads for the 3 rowGroups (512 MB * 3 ~ 1.5G) explain the 2% (of 70G) remote reads.

With the local affinity parallelizer, we schedule 16 fragments on 4 nodes, 17 on 3 nodes and 15 on the other 3 nodes. There were no remote reads in this case.

> Fragment planning causes Drillbits to read remote chunks when local copies are available
> ----------------------------------------------------------------------------------------
>
>                 Key: DRILL-4706
>                 URL: https://issues.apache.org/jira/browse/DRILL-4706
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>   Affects Versions: 1.6.0
>         Environment: CentOS, RHEL
>            Reporter: Kunal Khatua
>            Assignee: Sorabh Hamirwasia
>              Labels: performance, planning
>
> When a table (datasize=70GB) of 160 parquet files (each having a single rowgroup and fitting within one chunk) is available on a 10-node setup with replication=3, a pure data scan query causes about 2% of the data to be read remotely.
> Even with the creation of metadata cache, the planner is selecting a sub-optimal plan of executing the SCAN fragments such that some of the data is served from a remote server.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
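
The arithmetic in the comment above can be checked with a short sketch. This is only an illustration, not Drill's parallelizer code: the names (`local_files`, `remote_reads`, `soft_affinity`, `local_affinity`) are hypothetical, and the model is simplified by assuming that any fragment assigned to a node beyond its count of locally stored files must be read remotely.

```python
# Hypothetical illustration of the numbers in the comment; not Drill code.
# Files stored locally per node: 4 nodes x 16, 3 nodes x 17, 3 nodes x 15.
local_files = [16] * 4 + [17] * 3 + [15] * 3

ROWGROUP_MB = 512   # row group size quoted in the comment
TABLE_GB = 70       # table size from the issue description


def remote_reads(local, assigned):
    """Count fragments assigned to a node beyond its local file count.

    Simplifying assumption: each such fragment is one remote read.
    """
    return sum(max(a - l, 0) for l, a in zip(local, assigned))


total = sum(local_files)            # 160 parquet files in total
nodes = len(local_files)            # 10 nodes

# Soft affinity as described: the same number of fragments on every node.
soft_affinity = [total // nodes] * nodes
# Local affinity as described: fragments follow the local file counts.
local_affinity = list(local_files)

soft_remote = remote_reads(local_files, soft_affinity)
local_remote = remote_reads(local_files, local_affinity)

remote_gb = soft_remote * ROWGROUP_MB / 1024
remote_pct = 100 * remote_gb / TABLE_GB

print(f"soft affinity:  {soft_remote} remote reads, "
      f"{remote_gb} GB, {remote_pct:.1f}% of the table")
print(f"local affinity: {local_remote} remote reads")
```

Under these assumptions the even split yields 3 remote reads (about 1.5 GB, roughly 2% of 70 GB), and the split that follows the local file counts yields none, which matches the figures reported in the comment.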