Date: Wed, 9 Nov 2016 01:11:58 +0000 (UTC)
From: "Paul Rogers (JIRA)"
To: issues@drill.apache.org
Reply-To: dev@drill.apache.org
Subject: [jira] [Comment Edited] (DRILL-5011) External Sort Batch memory use depends on record width

[ https://issues.apache.org/jira/browse/DRILL-5011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15649214#comment-15649214 ]

Paul Rogers edited comment on DRILL-5011 at 11/9/16 1:11 AM:
-------------------------------------------------------------

Proposed solution: The best solution is
to revise the "copier" to track actual memory use. But the copier is generated code, so doing so is a big project.

Short-term alternative: use a better record-width estimate. The ESB already maintains a running cumulative record count. Add a new count for the in-memory records (total record count - spilled record count). The ESB allocator already maintains a counter of the total memory allocated to the ESB (which, by definition, must be for the in-memory batches). Compute a better record-width estimate as:

{{record width estimate = allocated memory / in-memory record count}}

In a simple experiment:

{code}
Original record width estimate: 324
Copier allocator after copy: 320000, Records: 809 (x10)
Copier allocator after copy: 160256, Records: 809 (x11)
Copier allocator after copy: 80384, Records: 809 (x11)
Copier allocator after copy: 64000, Records: 809 (x many)

Revised record width estimate: 114
Copier allocator after copy: 320000, Records: 2299 (x10)
Copier allocator after copy: 221696, Records: 2299 (x many)
{code}

That is, with the original estimate the copier did not make full use of the allocated memory and copied fewer records per batch. With the revised estimate, the copier copied more records per batch and made better use of the allocated memory. In this case, the real records were smaller than the original estimate suggested, wasting memory. In a query with larger records, the revised estimate will prevent over-allocation of memory.


> External Sort Batch memory use depends on record width
> ------------------------------------------------------
>
> Key: DRILL-5011
> URL: https://issues.apache.org/jira/browse/DRILL-5011
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.8.0
> Reporter: Paul Rogers
> Priority: Minor
>
> The ExternalSortBatch operator uses spill-to-disk to keep memory needs within a defined limit. However, the "copier" (really, the merge operation) can use an amount of memory determined not by the operator configuration, but by the width of each record.
> The copier memory limit appears to be set by the COPIER_BATCH_MEM_LIMIT value. However, the actual memory use is determined by the number of records that the copier is asked to copy. That count comes from an estimate of row width based on the type of each column. Note that the row width *is not* based on the actual data in each row. Varchar fields, for example, are assumed to be 40 characters wide. If the sorter is asked to sort records with Varchar fields of, say, 1000 characters, then the row-width estimate will be a poor estimator of actual width.
> Memory use is based on a target record count:
> {code}
> target record count = memory limit / estimated row width
> {code}
> Actual memory use is:
> {code}
> memory use = target record count * actual row width
> {code}
> which is:
> {code}
> memory use = memory limit * actual row width / estimated row width
> {code}
> That is, memory use depends on the ratio of actual to estimated width. If the estimate is off by 2x, then we use twice as much memory as expected.
> Note that the memory used for the copier defaults to 20 MB, so even an error of 4x still means only 80 MB of memory used; small in comparison to the many GB typically allocated to ESB storage.
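The arithmetic behind both the proposed estimate and the memory-amplification bug can be sketched as a small standalone Java program. The class and method names below are hypothetical illustrations, not the actual ExternalSortBatch code; only the formulas come from the discussion above.

```java
// Sketch of the record-width arithmetic discussed in DRILL-5011.
// Hypothetical names; not the actual ExternalSortBatch implementation.
public class WidthEstimateSketch {

    // Proposed revised estimate: memory actually allocated to in-memory
    // batches, divided by the in-memory record count
    // (total records minus spilled records).
    static long revisedWidthEstimate(long allocatedMemory,
                                     long totalRecords,
                                     long spilledRecords) {
        long inMemoryRecords = totalRecords - spilledRecords;
        return allocatedMemory / inMemoryRecords;
    }

    // Memory the copier actually uses for one merge batch: the target
    // record count is derived from the width *estimate*, but the bytes
    // consumed depend on the *actual* row width.
    static long actualMemoryUse(long memoryLimit,
                                long estimatedWidth,
                                long actualWidth) {
        long targetRecordCount = memoryLimit / estimatedWidth;
        return targetRecordCount * actualWidth;
    }

    public static void main(String[] args) {
        long limit = 20L * 1024 * 1024; // 20 MB copier budget

        // Over-estimated width (324 vs. a real 114): the copier copies
        // too few records and leaves most of its budget unused.
        System.out.println(actualMemoryUse(limit, 324, 114));

        // Under-estimated width (114 vs. a real 324): the copier copies
        // too many records and blows past the 20 MB budget by ~2.8x.
        System.out.println(actualMemoryUse(limit, 114, 324));
    }
}
```

Running the sketch shows the ratio effect the issue describes: actual use scales with (actual width / estimated width), in either direction.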
-- This message was sent by Atlassian JIRA (v6.3.4#6332)