Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 4CB3F200C0F for ; Thu, 2 Feb 2017 23:45:11 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 4B533160B65; Thu, 2 Feb 2017 22:45:11 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 7C2A5160B57 for ; Thu, 2 Feb 2017 23:45:10 +0100 (CET) Received: (qmail 28867 invoked by uid 500); 2 Feb 2017 22:45:09 -0000 Mailing-List: contact issues-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list issues@drill.apache.org Received: (qmail 28857 invoked by uid 99); 2 Feb 2017 22:45:09 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 02 Feb 2017 22:45:09 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 4284AC1420 for ; Thu, 2 Feb 2017 22:45:09 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -1.199 X-Spam-Level: X-Spam-Status: No, score=-1.199 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, KAM_LAZY_DOMAIN_SECURITY=1, RP_MATCHES_RCVD=-2.999] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id 0K6P0c1vgIWd for ; Thu, 2 Feb 2017 22:45:05 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id B47EA5F30B for ; Thu, 2 Feb 2017 22:44:58 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 07F15E05AB for ; Thu, 2 Feb 2017 22:44:57 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id E8F85252B4 for ; Thu, 2 Feb 2017 22:44:55 +0000 (UTC) Date: Thu, 2 Feb 2017 22:44:55 +0000 (UTC) From: "ASF GitHub Bot (JIRA)" To: issues@drill.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (DRILL-5080) Create a memory-managed version of the External Sort operator MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Thu, 02 Feb 2017 22:45:11 -0000 [ https://issues.apache.org/jira/browse/DRILL-5080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15850666#comment-15850666 ] ASF GitHub Bot commented on DRILL-5080: --------------------------------------- Github user paul-rogers commented on a diff in the pull request: https://github.com/apache/drill/pull/717#discussion_r99067830 --- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/managed/ExternalSortBatch.java --- @@ -0,0 +1,1321 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.drill.exec.physical.impl.xsort.managed; + +import java.io.IOException; +import java.util.Collection; +import java.util.LinkedList; +import java.util.List; + +import org.apache.drill.common.AutoCloseables; +import org.apache.drill.common.config.DrillConfig; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.ExecConstants; +import org.apache.drill.exec.exception.OutOfMemoryException; +import org.apache.drill.exec.exception.SchemaChangeException; +import org.apache.drill.exec.memory.BufferAllocator; +import org.apache.drill.exec.ops.FragmentContext; +import org.apache.drill.exec.ops.MetricDef; +import org.apache.drill.exec.physical.config.ExternalSort; +import org.apache.drill.exec.physical.impl.sort.RecordBatchData; +import org.apache.drill.exec.physical.impl.xsort.MSortTemplate; +import org.apache.drill.exec.physical.impl.xsort.SingleBatchSorter; +import org.apache.drill.exec.physical.impl.xsort.managed.BatchGroup.InputBatch; +import org.apache.drill.exec.record.AbstractRecordBatch; +import org.apache.drill.exec.record.BatchSchema; +import org.apache.drill.exec.record.BatchSchema.SelectionVectorMode; +import org.apache.drill.exec.record.RecordBatch; +import org.apache.drill.exec.record.SchemaUtil; +import org.apache.drill.exec.record.VectorContainer; +import org.apache.drill.exec.record.VectorWrapper; +import org.apache.drill.exec.record.WritableBatch; +import org.apache.drill.exec.record.selection.SelectionVector2; +import org.apache.drill.exec.record.selection.SelectionVector4; +import org.apache.drill.exec.testing.ControlsInjector; +import org.apache.drill.exec.testing.ControlsInjectorFactory; +import org.apache.drill.exec.vector.ValueVector; +import org.apache.drill.exec.vector.complex.AbstractContainerVector; + +import com.google.common.collect.Lists; + +/** + * External sort batch: a sort batch which can spill to disk in + * order to operate within a defined memory footprint. + *

+ *

Basic Operation

+ * The operator has three key phases: + *

+ *

    + *
  • The load phase in which batches are read from upstream.
  • + *
  • The merge phase in which spilled batches are combined to + * reduce the number of files below the configured limit. (Best + * practice is to configure the system to avoid this phase.) + *
  • The delivery phase in which batches are combined to produce + * the final output.
  • + *
+ * During the load phase: + *

+ *

    + *
  • The incoming (upstream) operator provides a series of batches.
  • + *
  • This operator sorts each batch, and accumulates them in an in-memory + * buffer.
  • + *
  • If the in-memory buffer becomes too large, this operator selects + * a subset of the buffered batches to spill.
  • + *
  • Each spill set is merged to create a new, sorted collection of + * batches, and each is spilled to disk.
  • + *
  • To allow the use of multiple disk storage, each spill group is written + * round-robin to a set of spill directories.
  • + *
+ *

+ * During the sort/merge phase: + *

+ *

    + *
  • When the input operator is complete, this operator merges the accumulated + * batches (which may be all in memory or partially on disk), and returns + * them to the output (downstream) operator in chunks of no more than + * 32K records.
  • + *
  • The final merge must combine a collection of in-memory and spilled + * batches. Several limits apply to the maximum "width" of this merge. For + * example, we each open spill run consumes a file handle, and we may wish --- End diff -- Fixed. > Create a memory-managed version of the External Sort operator > ------------------------------------------------------------- > > Key: DRILL-5080 > URL: https://issues.apache.org/jira/browse/DRILL-5080 > Project: Apache Drill > Issue Type: Improvement > Affects Versions: 1.8.0 > Reporter: Paul Rogers > Assignee: Paul Rogers > Fix For: 1.10.0 > > Attachments: ManagedExternalSortDesign.pdf > > > We propose to create a "managed" version of the external sort operator that works to a clearly-defined memory limit. Attached is a design specification for the work. > The project will include fixing a number of bugs related to the external sort, include as sub-tasks of this umbrella task. -- This message was sent by Atlassian JIRA (v6.3.15#6346)