Date: Thu, 16 Mar 2017 22:52:42 +0000 (UTC)
From: "Tim Armstrong (JIRA)"
To: issues@impala.incubator.apache.org
Reply-To: dev@impala.incubator.apache.org
Subject: [jira] [Updated] (IMPALA-5084) Backend support for large rows in Sorter

     [ https://issues.apache.org/jira/browse/IMPALA-5084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong updated IMPALA-5084:
----------------------------------
    Description:
See IMPALA-3208 for the context.

Sorter::Run changes:
* We can use a similar approach to that used for BufferedTupleStream as described in IMPALA-5085

Testing:
Needs end-to-end tests exercising all operators with large rows

  was:
We need to ensure that all exec nodes can support rows larger than the default page size. The default page size will be a query option, so users can always increase it; however, minimum memory requirements will scale proportionally, which makes this less appealing.

We should also add a max_row_size query option that controls the maximum size of rows supported by operators (at least those that use the reservation mechanism). We should be able to support large rows with only a single read and write buffer of the max row size. I.e. the minimum requirement for an operator would be ((min_buffers - 2) * default_buffer_size) + 2 * max_row_size.
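As a rough illustration of the minimum-requirement formula above, here is a small Python sketch. The function names are hypothetical and not part of Impala, and the "scaled pages" comparison assumes that raising the default page size makes every one of the min_buffers buffers that large, which is one reading of "scale proportionally".

```python
def min_reservation_with_max_row_size(min_buffers: int,
                                      default_buffer_size: int,
                                      max_row_size: int) -> int:
    """Formula from the description:
    ((min_buffers - 2) * default_buffer_size) + 2 * max_row_size.

    Hypothetical helper; all sizes are in bytes.
    """
    if min_buffers < 2:
        # The formula sets aside one read and one write buffer.
        raise ValueError("min_buffers must be at least 2")
    return (min_buffers - 2) * default_buffer_size + 2 * max_row_size


def min_reservation_with_scaled_pages(min_buffers: int, page_size: int) -> int:
    """Assumed alternative: every buffer is grown to the large page size."""
    return min_buffers * page_size


KIB = 1024
MIB = 1024 * KIB

# Example: an operator needing 3 buffers, 64 KiB default pages, 8 MiB rows.
with_option = min_reservation_with_max_row_size(3, 64 * KIB, 8 * MIB)
scaled = min_reservation_with_scaled_pages(3, 8 * MIB)
print(with_option, scaled)
```

With these example numbers, only the single read buffer and single write buffer are sized for the largest row, so the requirement stays well below the case where all buffers are enlarged.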
This requires the following changes to the operators:

BufferedTupleStream changes:
* Rows <= the default page size are written as before.
* Rows that don't fit in the default page size get written into a larger page, with one row per page.
* Upon writing a large row to an unpinned stream, the page is immediately unpinned and we immediately advance to the next write page, so that the large page is not kept pinned outside of the AddRow() call.
* We should only be reading from one unpinned stream at a time, so only one large page is required there.

Sorter::Run changes:
* A similar approach to the above can be used.

Testing:
Needs end-to-end tests exercising all operators with large rows

> Backend support for large rows in Sorter
> ----------------------------------------
>
>                 Key: IMPALA-5084
>                 URL: https://issues.apache.org/jira/browse/IMPALA-5084
>             Project: IMPALA
>          Issue Type: Sub-task
>          Components: Backend
>    Affects Versions: Impala 2.6.0
>            Reporter: Tim Armstrong
>            Assignee: Tim Armstrong
>            Priority: Minor
>              Labels: resource-management
>
> See IMPALA-3208 for the context.
> Sorter::Run changes:
> * We can use a similar approach to that used for BufferedTupleStream as described in IMPALA-5085
> Testing:
> Needs end-to-end tests exercising all operators with large rows

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
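The BufferedTupleStream write rules listed in the previous description can be sketched as a toy model. This is only an illustration in Python, not Impala's C++ implementation: ToyStream, Page, and add_row are hypothetical names, and sizing a large page to exactly the row length is an assumption (a real buffer pool may round sizes up).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Page:
    capacity: int
    rows: List[bytes] = field(default_factory=list)
    used: int = 0
    pinned: bool = True


class ToyStream:
    """Hypothetical model of the write path only."""

    def __init__(self, default_page_size: int, max_row_size: int,
                 pinned: bool = True) -> None:
        self.default_page_size = default_page_size
        self.max_row_size = max_row_size
        self.pinned = pinned
        self.pages: List[Page] = []
        self.write_page: Optional[Page] = None

    def _new_page(self, capacity: int) -> Page:
        page = Page(capacity)
        self.pages.append(page)
        return page

    def _retire_write_page(self) -> None:
        # In an unpinned stream, a finished write page is unpinned.
        if self.write_page is not None and not self.pinned:
            self.write_page.pinned = False
        self.write_page = None

    def add_row(self, row: bytes) -> None:
        size = len(row)
        if size > self.max_row_size:
            raise ValueError("row exceeds max_row_size")
        if size <= self.default_page_size:
            # Small rows are packed into default-sized pages as before.
            page = self.write_page
            if page is None or page.used + size > page.capacity:
                self._retire_write_page()
                page = self._new_page(self.default_page_size)
                self.write_page = page
            page.rows.append(row)
            page.used += size
            return
        # Large row: gets its own page, one row per page.
        self._retire_write_page()
        page = self._new_page(size)  # assumption: page sized exactly to the row
        page.rows.append(row)
        page.used = size
        if not self.pinned:
            # Unpin immediately so the large page is not held after add_row().
            page.pinned = False
        # Advance: the next row starts a fresh write page.
        self.write_page = None


stream = ToyStream(default_page_size=8, max_row_size=32, pinned=False)
for r in (b"aaaa", b"bbbb", b"c" * 20, b"dd"):
    stream.add_row(r)
print([(p.capacity, len(p.rows), p.pinned) for p in stream.pages])
```

In this model an unpinned stream never has more than one pinned page once add_row() returns, which is the property the third bullet is after.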