From issues-return-115418-archive-asf-public=cust-asf.ponee.io@hive.apache.org Fri Apr 27 22:18:05 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 8EEFA180608 for ; Fri, 27 Apr 2018 22:18:04 +0200 (CEST) Received: (qmail 70226 invoked by uid 500); 27 Apr 2018 20:18:03 -0000 Mailing-List: contact issues-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hive.apache.org Delivered-To: mailing list issues@hive.apache.org Received: (qmail 70216 invoked by uid 99); 27 Apr 2018 20:18:03 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd4-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 27 Apr 2018 20:18:03 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd4-us-west.apache.org (ASF Mail Server at spamd4-us-west.apache.org) with ESMTP id 36829C0024 for ; Fri, 27 Apr 2018 20:18:03 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd4-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -109.511 X-Spam-Level: X-Spam-Status: No, score=-109.511 tagged_above=-999 required=6.31 tests=[ENV_AND_HDR_SPF_MATCH=-0.5, KAM_ASCII_DIVIDERS=0.8, RCVD_IN_DNSWL_MED=-2.3, SPF_PASS=-0.001, T_RP_MATCHES_RCVD=-0.01, USER_IN_DEF_SPF_WL=-7.5, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd4-us-west.apache.org [10.40.0.11]) (amavisd-new, port 10024) with ESMTP id PI_bZ8attlQi for ; Fri, 27 Apr 2018 20:18:01 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTP id 472D45F5E0 for ; Fri, 27 Apr 2018 20:18:01 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id C4DDEE1281 for ; Fri, 27 Apr 2018 20:18:00 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 2E7C224267 for ; Fri, 27 Apr 2018 20:18:00 +0000 (UTC) Date: Fri, 27 Apr 2018 20:18:00 +0000 (UTC) From: "Prasanth Jayachandran (JIRA)" To: issues@hive.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (HIVE-19206) Automatic memory management for open streaming writers MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/HIVE-19206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16456992#comment-16456992 ] Prasanth Jayachandran commented on HIVE-19206: ---------------------------------------------- - Added config to disable auto flush (mainly for testing) - minor fixes > Automatic memory management for open streaming writers > ------------------------------------------------------ > > Key: HIVE-19206 > URL: https://issues.apache.org/jira/browse/HIVE-19206 > Project: Hive > Issue Type: Sub-task > Components: Streaming > Affects Versions: 3.0.0, 3.1.0 > Reporter: Prasanth Jayachandran > Assignee: Prasanth Jayachandran > Priority: Major > Attachments: HIVE-19206.1.patch, HIVE-19206.2.patch > > > Problem: > When there are 100s of record updaters open, the amount of memory required by orc writers keeps growing because of ORC's internal buffers. This can lead to potential high GC or OOM during streaming ingest. > Solution: > The high level idea is for the streaming connection to remember all the open record updaters and flush the record updater periodically (at some interval). Records written to each record updater can be used as a metric to determine the candidate record updaters for flushing. > If stripe size of orc file is 64MB, the default memory management check happens only after every 5000 rows which may which may be too late when there are too many concurrent writers in a process. Example case would be 100 writers open and each of them have almost full stripe of 64MB buffered data, this would take 100*64MB ~=6GB of memory. When all of the record writers flush, the memory usage drops down to 100*~2MB which is just ~200MB memory usage. -- This message was sent by Atlassian JIRA (v7.6.3#76005)