Subject: Re: Suspicious direct memory consumption when running queries concurrently
From: Abdel Hakim Deneche <adeneche@maprtech.com>
To: dev@drill.apache.org
Date: Fri, 31 Jul 2015 21:19:37 -0700

I tried getting a jmap dump multiple times without success; each time it
crashes the JVM with the following exception:

Dumping heap to /home/mapr/private-sql-hadoop-test/framework/myfile.hprof
> ...
> Exception in thread "main" java.io.IOException: Premature EOF
>     at sun.tools.attach.HotSpotVirtualMachine.readInt(HotSpotVirtualMachine.java:248)
>     at sun.tools.attach.LinuxVirtualMachine.execute(LinuxVirtualMachine.java:199)
>     at sun.tools.attach.HotSpotVirtualMachine.executeCommand(HotSpotVirtualMachine.java:217)
>     at sun.tools.attach.HotSpotVirtualMachine.dumpHeap(HotSpotVirtualMachine.java:180)
>     at sun.tools.jmap.JMap.dump(JMap.java:242)
>     at sun.tools.jmap.JMap.main(JMap.java:140)

On Mon, Jul 27, 2015 at 3:45 PM, Jacques Nadeau wrote:

> An allocate -> release cycle all on the same thread goes into a per-thread
> cache.
>
> A bunch of Netty arena settings are configurable. The big issue, I believe,
> is that the limits are soft limits implemented by the allocation-time
> release mechanism. As such, if you allocate a bunch of memory, then release
> it all, that won't necessarily trigger any actual chunk releases.
>
> --
> Jacques Nadeau
> CTO and Co-Founder, Dremio
>
> On Mon, Jul 27, 2015 at 12:47 PM, Abdel Hakim Deneche <
> adeneche@maprtech.com> wrote:
>
> > @Jacques, my understanding is that chunks are not owned by a specific
> > thread, but they are part of a specific memory arena which is in turn
> > only accessed by specific threads. Do you want me to find which threads
> > are associated with the same arena where we have the hanging chunks?
> >
> > On Mon, Jul 27, 2015 at 11:04 AM, Jacques Nadeau wrote:
> >
> > > It sounds like your statement is that we're caching too many unused
> > > chunks. Hanifi and I previously discussed implementing a separate
> > > flushing mechanism to release unallocated chunks that are hanging
> > > around. The main question is why so many chunks are hanging around
> > > and what threads they are associated with. A jmap dump and analysis
> > > should allow you to determine which thread owns the excess chunks.
> > > My guess would be the RPC pool, since those threads are long-lived
> > > (as opposed to the WorkManager pool, which is contracting).
> > >
> > > --
> > > Jacques Nadeau
> > > CTO and Co-Founder, Dremio
> > >
> > > On Mon, Jul 27, 2015 at 9:53 AM, Abdel Hakim Deneche <
> > > adeneche@maprtech.com> wrote:
> > >
> > > > When running a set of mostly window function queries concurrently
> > > > on a single drillbit with 8GB max direct memory, we are seeing a
> > > > continuous increase in direct memory allocation.
> > > >
> > > > We repeat the following steps multiple times:
> > > > - we launch an "iteration" of tests that runs all queries in a
> > > >   random order, 10 queries at a time
> > > > - after the iteration finishes, we wait for a couple of minutes to
> > > >   give Drill time to release the memory held by the finishing
> > > >   fragments
> > > >
> > > > Using Drill's memory logger ("drill.allocator") we were able to get
> > > > snapshots of how memory is used internally by Netty. We focused on
> > > > the number of allocated chunks; if we take this number and multiply
> > > > it by 16MB (Netty's chunk size) we get approximately the same value
> > > > reported as Drill's direct memory allocation.
> > > > Here is a graph that shows the evolution of the number of allocated
> > > > chunks over a 500-iteration run (I'm working on improving the
> > > > plots):
> > > >
> > > > http://bit.ly/1JL6Kp3
> > > >
> > > > In this specific case, after the first iteration Drill was
> > > > allocating ~2GB of direct memory, and this number kept rising after
> > > > each iteration, up to ~6GB. We suspect this caused one of our
> > > > previous runs to crash the JVM.
> > > >
> > > > If we only focus on the log lines between iterations (when Drill's
> > > > memory usage is below 10MB), then all allocated chunks are at most
> > > > 2% used.
> > > > At some point we end up with 288 nearly empty chunks, yet the next
> > > > iteration will cause more chunks to be allocated!
> > > >
> > > > Is this expected?
> > > >
> > > > PS: I am running more tests and will update this thread with more
> > > > information.
> > > >
> > > > --
> > > > Abdelhakim Deneche
> > > > Software Engineer
> > > >
> > > > Now Available - Free Hadoop On-Demand Training
> > > > <http://www.mapr.com/training?utm_source=Email&utm_medium=Signature&utm_campaign=Free%20available>
> >
> > --
> > Abdelhakim Deneche
> > Software Engineer

--
Abdelhakim Deneche
Software Engineer
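Since the failure above ("Premature EOF") happens inside the `sun.tools.attach` path that jmap relies on, one possible workaround is to trigger the dump from inside the target JVM via the `HotSpotDiagnosticMXBean`, which bypasses the attach mechanism entirely. A minimal sketch, assuming a HotSpot JDK (the bean is HotSpot-specific); the class name `HeapDumper` is just an illustration:

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {
    // Triggers a heap dump from within the running JVM itself, avoiding the
    // sun.tools.attach code path where the Premature EOF occurred.
    public static void dump(String path, boolean liveObjectsOnly) throws IOException {
        HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                ManagementFactory.getPlatformMBeanServer(),
                "com.sun.management:type=HotSpotDiagnostic",
                HotSpotDiagnosticMXBean.class);
        // liveObjectsOnly=true forces a full GC first and dumps only reachable
        // objects; note the dump fails if the target file already exists.
        bean.dumpHeap(path, liveObjectsOnly);
    }

    public static void main(String[] args) throws IOException {
        dump(args.length > 0 ? args[0] : "drillbit-heap.hprof", true);
    }
}
```

The resulting `.hprof` file can then be opened in MAT or VisualVM to see which threads hold references to the idle chunks.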
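Jacques's point about soft limits can be made concrete with a deliberately simplified model — this is a hypothetical sketch for illustration, not Netty's actual allocator code: released chunks park in a per-thread cache, and the cache is only trimmed on the *allocation* path, so releasing everything frees nothing by itself:

```java
import java.util.ArrayDeque;

// Hypothetical model of an allocation-time soft limit: release() never frees,
// it only caches; the limit is enforced lazily the next time allocate() runs
// on the same thread. A thread that stops allocating keeps its cache forever.
public class ThreadCachedPool {
    static final int CHUNK_SIZE = 16 * 1024 * 1024; // Netty's 16MB chunk, per the thread
    static final int MAX_CACHED = 4;                // soft limit, checked only on allocate

    private final ThreadLocal<ArrayDeque<byte[]>> cache =
            ThreadLocal.withInitial(ArrayDeque::new);

    public byte[] allocate() {
        ArrayDeque<byte[]> local = cache.get();
        trim(local);                       // the only place the limit is enforced
        byte[] cached = local.poll();
        return cached != null ? cached : new byte[CHUNK_SIZE];
    }

    public void release(byte[] chunk) {
        cache.get().push(chunk);           // no trimming here: "release it all" frees nothing
    }

    public int cachedChunks() {
        return cache.get().size();
    }

    private void trim(ArrayDeque<byte[]> local) {
        while (local.size() > MAX_CACHED) {
            local.poll();                  // drop the excess so GC can reclaim it
        }
    }
}
```

Under this model, a burst of allocate/release pairs leaves the cache arbitrarily far above the soft limit until the owning thread allocates again — which matches the observation that chunks linger between iterations even when reported usage is near zero.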
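The 16MB-per-chunk figure and the 288-idle-chunk observation can be sanity-checked with simple arithmetic. Assuming Netty 4's defaults of the time (pageSize = 8 KiB, maxOrder = 11, chunkSize = pageSize << maxOrder):

```java
public class ChunkMath {
    public static void main(String[] args) {
        // Netty's default chunk size: pageSize (8 KiB) << maxOrder (11) = 16 MiB
        int pageSize = 8192;
        int maxOrder = 11;
        int chunkSize = pageSize << maxOrder;
        System.out.println(chunkSize);              // 16777216 bytes = 16 MiB

        // 288 nearly empty chunks pin roughly 4.5 GiB of direct memory --
        // over half the 8 GiB limit -- even while Drill reports < 10 MiB in use.
        long pinned = 288L * chunkSize;
        System.out.println(pinned / (1024 * 1024)); // 4608 MiB = 4.5 GiB
    }
}
```

This is consistent with the graph: chunk count × 16 MiB tracks the drillbit's reported direct memory, so the growth from ~2GB to ~6GB corresponds to roughly 128 → 384 retained chunks.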