From: 陈哲
Date: Wed, 12 Oct 2016 15:24:28 +0800
Subject: Spark ML OOM problem
To: spark users mail list

Hi,

I'm using Spark ML to train a RandomForest model. The training data file has a little over 200,000 lines and about 100 features. I'm running Spark in local mode with JAVA_OPTS like:

-Xms1024m -Xmx10296m -XX:+PrintGCDetails -XX:+PrintGCTimeStamps

but OOM errors keep occurring. I tried changing the Spark configuration to avoid this, without success.

My Spark conf:

spark.memory.fraction: 0.85
spark.executor.instances: 16
spark.executor.heartbeatInterval: 120
spark.driver.maxResultSize: 0
spark.ui.retainedJobs=0 ... // I set all the similar settings to 0, e.g. spark.ui.retainedStages ...

Here is some of the GC log:

4876.028: [Full GC [PSYoungGen: 439296K->417358K(878592K)] [ParOldGen: 9139497K->9139487K(9225216K)] 9578793K->9556845K(10103808K) [PSPermGen: 81436K->81436K(81920K)], 2.0203540 secs] [Times: user=49.85 sys=0.12, real=2.02 secs]
4878.100: [Full GC [PSYoungGen: 417930K->187983K(878592K)] [ParOldGen: 9139487K->9166111K(9225216K)] 9557418K->9354094K(10103808K) [PSPermGen: 81436K->81436K(81920K)], 4.2368530 secs] [Times: user=53.77 sys=0.09, real=4.23 secs]
4882.414: [Full GC [PSYoungGen: 428018K->202158K(878592K)] [ParOldGen: 9211167K->9196569K(9225216K)] 9639185K->9398727K(10103808K) [PSPermGen: 81436K->81436K(81920K)], 4.4419950 secs] [Times: user=54.75 sys=0.08, real=4.44 secs]
4886.886: [Full GC [PSYoungGen: 425657K->397128K(878592K)] [ParOldGen: 9196569K->9196568K(9225216K)] 9622227K->9593697K(10103808K) [PSPermGen: 81436K->81436K(81920K)], 2.3522140 secs] [Times: user=51.41 sys=0.09, real=2.35 secs]
4889.239: [Full GC [PSYoungGen: 397128K->397128K(878592K)] [ParOldGen: 9196568K->9196443K(9225216K)] 9593697K->9593572K(10103808K) [PSPermGen: 81436K->81289K(81408K)], 30.5637160 secs] [Times: user=767.29 sys=2.98, real=30.57 secs]

The Full GC fails to reclaim enough memory, hence the OOM.

I have two questions:

1. Why does the Spark log always show free memory, like:

4834409 [dispatcher-event-loop-5] INFO org.apache.spark.storage.BlockManagerInfo - Removed broadcast_266_piece0 on 172.17.1.235:9948 in memory (size: 642.5 KB, free: 7.1 GB)

Is this wrong?

2. How can I avoid the OOM here? Do I have to increase -Xmx to a larger value? How does Spark use this memory, and what is in it? Can anyone point me to some docs?

Thanks,
Patrick
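To put a number on "the Full GC failed to collect enough memory", the last GC line quoted in the message can be parsed to see how full each generation is after the collection. This is only a minimal Python sketch: the regex and the helper names (`parse_full_gc`, `occupancy_after`) are illustrative assumptions written for the ParallelGC `-XX:+PrintGCDetails` format shown in the log, not part of any Spark or JVM tool.

```python
import re

# Matches "<Name>: <before>K-><after>K(<capacity>K)" for each generation in a
# ParallelGC "Full GC" line. The unnamed whole-heap total is skipped on purpose.
GEN_PATTERN = re.compile(r"(\w+): (\d+)K->(\d+)K\((\d+)K\)")


def parse_full_gc(line):
    """Return {generation: (before_kb, after_kb, capacity_kb)} for one log line."""
    return {
        name: (int(before), int(after), int(capacity))
        for name, before, after, capacity in GEN_PATTERN.findall(line)
    }


def occupancy_after(gen):
    """Fraction of a generation's capacity still in use after the collection."""
    _, after_kb, capacity_kb = gen
    return after_kb / capacity_kb


# Last Full GC line from the message.
line = (
    "4889.239: [Full GC [PSYoungGen: 397128K->397128K(878592K)] "
    "[ParOldGen: 9196568K->9196443K(9225216K)] 9593697K->9593572K(10103808K) "
    "[PSPermGen: 81436K->81289K(81408K)], 30.5637160 secs]"
)

gens = parse_full_gc(line)
for name, gen in gens.items():
    freed_kb = gen[0] - gen[1]
    print(f"{name}: freed {freed_kb} KB, {occupancy_after(gen):.1%} full after GC")
```

For this line the old generation frees only about 125 KB in a 30-second collection and stays above 99% full, which is consistent with the heap being exhausted by live objects rather than by garbage.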
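On question 1, one possible reading is that the "free" figure in the BlockManagerInfo line is Spark's own accounting of storage memory, not the JVM's free heap. The sketch below is a rough back-of-envelope in Python under stated assumptions: that the unified memory manager (Spark 1.6+) is in use, that it reserves 300 MB of the heap, and that the JVM reports roughly the 10103808K total seen in the GC log as its maximum heap. The function name `unified_pool_mb` is hypothetical.

```python
# Assumption: Spark's unified memory manager sets aside 300 MB of heap
# before applying spark.memory.fraction.
RESERVED_MB = 300


def unified_pool_mb(heap_mb, memory_fraction):
    """Estimate the heap (in MB) shared by execution and storage."""
    return (heap_mb - RESERVED_MB) * memory_fraction


# Heap total taken from the GC log (10103808K), converted to MB.
heap_mb = 10103808 / 1024
pool_mb = unified_pool_mb(heap_mb, memory_fraction=0.85)

print(f"heap reported by the JVM: {heap_mb:.0f} MB")
print(f"estimated unified pool:   {pool_mb:.0f} MB ({pool_mb / 1024:.2f} GB)")
```

Under these assumptions the pool comes out near 8 GB, so a log line reporting "free: 7.1 GB" can coexist with a nearly full old generation: objects allocated by user code and by the training algorithm live on the heap but are not necessarily tracked in that figure.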