From: Tanveer Ahmad - EWI <t.ahmad@tudelft.nl>
To: user@arrow.apache.org, emkornfield@gmail.com
Subject: Re: Running plasma_store_server (in background) on each Spark worker node
Date: Fri, 12 Jun 2020 11:19:31 +0000

Hi Micah,

Thank you so much.
I am able to run Plasma in the Spark cluster through map() methods on each worker node.

Regards,
Tanveer Ahmad

________________________________
From: Micah Kornfield <emkornfield@gmail.com>
Sent: Friday, June 12, 2020 7:00:03 AM
To: user@arrow.apache.org
Subject: Re: Running plasma_store_server (in background) on each Spark worker node

Hi Tanveer,

How to ensure the server is running probably depends on your cluster management system (I'm not familiar with Slurm). But if you only have 6 machines, you could probably SSH into each of them and start the process by hand.

Ray's cluster management [1] might be another place to look for examples (I believe Ray spawns a plasma server on each cluster node).

Generally, the scope of Arrow doesn't include cluster management, so there might not be too much in the way of responses on this list.

Hope this helps.

Micah

[1] https://docs.ray.io/en/master/autoscaling.html

On Wed, Jun 10, 2020 at 3:48 PM Tanveer Ahmad - EWI <T.Ahmad@tudelft.nl> wrote:

Hi Neal,

Yes, my question is: how can I run the Plasma Store on each worker node of a Spark cluster?

Suppose my cluster consists of 6 nodes (1 master plus 5 workers); I want to run the Plasma Store on all 5 worker nodes. Thanks.

Regards,
Tanveer Ahmad

________________________________
From: Neal Richardson <neal.p.richardson@gmail.com>
Sent: Thursday, June 11, 2020 12:40:47 AM
To: user@arrow.apache.org
Subject: Re: Running plasma_store_server (in background) on each Spark worker node

Hi Tanveer,

Do you have any specific questions, or have you encountered trouble with your setup?

Neal

On Wed, Jun 10, 2020 at 2:23 PM Tanveer Ahmad - EWI <T.Ahmad@tudelft.nl> wrote:

Hi all,

I want to run an external command (plasma_store_server -m 3000000000 -s /tmp/store0 &) in the background on each worker node of my Spark cluster, so that the external process keeps running during the whole Spark job.

The plasma_store_server process is used for storing and retrieving Apache Arrow data in Apache Spark.

I am using PySpark for Spark programming and SLURM for Spark cluster creation.
Any help will be highly appreciated!

Regards,
Tanveer Ahmad
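The map()-based launch described in the thread can be sketched roughly as follows. This is a minimal, hypothetical sketch, not an Arrow or Spark API: function names like start_plasma_on_workers are illustrative, and it assumes plasma_store_server is on the PATH of every worker node.

```python
import subprocess

def plasma_command(mem_bytes, socket_path):
    # Mirrors the command line from the original post:
    # plasma_store_server -m 3000000000 -s /tmp/store0
    return ["plasma_store_server", "-m", str(mem_bytes), "-s", socket_path]

def start_plasma(mem_bytes=3_000_000_000, socket_path="/tmp/store0"):
    # Popen returns immediately; start_new_session detaches the store from
    # the task process so it keeps running in the background on that node.
    return subprocess.Popen(plasma_command(mem_bytes, socket_path),
                            start_new_session=True)

def start_plasma_on_workers(sc, num_workers):
    # One partition per worker: each map() task runs on an executor and
    # launches a local Plasma store there.
    def launch(_):
        start_plasma()
        return 0
    # map() is lazy, so an action such as count() is needed to force it.
    sc.parallelize(range(num_workers), num_workers).map(launch).count()
```

Note that Spark gives no hard guarantee of one task per physical node (especially with dynamic allocation), so in practice you may want the launch to be idempotent, e.g. skip starting the store if the socket file already exists.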
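Once a store is listening on /tmp/store0, tasks on the same node can exchange Arrow data through it. Below is a hedged sketch of the client side using the pyarrow.plasma API as it existed around 2020 (Plasma was later deprecated and removed from pyarrow); share_record_batch is an illustrative name, not a library function.

```python
DEFAULT_SOCKET = "/tmp/store0"  # must match the -s flag passed to plasma_store_server

def share_record_batch(socket_path=DEFAULT_SOCKET):
    # Imports kept inside the function: pyarrow.plasma only exists in
    # older pyarrow releases (it was removed in pyarrow 12.0).
    import pyarrow as pa
    import pyarrow.plasma as plasma

    client = plasma.connect(socket_path)  # connect over the Unix-domain socket
    batch = pa.record_batch([pa.array([1, 2, 3])], names=["x"])
    object_id = client.put(batch)   # copy the batch into shared memory
    return client.get(object_id)   # any process on the same node can fetch it
```

Because Plasma objects live in shared memory keyed by object ID, the usual pattern is to put() data in one task and pass the 20-byte object ID (not the data) to whichever local process needs to get() it.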