From: Shixiong Zhu
To: Saif.A.Ellafi@wellsfargo.com
Cc: "user@spark.apache.org"
Date: Thu, 1 Oct 2015 14:21:27 +0800
Subject: Re: What is the best way to submit multiple tasks?

Right, you can use SparkContext and SQLContext in multiple threads. They
are thread safe.

Best Regards,
Shixiong Zhu

2015-10-01 4:57 GMT+08:00 <Saif.A.Ellafi@wellsfargo.com>:

> Hi all,
>
> I have a process where I do some calculations on each of the columns of
> a dataframe. Intrinsically, I iterate over the columns with a for loop,
> and each per-column computation is not entirely distributable on its own.
>
> To speed up the process, I would like to submit a Spark program for each
> column. Any suggestions? I was thinking of plain threads sharing a single
> Spark context.
>
> Thank you,
> Saif
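
For illustration, here is a minimal sketch of that approach in Scala: one
shared SparkContext/SQLContext, with a driver-side thread pool that submits
one Spark job per column. The input path, the per-column computation
(a null-dropping count), and the pool size are hypothetical placeholders,
not part of the original thread.

    import java.util.concurrent.Executors
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration.Duration

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object PerColumnJobs {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("per-column-jobs"))
        val sqlContext = new SQLContext(sc)

        // Hypothetical input; replace with however the DataFrame is built.
        val df = sqlContext.read.parquet("/path/to/data")

        // Driver-side thread pool: each Future submits its own Spark job,
        // all of them sharing the same SparkContext/SQLContext.
        implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4))

        val perColumn: Seq[Future[Long]] = df.columns.toSeq.map { col =>
          Future {
            // Placeholder per-column computation; substitute the real one.
            df.select(col).na.drop().count()
          }
        }

        val counts = Await.result(Future.sequence(perColumn), Duration.Inf)
        df.columns.zip(counts).foreach { case (c, n) => println(s"$c -> $n") }

        sc.stop()
      }
    }

If the per-column jobs end up queuing behind one another, the default FIFO
scheduler may be the reason; setting spark.scheduler.mode=FAIR lets jobs
submitted from different threads share cluster resources more evenly.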