From: Shixiong Zhu
To: Saif.A.Ellafi@wellsfargo.com
Cc: "user@spark.apache.org"
Date: Thu, 1 Oct 2015 14:21:27 +0800
Subject: Re: What is the best way to submit multiple tasks?

Right, you can use SparkContext and SQLContext in multiple threads. They
are thread safe.

Best Regards,
Shixiong Zhu

2015-10-01 4:57 GMT+08:00 <Saif.A.Ellafi@wellsfargo.com>:

> Hi all,
>
> I have a process where I do some calculations on each of the columns of
> a dataframe. Intrinsically, I iterate over the columns with a for loop,
> and each per-column computation is not entirely distributable on its own.
>
> To speed up the process, I would like to submit a Spark program for each
> column. Any suggestions? I was thinking of plain threads sharing a single
> Spark context.
>
> Thank you,
> Saif
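
For illustration, here is a minimal sketch of that approach in Scala: one
shared SparkContext/SQLContext, with a driver-side thread pool that submits
one Spark job per column. The input path, the per-column computation
(a null-dropping count), and the pool size are hypothetical placeholders,
not part of the original thread.

    import java.util.concurrent.Executors
    import scala.concurrent.{Await, ExecutionContext, Future}
    import scala.concurrent.duration.Duration

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    object PerColumnJobs {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("per-column-jobs"))
        val sqlContext = new SQLContext(sc)

        // Hypothetical input; replace with however the DataFrame is built.
        val df = sqlContext.read.parquet("/path/to/data")

        // Driver-side thread pool: each Future submits its own Spark job,
        // all of them sharing the same SparkContext/SQLContext.
        implicit val ec = ExecutionContext.fromExecutor(Executors.newFixedThreadPool(4))

        val perColumn: Seq[Future[Long]] = df.columns.toSeq.map { col =>
          Future {
            // Placeholder per-column computation; substitute the real one.
            df.select(col).na.drop().count()
          }
        }

        val counts = Await.result(Future.sequence(perColumn), Duration.Inf)
        df.columns.zip(counts).foreach { case (c, n) => println(s"$c -> $n") }

        sc.stop()
      }
    }

If the per-column jobs end up queuing behind one another, the default FIFO
scheduler may be the reason; setting spark.scheduler.mode=FAIR lets jobs
submitted from different threads share cluster resources more evenly.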