From: Tathagata Das
Date: Tue, 27 Oct 2015 23:17:08 -0700
Subject: Re: [Spark Streaming] Connect to Database only once at the start of Streaming job
To: diplomatic Guru
Cc: user@spark.apache.org

Yeah, of course.
Just create an RDD from JDBC, call cache()/persist(), and then force it to be evaluated with an action such as count(). Once it is cached, you can use it in a StreamingContext; because of the cache, it should not access JDBC any more.

On Tue, Oct 27, 2015 at 12:04 PM, diplomatic Guru wrote:

> I know it uses the lazy model, which is why I was wondering.
>
> On 27 October 2015 at 19:02, Uthayan Suthakar wrote:
>
>> Hello all,
>>
>> What I want to do is configure the Spark Streaming job to read the
>> database using JdbcRDD and cache the results. This should occur only
>> once, at the start of the job, and it should not make any further
>> connections to the DB afterwards. Is it possible to do that?
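The advice above is Spark-specific (JdbcRDD + cache() + count()), but the underlying load-once-and-cache pattern can be sketched in plain Python. This is only an illustrative analogue, not Spark code: sqlite3 stands in for the JDBC source, and the `lookup` table, its columns, and the `CachedLookup` class are all made-up names. The `load()` call plays the role of cache() followed by count() -- it connects once, materializes the whole result eagerly, and every later lookup is served from memory with no further DB access.

```python
import os
import sqlite3
import tempfile

class CachedLookup:
    """Load a lookup table from the database once, then serve every
    subsequent request from the in-memory cache (no further DB access)."""

    def __init__(self, db_path):
        self.db_path = db_path
        self.db_connections = 0  # instrumentation: how often we hit the DB
        self._cache = None

    def load(self):
        # Connect exactly once, at startup, and materialize the result
        # eagerly -- the analogue of cache() followed by count() in Spark.
        conn = sqlite3.connect(self.db_path)
        self.db_connections += 1
        self._cache = dict(conn.execute("SELECT key, value FROM lookup"))
        conn.close()
        return len(self._cache)  # forces full evaluation, like count()

    def get(self, key):
        # Served from memory; the DB is never touched again.
        return self._cache[key]

# Build a throwaway database (paths and table names are illustrative).
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")
with sqlite3.connect(db_path) as conn:
    conn.execute("CREATE TABLE lookup (key TEXT, value TEXT)")
    conn.executemany("INSERT INTO lookup VALUES (?, ?)",
                     [("a", "1"), ("b", "2")])

lookup = CachedLookup(db_path)
lookup.load()

# Simulate many streaming batches: every hit comes from the cache.
for _ in range(100):
    assert lookup.get("a") == "1"

assert lookup.db_connections == 1  # the DB was read exactly once
```

In real Spark code the same shape holds: build the JdbcRDD on the driver before starting the StreamingContext, cache it, trigger it with count(), and then join or reference it from each batch.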