Mailing-List: contact user-help@flink.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@flink.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAN0XJzOAr3Wwe6oeE1FMwPWeRXnvTGjgJO3+fi5sm0j5___vzw@mail.gmail.com>
References: 
 <CAN0XJzNY-fQ4WmMYvBr1x6t69QVYP4oMfPtqvSVNfjZn+gemSQ@mail.gmail.com>
 <CAELUF_AN_vW4OA08aYDnxEfdzMzrv=VxwaoCDj4M5ZygoTH1Fw@mail.gmail.com>
 <CAKiyyaHDm5S-maWfbaBYyczkepWktQO24_XjhE293maGP+0oBQ@mail.gmail.com>
 <CAN0XJzOS=7tFD+0qWPiGMQjoPVdZfyniTq41um6xWLPou=jL6Q@mail.gmail.com>
 <CAC27z=ONvUQZXyzWjXrgHgPq+1RPaMHiDAMXzroRnNmYUWfwaA@mail.gmail.com>
 <CAN0XJzOAx7npOv48ZcYRuSGnhY_oJY2-Cem16+7A6a9V_8MrDg@mail.gmail.com>
 <CAC27z=PRCAWBranSAxfhi=dQh+zXzuG-ZFgEeCdFyHauvMrv6g@mail.gmail.com>
 <CAN0XJzM5Z_0NBbSPuoEQZczhhsGS2caACVU_RsKhDCF1KwxbXg@mail.gmail.com>
 <CAN0XJzMz98b7J5yu=h=GCMZw9u0u2GGFUWTmddbaPsAEPXc6=A@mail.gmail.com>
 <CAKiyyaEt+4BY1fHpjNNXzd6GqreG9PJc7CkDVxQbwW3AXyBExQ@mail.gmail.com>
 <CAN0XJzOAr3Wwe6oeE1FMwPWeRXnvTGjgJO3+fi5sm0j5___vzw@mail.gmail.com>
From: Flavio Pompermaier <pompermaier@okkam.it>
Date: Thu, 7 Apr 2016 12:37:50 +0200
Message-ID: 
 <CAELUF_CnOp5ztUwHdiR0wiTae-pUyXEbH+8EArVHCC914QHm2g@mail.gmail.com>
Subject: Re: threads, parallelism and task managers
To: user <user@flink.apache.org>
Content-Type: multipart/alternative; boundary=047d7b5d295cb70911052fe2ac05

--047d7b5d295cb70911052fe2ac05
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

We've finally created a running example (for Flink 0.10.2) of our improved
JDBC input format that you can run from an IDE (it creates an in-memory
Derby database with 1000 rows and a batch size of 10) at
https://gist.github.com/fpompermaier/bcd704abc93b25b6744ac76ac17ed351.
The first time you run the program you have to comment out the following line:

        stmt.executeUpdate("Drop Table users ");
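
For reference, the split arithmetic behind those numbers can be sketched as
follows. This is only an illustration, assuming contiguous numeric ids
starting at 1; the SplitRanges class and its ranges method are hypothetical
names, not the code in the gist:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the gist or of Flink.
public class SplitRanges {

    // Cuts the inclusive id range [minId, maxId] into consecutive
    // inclusive ranges of at most fetchSize ids each.
    public static List<long[]> ranges(long minId, long maxId, long fetchSize) {
        if (fetchSize <= 0) {
            throw new IllegalArgumentException("fetchSize must be positive");
        }
        List<long[]> out = new ArrayList<>();
        for (long start = minId; start <= maxId; start += fetchSize) {
            // The last range may be shorter than fetchSize.
            long end = Math.min(start + fetchSize - 1, maxId);
            out.add(new long[] {start, end});
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed values taken from the example: 1000 rows, batch of 10.
        List<long[]> splits = ranges(1, 1000, 10);
        System.out.println("number of splits: " + splits.size());
    }
}
```

With 1000 rows and a batch of 10 this yields 100 splits, so there is always
more pending work than parallel tasks to pick it up.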

In your pom declare the following dependencies:

<dependency>
<groupId>org.apache.derby</groupId>
<artifactId>derby</artifactId>
<version>10.10.1.1</version>
</dependency>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-pool2</artifactId>
<version>2.4.2</version>
</dependency>

On my laptop I have 8 cores, and if I set the parallelism to 16 I expect to see
16 calls to the connection pool (i.e. '==================== CREATING NEW
CONNECTION!'), but I see only 8 (up to my number of cores).
The number of created tasks, however, is correct (16).
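
To check the ForkJoinPool hypothesis quoted further down in isolation, here is
a minimal standalone sketch. It uses only the plain JDK, no Flink, and the
class and method names are hypothetical:

```java
import java.util.concurrent.ForkJoinPool;

// Hypothetical check, independent of Flink.
public class PoolParallelismCheck {

    // Parallelism of a pool built with the default constructor,
    // which sizes itself from the number of available processors.
    public static int defaultParallelism() {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            return pool.getParallelism();
        } finally {
            pool.shutdown();
        }
    }

    // Parallelism of a pool built with an explicit target.
    public static int explicitParallelism(int parallelism) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.getParallelism();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("cores:    " + Runtime.getRuntime().availableProcessors());
        System.out.println("default:  " + defaultParallelism());
        System.out.println("explicit: " + explicitParallelism(16));
    }
}
```

On my 8-core laptop the default value should come out as 8, which matches the
number of connections we see. Note that this only shows how the pool sizes
itself by default; it does not prove that Flink's task threads are bound by
that pool.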

I hope this helps in understanding where the problem is!

Best, and thanks in advance,
Flavio

On Wed, Mar 30, 2016 at 11:01 AM, Stefano Bortoli <s.bortoli@gmail.com>
wrote:

> Hi Ufuk,
>
> here is our preliminary input format implementation:
> https://gist.github.com/anonymous/dbf05cad2a6cc07b8aa88e74a2c23119
>
> If you need a running project, I will have to create a test one, because I
> cannot share the current configuration.
>
> thanks a lot in advance!
>
>
>
> 2016-03-30 10:13 GMT+02:00 Ufuk Celebi <uce@apache.org>:
>
>> Do you have the code somewhere online? Maybe someone can have a quick
>> look over it later. I'm pretty sure that is indeed a problem with the
>> custom input format.
>>
>> – Ufuk
>>
>> On Tue, Mar 29, 2016 at 3:50 PM, Stefano Bortoli <s.bortoli@gmail.com>
>> wrote:
>> > Perhaps there is a misunderstanding on my side about parallelism and
>> > split management given a data source.
>> >
>> > We started from the current JDBCInputFormat to make it multi-threaded.
>> > Then, given a space of keys, we create the splits based on a fetch size
>> > set as a parameter. In open, we get a connection from the pool and
>> > execute a query using the split interval. This sets the 'resultSet', and
>> > then the DataSourceTask iterates between reachedEnd, next and close. On
>> > close, the connection is returned to the pool. We set parallelism to 32,
>> > and we would expect 32 connections to be opened, but only 8 are.
>> >
>> > We tried to make an example with the TextInputFormat, but since it is a
>> > DelimitedInputFormat, open is called sequentially when statistics are
>> > built, and the processing is executed in parallel only after all the
>> > opens have been executed. This is not feasible in our case, because there
>> > would be millions of queries before the statistics are collected.
>> >
>> > Perhaps we are doing something wrong; we still have to figure out what. :-/
>> >
>> > thanks a lot for your help.
>> >
>> > saluti,
>> > Stefano
>> >
>> >
>> > 2016-03-29 13:30 GMT+02:00 Stefano Bortoli <s.bortoli@gmail.com>:
>> >>
>> >> That is exactly my point. I should have 32 threads running, but I have
>> >> only 8. 32 tasks are created, but only 8 run concurrently. Flavio
>> >> and I will try to make a simple program to reproduce the problem. If we
>> >> solve our issues on the way, we'll let you know.
>> >>
>> >> thanks a lot anyway.
>> >>
>> >> saluti,
>> >> Stefano
>> >>
>> >> 2016-03-29 12:44 GMT+02:00 Till Rohrmann <trohrmann@apache.org>:
>> >>>
>> >>> Then it shouldn’t be a problem. The ExecutionContext is used to run
>> >>> futures and their callbacks. But as Ufuk said, each task will spawn
>> >>> its own thread, and if you set the parallelism to 32 then you should
>> >>> have 32 threads running.
>> >>>
>> >>>
>> >>> On Tue, Mar 29, 2016 at 12:29 PM, Stefano Bortoli <s.bortoli@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> In fact, I don't use it. I just had to trace back through the runtime
>> >>>> implementation to get to the point where parallelism was switching
>> >>>> from 32 to 8.
>> >>>>
>> >>>> saluti,
>> >>>> Stefano
>> >>>>
>> >>>> 2016-03-29 12:24 GMT+02:00 Till Rohrmann <till.rohrmann@gmail.com>:
>> >>>>>
>> >>>>> Hi,
>> >>>>>
>> >>>>> what do you use the ExecutionContext for? That should actually be
>> >>>>> something you shouldn’t be concerned with, since it is only used
>> >>>>> internally by the runtime.
>> >>>>>
>> >>>>> Cheers,
>> >>>>> Till
>> >>>>>
>> >>>>>
>> >>>>> On Tue, Mar 29, 2016 at 12:09 PM, Stefano Bortoli <s.bortoli@gmail.com>
>> >>>>> wrote:
>> >>>>>>
>> >>>>>> Well, in theory yes. Each task has a thread, but only a certain
>> >>>>>> number are run in parallel (the job of the scheduler). Parallelism
>> >>>>>> is set in the environment. However, whereas the parallelism
>> >>>>>> parameter is set and read correctly, when it comes to actually
>> >>>>>> starting the threads, the number is fixed at 8. We ran a debugger to
>> >>>>>> get to the point where the thread was started. As Flavio mentioned,
>> >>>>>> the ExecutionContext has the parallelism set to 8. We have a pool of
>> >>>>>> connections to an RDBMS and it logs the creation of just 8
>> >>>>>> connections although parallelism is much higher.
>> >>>>>>
>> >>>>>> My question is whether this is a bug (or a feature) of the
>> >>>>>> LocalMiniCluster. :-) I am not a Scala expert, but I see some
>> >>>>>> variable assignments in the setup of the MiniCluster involving
>> >>>>>> parallelism and 'default values'. Default values in terms of
>> >>>>>> parallelism are based on the number of cores.
>> >>>>>>
>> >>>>>> thanks a lot for the support!
>> >>>>>>
>> >>>>>> saluti,
>> >>>>>> Stefano
>> >>>>>>
>> >>>>>> 2016-03-29 11:51 GMT+02:00 Ufuk Celebi <uce@apache.org>:
>> >>>>>>>
>> >>>>>>> Hey Stefano,
>> >>>>>>>
>> >>>>>>> this should work by setting the parallelism on the environment, e.g.
>> >>>>>>>
>> >>>>>>> env.setParallelism(32)
>> >>>>>>>
>> >>>>>>> Is this what you are doing?
>> >>>>>>>
>> >>>>>>> The task threads are not part of a pool, but each submitted task
>> >>>>>>> creates its own Thread.
>> >>>>>>>
>> >>>>>>> – Ufuk
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> On Fri, Mar 25, 2016 at 9:10 PM, Flavio Pompermaier
>> >>>>>>> <pompermaier@okkam.it> wrote:
>> >>>>>>> > Any help here? I think that the problem is that the JobManager
>> >>>>>>> > creates the executionContext of the scheduler with
>> >>>>>>> >
>> >>>>>>> >        val executionContext = ExecutionContext.fromExecutor(new ForkJoinPool())
>> >>>>>>> >
>> >>>>>>> > and thus the number of concurrently running threads is limited to
>> >>>>>>> > the number of cores (using the default constructor of the ForkJoinPool).
>> >>>>>>> > What do you think?
>> >>>>>>> >
>> >>>>>>> >
>> >>>>>>> > On Wed, Mar 23, 2016 at 6:55 PM, Stefano Bortoli
>> >>>>>>> > <s.bortoli@gmail.com> wrote:
>> >>>>>>> >>
>> >>>>>>> >> Hi guys,
>> >>>>>>> >>
>> >>>>>>> >> I am trying to test a job that should run a number of tasks to
>> >>>>>>> >> read from an RDBMS using an improved JDBC connector. The
>> >>>>>>> >> connection and the reading run smoothly, but I cannot seem to
>> >>>>>>> >> get above the limit of 8 concurrently running threads. 8 is of
>> >>>>>>> >> course the number of cores of my machine.
>> >>>>>>> >>
>> >>>>>>> >> I have tried working around configurations and settings, but the
>> >>>>>>> >> Executor within the ExecutionContext keeps having a parallelism
>> >>>>>>> >> of 8, although, of course, the parallelism of the execution
>> >>>>>>> >> environment is much higher (in fact I have many more tasks to be
>> >>>>>>> >> allocated).
>> >>>>>>> >>
>> >>>>>>> >> I feel it may be an issue with the LocalMiniCluster configuration
>> >>>>>>> >> that may just override/neglect my wish for a higher degree of
>> >>>>>>> >> parallelism. Is there a way for me to work around this issue?
>> >>>>>>> >>
>> >>>>>>> >> Please let me know. Thanks a lot for your help! :-)
>> >>>>>>> >>
>> >>>>>>> >> saluti,
>> >>>>>>> >> Stefano
>> >>>>>>> >
>> >>>>>>> >
>> >>>>>>> >
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>
>

--047d7b5d295cb70911052fe2ac05--