flink-user mailing list archives

From Saliya Ekanayake <esal...@gmail.com>
Subject Re: Modifying start-cluster scripts to efficiently spawn multiple TMs
Date Mon, 11 Jul 2016 04:02:05 GMT
pdsh is available on the head node only, but when I tried to run
start-cluster.sh from the head node (note that the JobManager node is not
the head node) it didn't work, which is why I modified the scripts.
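
To illustrate the modification (one SSH per node, with the slaves file in
the "IP N" format described in my earlier message quoted below), here is a
rough Python sketch rather than the actual script changes. The function
names, the /opt/flink/bin path, and the remote loop around
"taskmanager.sh start" are assumptions for illustration only. It only
builds the command lists and does not run them.

```python
from collections import OrderedDict


def parse_slaves(text):
    """Parse lines of the form "HOST N" into an ordered {host: tm_count} map.

    A line with only a host counts as one TaskManager, so a file that
    repeats a host once per TM gives the same totals.
    """
    counts = OrderedDict()
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split()
        host = parts[0]
        n = int(parts[1]) if len(parts) > 1 else 1
        counts[host] = counts.get(host, 0) + n
    return counts


def ssh_commands(counts, flink_bin="/opt/flink/bin"):
    """Build one ssh invocation per host that starts all of its TMs.

    flink_bin is a hypothetical install path; adjust it to the cluster.
    """
    cmds = []
    for host, n in counts.items():
        remote = "for i in $(seq 1 {n}); do {bin}/taskmanager.sh start; done".format(
            n=n, bin=flink_bin
        )
        cmds.append(["ssh", host, remote])
    return cmds
```

Each returned list could be passed to subprocess.run, so a node with 24 TMs
costs one SSH connection instead of 24.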

Yes, exactly, this is what I was trying to do. My research has focused on
these NUMA-related issues, and binding a process to a socket (CPU) and then
its threads to individual cores has shown great advantage. I actually have
Java code that binds processes and threads automatically (this is
user-configurable as well). For Flink, I've done this manually using a shell
script that scans the TMs on a node and pins them appropriately. This
approach is OK, but it would be better if the support were integrated into
Flink.
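
As a rough idea of what such pinning involves, here is a minimal Python
sketch, not the shell script or the Java code mentioned above. It assumes
cores are numbered contiguously per socket (check with lscpu or
"numactl --hardware", since numbering varies between machines), and the
function names are hypothetical. It only covers process-level pinning with
taskset, not per-thread binding or memory binding.

```python
def plan_bindings(num_tms, sockets, cores_per_socket):
    """Assign each TM a socket and a disjoint block of cores on that socket.

    Assumes core IDs are contiguous per socket: socket s owns cores
    s * cores_per_socket .. (s + 1) * cores_per_socket - 1.
    """
    if num_tms < 1 or sockets < 1:
        raise ValueError("need at least one TaskManager and one socket")
    per_socket = -(-num_tms // sockets)  # ceiling division
    if per_socket > cores_per_socket:
        raise ValueError("more TaskManagers per socket than cores")
    cores_per_tm = cores_per_socket // per_socket
    plan = []
    for i in range(num_tms):
        socket, slot = divmod(i, per_socket)
        first = socket * cores_per_socket + slot * cores_per_tm
        plan.append((socket, list(range(first, first + cores_per_tm))))
    return plan


def taskset_command(pid, cores):
    """Build a taskset call that pins all threads of a running process."""
    cpu_list = ",".join(str(c) for c in cores)
    # -a applies the affinity to every thread of the PID, -c takes a CPU list
    return ["taskset", "-a", "-c", "-p", cpu_list, str(pid)]
```

For example, plan_bindings(24, 2, 12) gives each of the 24 TMs one core,
with the first 12 on socket 0 and the rest on socket 1. The PIDs would come
from scanning the TM processes on the node.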

On Sun, Jul 10, 2016 at 8:33 PM, Greg Hogan <code@greghogan.com> wrote:

> Hi Saliya,
>
> Would you happen to have pdsh (parallel distributed shell) installed? If
> so the TaskManager startup in start-cluster.sh will run in parallel.
>
> As to running 24 TaskManagers together, are these running across multiple
> NUMA nodes? I had filed FLINK-3163 (
> https://issues.apache.org/jira/browse/FLINK-3163) last year as I have
> seen that even with only two NUMA nodes performance is improved by binding
> TaskManagers, both memory and CPU. I think we can improve configuration of
> task slots as we do with memory, where the latter can be a fixed measure or
> a fraction relative to total memory.
>
> Greg
>
> On Sat, Jul 9, 2016 at 3:44 AM, Saliya Ekanayake <esaliya@gmail.com>
> wrote:
>
>> Hi,
>>
>> The current start/stop scripts SSH worker nodes each time they appear in
>> the slaves file. When spawning multiple TMs (like 24 per node), this is
>> very inefficient.
>>
>> I've changed the scripts to do one SSH per node and spawn a given N
>> number of TMs afterwards. I can make a pull request if this seems usable to
>> others. For now, I assume slaves file will indicate the number of TMs per
>> slave in "IP N" format.
>>
>> Thank you,
>> Saliya
>>
>> --
>> Saliya Ekanayake
>> Ph.D. Candidate | Research Assistant
>> School of Informatics and Computing | Digital Science Center
>> Indiana University, Bloomington
>>
>>
>


-- 
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington