hadoop-user mailing list archives

From "Fabio C." <anyte...@gmail.com>
Subject SLS in deadlock with synthetic traces?
Date Sat, 20 Dec 2014 10:54:57 GMT
Hi everyone,
as part of a master's degree thesis I am running some synthetic traces in
SLS (the YARN Scheduler Load Simulator). I'm simulating a single queue and
a single node with 128 vcores and 128 GB of RAM.

When the workload contains 120 jobs, the simulation completes. I then need
to scale up, and the new workload contains 1200 jobs, submitted by 12 users
(as before).

As you can see in the attached pdf, at first everything goes fine:
resources are allocated and deallocated. Then, for some reason, the number
of allocated containers becomes equal to the number of running
applications, and from that point on nothing changes: the simulation runs
forever (after I took this screenshot, only jvm.free.memory kept
decreasing).

I need your help to understand what is happening, especially because these
same traces apparently ran to completion for another student (I'm even
using the same VM!). The only difference is that my laptop sucks, but it
seems strange that this could influence the simulator's internal behavior
(or can it, since SLS wraps the real RM?).
What I think the charts show is that SLS tries to honor the launch times
of the applications found in the input trace. Since the simulated
execution time is actually longer than the one in the traces, AMs keep
accumulating, up to the point where each of them holds only the container
it runs in, and no resource is available or released by anyone: deadlock.
Further evidence of this is in variable.running.application.csv, where at
some point the number of running apps equals the number of available
containers (128). But is this what really happens, and is this how SLS
really works?
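
To make the hypothesis concrete, here is a minimal Python sketch of a toy
model. It is not SLS code and does not claim to reproduce how SLS or the RM
actually schedule. The assumptions are mine: time advances in discrete
ticks, every job needs one AM container plus a fixed number of task
containers, an AM keeps its container until all of its tasks have finished,
and AM requests are served before task requests. The function name
`simulate` and its parameters are hypothetical.

```python
def simulate(capacity, arrivals, tasks_per_job, task_ticks, max_ticks=100_000):
    """Toy model of AM accumulation; returns (outcome, tick, running_apps)."""
    arrivals = sorted(arrivals)
    next_job = 0      # index of the next job that has not been submitted yet
    waiting = 0       # submitted jobs that have no AM container yet
    apps = []         # one [tasks_to_start, tasks_running] list per running AM
    task_ends = []    # (end_tick, app) for every running task
    free = capacity   # containers currently unallocated

    for tick in range(max_ticks):
        # 1. Release the containers of tasks that finish at this tick.
        still_running = []
        for end, app in task_ends:
            if end <= tick:
                app[1] -= 1
                free += 1
            else:
                still_running.append((end, app))
        task_ends = still_running

        # 2. An AM exits, freeing its container, once all its tasks are done.
        finished = [a for a in apps if a[0] == 0 and a[1] == 0]
        apps = [a for a in apps if not (a[0] == 0 and a[1] == 0)]
        free += len(finished)

        # 3. Jobs are submitted at the launch time given in the trace,
        #    regardless of how far behind the simulated cluster is.
        while next_job < len(arrivals) and arrivals[next_job] <= tick:
            waiting += 1
            next_job += 1

        # 4. Assumption: AM requests are served before task requests.
        while waiting and free:
            apps.append([tasks_per_job, 0])
            waiting -= 1
            free -= 1

        # 5. Whatever is left goes to tasks of the running applications.
        for app in apps:
            while app[0] and free:
                app[0] -= 1
                app[1] += 1
                free -= 1
                task_ends.append((tick + task_ticks, app))

        if not apps and not waiting and next_job == len(arrivals):
            return ("completed", tick, 0)
        # Every container is held by an AM that still needs task containers,
        # and no task is running, so nothing will ever be released.
        if free == 0 and not task_ends and apps and all(a[0] > 0 for a in apps):
            return ("deadlock", tick, len(apps))
    return ("timeout", max_ticks, len(apps))


# Jobs arriving faster than they can finish: AMs fill all 4 containers.
print(simulate(capacity=4, arrivals=range(12), tasks_per_job=2, task_ticks=10))
# Same jobs spaced far enough apart: the run completes.
print(simulate(capacity=4, arrivals=[0, 20, 40], tasks_per_job=2, task_ticks=10))
```

In this toy model the first call ends with ("deadlock", 10, 4), i.e. the
number of running applications equals the number of containers, which is
the pattern I see in the charts. The second call ends with
("completed", 50, 0). Whether the real simulator behaves this way is
exactly what I am asking.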

I am also attaching the synthetic trace and the (very partial) result I get
before killing the process. Maybe someone could try to run it and check in
the web interface (http://SLSaddress:10001) whether the behavior is the
same...

Thanks to anyone who can give me a hint.

