mesos-user mailing list archives

From Thomas Petr <tp...@hubspot.com>
Subject Re: Reconnected slaves not sending resource offers?
Date Mon, 25 Apr 2016 19:50:03 GMT
I0421 21:03:32.014533 17073 hierarchical.hpp:528] Added slave 20151116-203437-35000492-5050-17068-S70 (lively-rice) with mem(*):217609; cpus(*):210; ports(*):[2048-3048]; disk(*):639829 (allocated: )
I0421 21:03:32.014529 17072 master.cpp:3395] Registered slave 20151116-203437-35000492-5050-17068-S70 at slave(1)@172.16.3.103:5051 (lively-rice) with mem(*):217609; cpus(*):210; ports(*):[2048-3048]; disk(*):639829
I0421 21:03:32.014673 17076 coordinator.cpp:340] Coordinator attempting to write TRUNCATE action at position 4102
I0421 21:03:32.014945 17069 replica.cpp:511] Replica received write request for position 4102
I0421 21:03:32.014999 17071 master.cpp:4290] Sending 1 offers to framework sy3x4 (sy3x4) at scheduler-6bb2bcf0-d060-4072-a25b-917d8007fb1c@172.16.13.243:56861
I0421 21:03:32.015379 17069 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took 345429ns
I0421 21:03:32.015403 17069 replica.cpp:679] Persisted action at 4102
I0421 21:03:32.017308 17073 replica.cpp:658] Replica received learned notice for position 4102
I0421 21:03:32.017627 17073 leveldb.cpp:343] Persisting action (20 bytes) to leveldb took 292089ns
I0421 21:03:32.017665 17073 leveldb.cpp:401] Deleting ~2 keys from leveldb took 14004ns
I0421 21:03:32.017681 17073 replica.cpp:679] Persisted action at 4102
I0421 21:03:32.017693 17073 replica.cpp:664] Replica learned TRUNCATE action at position 4102
I0421 21:03:32.019726 17076 master.cpp:3687] Received update of slave 20151116-203437-35000492-5050-17068-S70 at slave(1)@172.16.3.103:5051 (lively-rice) with total oversubscribed resources
I0421 21:03:32.019800 17076 hierarchical.hpp:588] Slave 20151116-203437-35000492-5050-17068-S70 (lively-rice) updated with oversubscribed resources  (total: mem(*):217609; cpus(*):210; ports(*):[2048-3048]; disk(*):639829, allocated: mem(*):217609; cpus(*):210; ports(*):[2048-3048]; disk(*):639829)

(no other mentions of lively-rice or the slave ID for about 10 minutes, until we
bounced our scheduler at 21:13...)
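Spotting a silent stretch like this can be automated instead of done by eye. The following is a minimal sketch, not a Mesos tool: it assumes the raw master log with one unwrapped glog entry per line (not the email-wrapped copy quoted here), and it assumes the year 2016 because glog prefixes omit the year. `parse_ts` and `mention_gaps` are hypothetical helper names.

```python
import re
from datetime import datetime

# glog prefix: severity letter, MMDD, HH:MM:SS.micros, thread id, file:line]
GLOG_PREFIX = re.compile(r"^([IWEF])(\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\s+\d+\s+\S+\]")


def parse_ts(line, year=2016):
    """Return the entry's timestamp, or None if the line has no glog prefix.

    The year is an assumption: glog does not print it.
    """
    m = GLOG_PREFIX.match(line)
    if not m:
        return None
    mmdd, hms = m.group(2), m.group(3)
    return datetime.strptime(f"{year}{mmdd} {hms}", "%Y%m%d %H:%M:%S.%f")


def mention_gaps(lines, needles, min_gap_seconds=60.0):
    """Yield (previous_ts, next_ts, gap_seconds) for consecutive log entries
    mentioning any of `needles` that are at least `min_gap_seconds` apart."""
    prev = None
    for line in lines:
        if not any(n in line for n in needles):
            continue
        ts = parse_ts(line)
        if ts is None:
            continue  # continuation or non-glog line: skip it
        if prev is not None:
            gap = (ts - prev).total_seconds()
            if gap >= min_gap_seconds:
                yield prev, ts, gap
        prev = ts


if __name__ == "__main__":
    sample = [
        "I0421 21:03:32.019800 17076 hierarchical.hpp:588] Slave "
        "20151116-203437-35000492-5050-17068-S70 (lively-rice) updated with oversubscribed resources",
        "I0421 21:13:13.806171 17072 hierarchical.hpp:761] Recovered mem(*):217609 on slave "
        "20151116-203437-35000492-5050-17068-S70 from framework sy3x4",
    ]
    for start, end, gap in mention_gaps(sample, ["S70", "lively-rice"]):
        print(f"{start.time()} -> {end.time()}: {gap:.1f}s with no mentions")
```

On the two sample entries taken from the lines quoted here, it reports a single gap of roughly 582 seconds between 21:03:32 and 21:13:13.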

I0421 21:13:13.806171 17072 hierarchical.hpp:761] Recovered mem(*):217609; cpus(*):210; ports(*):[2048-3048]; disk(*):639829 (total: mem(*):217609; cpus(*):210; ports(*):[2048-3048]; disk(*):639829, allocated: ) on slave 20151116-203437-35000492-5050-17068-S70 from framework sy3x4
I0421 21:13:15.749594 17075 hierarchical.hpp:761] Recovered mem(*):217609; cpus(*):210; ports(*):[2048-3048]; disk(*):639829 (total: mem(*):217609; cpus(*):210; ports(*):[2048-3048]; disk(*):639829, allocated: ) on slave 20151116-203437-35000492-5050-17068-S70 from framework sy3x4
I0421 21:14:52.761143 17075 master.cpp:2505] Processing ACCEPT call for offers: [ 20151116-203437-35000492-5050-17068-O116800466 ] on slave 20151116-203437-35000492-5050-17068-S70 at slave(1)@172.16.3.103:5051 (lively-rice.iad02.hubspot-networks.net) for framework sy3x4 (sy3x4) at scheduler-7dda7817-66f1-4b8e-a5dd-9744aea52cba@172.16.40.17:53645

We were originally concerned about the log line at 21:03:32.019800 (where
it says that all of the slave's resources were allocated), but I think it's
saying that all of the resources on the slave are available as revocable
resources. Am I understanding that correctly?
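The 21:03:32.019800 line prints a "total" and an "allocated" resource set side by side, and the later "Recovered" lines print the same total with an empty "allocated". To compare them mechanically, here is a minimal sketch that assumes only the `name(role):value` text format visible in these log lines; it is not a Mesos API, and `parse_resources` and `unallocated_scalars` are hypothetical helper names. It subtracts allocated from total for scalar resources and keeps range resources such as ports as plain text.

```python
import re

# One "name(role):value" entry as printed in these logs, e.g.
# "cpus(*):210" or "ports(*):[2048-3048]".
ENTRY = re.compile(r"(\w+)\(([^)]*)\):(\[[^\]]*\]|[\d.]+)")


def parse_resources(text):
    """Map "name(role)" to a float (scalars) or the raw range string (e.g. ports)."""
    out = {}
    for name, role, value in ENTRY.findall(text):
        key = f"{name}({role})"
        out[key] = value if value.startswith("[") else float(value)
    return out


def unallocated_scalars(total_text, allocated_text):
    """Scalar resources in `total_text` minus those in `allocated_text`.

    Range resources are skipped; an empty `allocated_text` means nothing allocated.
    """
    total = parse_resources(total_text)
    allocated = parse_resources(allocated_text)
    return {
        key: value - allocated.get(key, 0.0)
        for key, value in total.items()
        if isinstance(value, float)
    }


if __name__ == "__main__":
    total = "mem(*):217609; cpus(*):210; ports(*):[2048-3048]; disk(*):639829"
    # 21:03:32.019800 line: allocated printed identical to total.
    print(unallocated_scalars(total, total))
    # 21:13:13 "Recovered" lines: allocated printed empty.
    print(unallocated_scalars(total, ""))
```

This only reproduces the arithmetic on the printed values: for the 21:03:32.019800 line every scalar difference is zero, and for the 21:13 lines the full totals remain. It does not by itself say which framework or mechanism those allocated resources were attributed to.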

Thanks,
Tom

On Mon, Apr 25, 2016 at 3:06 PM, Vinod Kone <vinodkone@apache.org> wrote:

>
> On Mon, Apr 25, 2016 at 8:40 AM, Thomas Petr <tpetr@hubspot.com> wrote:
>
>> The only thing that ended up fixing the situation was bouncing our
>> scheduler (~10 minutes after the restarted slaves joined the cluster) --
>> the act of failing over the framework appeared to "recover" the missing
>> resources:
>>
>
> What do the master logs say when the slave is registered with a new id?
>
