mesos-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Gastón Kleiman (JIRA) <j...@apache.org>
Subject [jira] [Updated] (MESOS-8422) Master's UpdateSlave handler not correctly updating operations
Date Wed, 10 Jan 2018 01:10:00 GMT

     [ https://issues.apache.org/jira/browse/MESOS-8422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Gastón Kleiman updated MESOS-8422:
----------------------------------
    Description: 
I created a test that verifies that operation status updates are resent to the master after
being dropped en route to it (MESOS-8420).

The test does the following:

# Creates a volume from a RAW disk resource.
# Drops the first `UpdateOperationStatusMessage` message from the agent to the master, so
that it isn't acknowledged by the master.
# Restarts the agent.
# Verifies that the agent resends the operation status update.

The good news are that the agent is resending the operation status update, the bad news are
that it triggers a CHECK failure that crashes the master.

Here are the relevant sections of the log produced by the test:

{noformat}
[ RUN      ] StorageLocalResourceProviderTest.ROOT_RetryOperationStatusUpdateAfterRecovery
[...]
I0109 16:36:08.515882 24106 master.cpp:4284] Processing ACCEPT call for offers: [ 046b3f21-6e97-4a56-9a13-773f7d481efd-O0
] on agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(2)@10.0.49.2:40681 (core-dev)
for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (default) at scheduler-2a48a684-64b4-4b4d-a396-6491adb4f2b1@10.0.49.2:40681
I0109 16:36:08.516487 24106 master.cpp:5260] Processing CREATE_VOLUME operation with source
disk(allocated: storage)(reservations: [(DYNAMIC,storage)])[RAW(,volume-default)]:4096 from
framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (default) at scheduler-2a48a684-64b4-4b4d-a396-6491adb4f2b1@10.0.49.2:40681
to agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(2)@10.0.49.2:40681 (core-dev)
I0109 16:36:08.518704 24106 master.cpp:10622] Sending operation '' (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408)
to agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(2)@10.0.49.2:40681 (core-dev)
I0109 16:36:08.521210 24130 provider.cpp:504] Received APPLY_OPERATION event
I0109 16:36:08.521276 24130 provider.cpp:1368] Received CREATE_VOLUME operation '' (uuid:
18b4c4a5-d162-4dcf-bb21-a13c6ee0f408)
I0109 16:36:08.523131 24432 test_csi_plugin.cpp:305] CreateVolumeRequest '{"version":{"minor":1},"name":"18b4c4a5-d162-4dcf-bb21-a13c6ee0f408","capacityRange":{"requiredBytes":"4294967296","limitBytes":"4294967296"},"volumeCapabilities":[{"mount":{},"accessMode":{"mode":"SINGLE_NODE_WRITER"}}]}'
I0109 16:36:08.525806 24152 provider.cpp:2635] Applying conversion from 'disk(allocated: storage)(reservations:
[(DYNAMIC,storage)])[RAW(,volume-default)]:4096' to 'disk(allocated: storage)(reservations:
[(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096'
for operation (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408)
I0109 16:36:08.528725 24134 status_update_manager_process.hpp:152] Received operation status
update OPERATION_FINISHED (Status UUID: 0c79cdf2-b89d-453b-bb62-57766e968dd0) for operation
UUID 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 of framework '046b3f21-6e97-4a56-9a13-773f7d481efd-0000'
on agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
I0109 16:36:08.529207 24134 status_update_manager_process.hpp:929] Checkpointing UPDATE for
operation status update OPERATION_FINISHED (Status UUID: 0c79cdf2-b89d-453b-bb62-57766e968dd0)
for operation UUID 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 of framework '046b3f21-6e97-4a56-9a13-773f7d481efd-0000'
on agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
I0109 16:36:08.573177 24150 http.cpp:1185] HTTP POST for /slave(2)/api/v1/resource_provider
from 10.0.49.2:53598
I0109 16:36:08.573974 24139 slave.cpp:7065] Handling resource provider message 'UPDATE_OPERATION_STATUS:
(uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000
(latest state: OPERATION_FINISHED, status update state: OPERATION_FINISHED)'
I0109 16:36:08.574154 24139 slave.cpp:7409] Updating the state of operation ' with no ID (uuid:
18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000
(latest state: OPERATION_FINISHED, status update state: OPERATION_FINISHED)
I0109 16:36:08.574785 24139 slave.cpp:7249] Forwarding status update of operation with no
ID (operation_uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000
I0109 16:36:08.583748 24084 slave.cpp:931] Agent terminating
I0109 16:36:08.584115 24144 master.cpp:1305] Agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
at slave(2)@10.0.49.2:40681 (core-dev) disconnected
[...]
I0109 16:36:08.655766 24140 slave.cpp:1378] Re-registered with master master@10.0.49.2:40681
I0109 16:36:08.655936 24117 task_status_update_manager.cpp:188] Resuming sending task status
updates
I0109 16:36:08.655995 24149 hierarchical.cpp:669] Agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
(core-dev) updated with total resources cpus:2; mem:1024; disk:1024; ports:[31000-32000]
I0109 16:36:08.656008 24140 slave.cpp:1423] Forwarding agent update {"operations":{},"resource_version_uuid":{"value":"icuAKyO6TymMt2Y9vyF6Jg=="},"slave_id":{"value":"046b3f21-6e97-4a56-9a13-773f7d481efd-S0"},"update_oversubscribed_resources":true}
I0109 16:36:08.656121 24149 hierarchical.cpp:754] Agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
reactivated
W0109 16:36:08.656481 24113 master.cpp:7277] !!!! update slave message: slave_id {
  value: "046b3f21-6e97-4a56-9a13-773f7d481efd-S0"
}
update_oversubscribed_resources: true
operations {
}
resource_version_uuid {
  value: "\211\313\200+#\272O)\214\267f=\277!z&"
}
I0109 16:36:08.656637 24113 master.cpp:7320] Received update of agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
at slave(3)@10.0.49.2:40681 (core-dev) with total oversubscribed resources {}
W0109 16:36:08.657387 24113 master.cpp:7704] Performing explicit reconciliation with agent
for known operation 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 since it was not present in original
reconciliation message from agent
I0109 16:36:08.657917 24133 hierarchical.cpp:669] Agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
(core-dev) updated with total resources cpus:2; mem:1024; disk:1024; ports:[31000-32000]
W0109 16:36:08.658048 24125 manager.cpp:472] Dropping operation reconciliation message with
operation_uuid 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 because resource provider 605b22f5-e39d-4d9f-950a-e7f44d202c01
is not subscribed
I0109 16:36:08.658609 24143 container_daemon.cpp:119] Launching container 'org-apache-mesos-rp-local-storage-test--org-apache-mesos-csi-test-slrp_test--CONTROLLER_SERVICE-NODE_SERVICE'
[...]
I0109 16:36:08.689859 24130 provider.cpp:3066] Sending UPDATE_STATE call with resources 'disk(reservations:
[(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096'
and 1 operations to agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
I0109 16:36:08.690449 24130 provider.cpp:1042] Resource provider 605b22f5-e39d-4d9f-950a-e7f44d202c01
is in READY state
I0109 16:36:08.690491 24105 status_update_manager_process.hpp:385] Resuming operation status
update manager
I0109 16:36:08.690640 24105 status_update_manager_process.hpp:394] Sending operation status
update OPERATION_FINISHED (Status UUID: 0c79cdf2-b89d-453b-bb62-57766e968dd0) for operation
UUID 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 of framework '046b3f21-6e97-4a56-9a13-773f7d481efd-0000'
on agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
I0109 16:36:08.693244 24131 http.cpp:1185] HTTP POST for /slave(3)/api/v1/resource_provider
from 10.0.49.2:53606
I0109 16:36:08.693912 24140 http.cpp:1185] HTTP POST for /slave(3)/api/v1/resource_provider
from 10.0.49.2:53606
I0109 16:36:08.693974 24115 manager.cpp:677] Received UPDATE_STATE call with resources '[{"disk":{"source":{"id":"18b4c4a5-d162-4dcf-bb21-a13c6ee0f408","metadata":{"labels":[{"key":"path","value":"\/tmp\/n5thZ3\/test\/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"}]},"mount":{"root":".\/csi\/org.apache.mesos.csi.test\/slrp_test\/mounts\/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"},"profile":"volume-default","type":"MOUNT"}},"name":"disk","provider_id":{"value":"605b22f5-e39d-4d9f-950a-e7f44d202c01"},"reservations":[{"role":"storage","type":"DYNAMIC"}],"scalar":{"value":4096.0},"type":"SCALAR"}]'
and 1 operations from resource provider 605b22f5-e39d-4d9f-950a-e7f44d202c01
I0109 16:36:08.694897 24144 slave.cpp:7065] Handling resource provider message 'UPDATE_STATE:
605b22f5-e39d-4d9f-950a-e7f44d202c01 disk(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096'
I0109 16:36:08.695184 24144 slave.cpp:7182] Forwarding new total resources cpus:2; mem:1024;
disk:1024; ports:[31000-32000]; disk(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096
I0109 16:36:08.696467 24144 slave.cpp:7065] Handling resource provider message 'UPDATE_OPERATION_STATUS:
(uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000
(latest state: OPERATION_FINISHED, status update state: OPERATION_FINISHED)'
I0109 16:36:08.696594 24144 slave.cpp:7409] Updating the state of operation ' with no ID (uuid:
18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000
(latest state: OPERATION_FINISHED, status update state: OPERATION_FINISHED)
I0109 16:36:08.696666 24144 slave.cpp:7249] Forwarding status update of operation with no
ID (operation_uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000
W0109 16:36:08.697093 24142 master.cpp:7277] !!!! update slave message: slave_id {
  value: "046b3f21-6e97-4a56-9a13-773f7d481efd-S0"
}
update_oversubscribed_resources: false
operations {
}
resource_version_uuid {
  value: "\211\313\200+#\272O)\214\267f=\277!z&"
}
resource_providers {
  providers {
    info {
      id {
        value: "605b22f5-e39d-4d9f-950a-e7f44d202c01"
      }
      type: "org.apache.mesos.rp.local.storage"
      name: "test"
      default_reservations {
        role: "storage"
        type: DYNAMIC
      }
      storage {
        plugin {
          type: "org.apache.mesos.csi.test"
          name: "slrp_test"
          containers {
            [...]
          }
        }
      }
    }
    total_resources {
      name: "disk"
      type: SCALAR
      scalar {
        value: 4096
      }
      disk {
        source {
          type: MOUNT
          mount {
            root: "./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
          }
          id: "18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
          metadata {
            labels {
              key: "path"
              value: "/tmp/n5thZ3/test/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
            }
          }
          profile: "volume-default"
        }
      }
      provider_id {
        value: "605b22f5-e39d-4d9f-950a-e7f44d202c01"
      }
      reservations {
        role: "storage"
        type: DYNAMIC
      }
    }
    operations {
      operations {
        framework_id {
          value: "046b3f21-6e97-4a56-9a13-773f7d481efd-0000"
        }
        slave_id {
          value: "046b3f21-6e97-4a56-9a13-773f7d481efd-S0"
        }
        info {
          type: CREATE_VOLUME
          create_volume {
            source {
              name: "disk"
              type: SCALAR
              scalar {
                value: 4096
              }
              disk {
                source {
                  type: RAW
                  profile: "volume-default"
                }
              }
              allocation_info {
                role: "storage"
              }
              provider_id {
                value: "605b22f5-e39d-4d9f-950a-e7f44d202c01"
              }
              reservations {
                role: "storage"
                type: DYNAMIC
              }
            }
            target_type: MOUNT
          }
        }
        latest_status {
          state: OPERATION_FINISHED
          converted_resources {
            name: "disk"
            type: SCALAR
            scalar {
              value: 4096
            }
            disk {
              source {
                type: MOUNT
                mount {
                  root: "./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
                }
                id: "18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
                metadata {
                  labels {
                    key: "path"
                    value: "/tmp/n5thZ3/test/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
                  }
                }
                profile: "volume-default"
              }
            }
            allocation_info {
              role: "storage"
            }
            provider_id {
              value: "605b22f5-e39d-4d9f-950a-e7f44d202c01"
            }
            reservations {
              role: "storage"
              type: DYNAMIC
            }
          }
          uuid {
            value: "\014y\315\362\270\235E;\273bWvn\226\215\320"
          }
        }
        statuses {
          state: OPERATION_FINISHED
          converted_resources {
            name: "disk"
            type: SCALAR
            scalar {
              value: 4096
            }
            disk {
              source {
                type: MOUNT
                mount {
                  root: "./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
                }
                id: "18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
                metadata {
                  labels {
                    key: "path"
                    value: "/tmp/n5thZ3/test/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
                  }
                }
                profile: "volume-default"
              }
            }
            allocation_info {
              role: "storage"
            }
            provider_id {
              value: "605b22f5-e39d-4d9f-950a-e7f44d202c01"
            }
            reservations {
              role: "storage"
              type: DYNAMIC
            }
          }
          uuid {
            value: "\014y\315\362\270\235E;\273bWvn\226\215\320"
          }
        }
        uuid {
          value: "\030\264\304\245\321bM\317\273!\241<n\340\364\010"
        }
      }
    }
    resource_version_uuid {
      value: "M\250\313j\320\301IG\262\0164e\004\367\304\333"
    }
  }
}
I0109 16:36:08.700137 24142 master.cpp:10411] Updating the state of operation '' (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408)
of framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (latest state: OPERATION_FINISHED,
status update state: OPERATION_FINISHED)
I0109 16:36:08.700417 24146 hierarchical.cpp:669] Agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
(core-dev) updated with total resources disk(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096;
cpus:2; mem:1024; disk:1024; ports:[31000-32000]
F0109 16:36:08.700610 24142 master.cpp:11687] CHECK_SOME(resources): disk(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096;
cpus:2; mem:1024; disk:1024; ports:[31000-32000] does not contain disk(reservations: [(DYNAMIC,storage)])[RAW(,volume-default)]:4096
*** Check failure stack trace: ***
F0109 16:36:08.700896 24146 hierarchical.cpp:908] CHECK_SOME(updatedTotal): disk(reservations:
[(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096;
cpus:2; mem:1024; disk:1024; ports:[31000-32000] does not contain disk(reservations: [(DYNAMIC,storage)])[RAW(,volume-default)]:4096
*** Check failure stack trace: ***
    @     0x7ff06d3bbe7e  (unknown)
    @     0x7ff06d3bbe7e  (unknown)
    @     0x7ff06d3bbddd  (unknown)
    @     0x7ff06d3bbddd  (unknown)
    @     0x7ff06d3bb7ee  (unknown)
    @     0x7ff06d3bb7ee  (unknown)
    @     0x7ff06d3be522  (unknown)
    @     0x55c1c6c2be77  _ZTSN6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureISt4listIN5mesos5slave13QoSCorrectionESaISF_EEEEEclINS0_IFSI_vEEEEESI_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseISH_EESt14default_deleteISU_EEOSM_S3_E_ISX_SM_St12_PlaceholderILi1EEEEEEE
    @     0x7ff06d3be522  (unknown)
    @     0x55c1c6c2be77  _ZTSN6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureISt4listIN5mesos5slave13QoSCorrectionESaISF_EEEEEclINS0_IFSI_vEEEEESI_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseISH_EESt14default_deleteISU_EEOSM_S3_E_ISX_SM_St12_PlaceholderILi1EEEEEEE
    @     0x7ff06b729277  (unknown)
    @     0x55c1c6f3be8a  _ZTSN6lambda12CallableOnceIFvRK6ResultIN5mesos2v117resource_provider5EventEEEE10CallableFnINS_8internal7PartialIZNK7process6FutureIS6_E7onReadyISt5_BindIFSt7_Mem_fnIMSG_FbS8_EESG_St12_PlaceholderILi1EEEEbEERKSG_OT_NSG_6PreferEEUlOSQ_S8_E_ISQ_SO_EEEEE
{noformat}

We can see that once the SLRP reregisters with the agent, the following happens:

# The agent will send an {{UpdateSlave}} message to the master including the converted resources
and the {{CREATE_VOLUME}} operation with the status {{OPERATION_FINISHED}}.
# The master will update the agent's resources, including the volume created by the operation.
# The agent will resend the operation status update.
# The master will try to apply the operation and crash, because it updated the agent's resources
on step #2.

> Master's UpdateSlave handler not correctly updating operations
> --------------------------------------------------------------
>
>                 Key: MESOS-8422
>                 URL: https://issues.apache.org/jira/browse/MESOS-8422
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Gastón Kleiman
>              Labels: mesosphere
>
> I created a test that verifies that operation status updates are resent to the master
after being dropped en route to it (MESOS-8420).
> The test does the following:
> # Creates a volume from a RAW disk resource.
> # Drops the first `UpdateOperationStatusMessage` message from the agent to the master,
so that it isn't acknowledged by the master.
> # Restarts the agent.
> # Verifies that the agent resends the operation status update.
> The good news are that the agent is resending the operation status update, the bad news
are that it triggers a CHECK failure that crashes the master.
> Here are the relevant sections of the log produced by the test:
> {noformat}
> [ RUN      ] StorageLocalResourceProviderTest.ROOT_RetryOperationStatusUpdateAfterRecovery
> [...]
> I0109 16:36:08.515882 24106 master.cpp:4284] Processing ACCEPT call for offers: [ 046b3f21-6e97-4a56-9a13-773f7d481efd-O0
] on agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(2)@10.0.49.2:40681 (core-dev)
for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (default) at scheduler-2a48a684-64b4-4b4d-a396-6491adb4f2b1@10.0.49.2:40681
> I0109 16:36:08.516487 24106 master.cpp:5260] Processing CREATE_VOLUME operation with
source disk(allocated: storage)(reservations: [(DYNAMIC,storage)])[RAW(,volume-default)]:4096
from framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000 (default) at scheduler-2a48a684-64b4-4b4d-a396-6491adb4f2b1@10.0.49.2:40681
to agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(2)@10.0.49.2:40681 (core-dev)
> I0109 16:36:08.518704 24106 master.cpp:10622] Sending operation '' (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408)
to agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0 at slave(2)@10.0.49.2:40681 (core-dev)
> I0109 16:36:08.521210 24130 provider.cpp:504] Received APPLY_OPERATION event
> I0109 16:36:08.521276 24130 provider.cpp:1368] Received CREATE_VOLUME operation '' (uuid:
18b4c4a5-d162-4dcf-bb21-a13c6ee0f408)
> I0109 16:36:08.523131 24432 test_csi_plugin.cpp:305] CreateVolumeRequest '{"version":{"minor":1},"name":"18b4c4a5-d162-4dcf-bb21-a13c6ee0f408","capacityRange":{"requiredBytes":"4294967296","limitBytes":"4294967296"},"volumeCapabilities":[{"mount":{},"accessMode":{"mode":"SINGLE_NODE_WRITER"}}]}'
> I0109 16:36:08.525806 24152 provider.cpp:2635] Applying conversion from 'disk(allocated:
storage)(reservations: [(DYNAMIC,storage)])[RAW(,volume-default)]:4096' to 'disk(allocated:
storage)(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096'
for operation (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408)
> I0109 16:36:08.528725 24134 status_update_manager_process.hpp:152] Received operation
status update OPERATION_FINISHED (Status UUID: 0c79cdf2-b89d-453b-bb62-57766e968dd0) for operation
UUID 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 of framework '046b3f21-6e97-4a56-9a13-773f7d481efd-0000'
on agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
> I0109 16:36:08.529207 24134 status_update_manager_process.hpp:929] Checkpointing UPDATE
for operation status update OPERATION_FINISHED (Status UUID: 0c79cdf2-b89d-453b-bb62-57766e968dd0)
for operation UUID 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 of framework '046b3f21-6e97-4a56-9a13-773f7d481efd-0000'
on agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
> I0109 16:36:08.573177 24150 http.cpp:1185] HTTP POST for /slave(2)/api/v1/resource_provider
from 10.0.49.2:53598
> I0109 16:36:08.573974 24139 slave.cpp:7065] Handling resource provider message 'UPDATE_OPERATION_STATUS:
(uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000
(latest state: OPERATION_FINISHED, status update state: OPERATION_FINISHED)'
> I0109 16:36:08.574154 24139 slave.cpp:7409] Updating the state of operation ' with no
ID (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000
(latest state: OPERATION_FINISHED, status update state: OPERATION_FINISHED)
> I0109 16:36:08.574785 24139 slave.cpp:7249] Forwarding status update of operation with
no ID (operation_uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000
> I0109 16:36:08.583748 24084 slave.cpp:931] Agent terminating
> I0109 16:36:08.584115 24144 master.cpp:1305] Agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
at slave(2)@10.0.49.2:40681 (core-dev) disconnected
> [...]
> I0109 16:36:08.655766 24140 slave.cpp:1378] Re-registered with master master@10.0.49.2:40681
> I0109 16:36:08.655936 24117 task_status_update_manager.cpp:188] Resuming sending task
status updates
> I0109 16:36:08.655995 24149 hierarchical.cpp:669] Agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
(core-dev) updated with total resources cpus:2; mem:1024; disk:1024; ports:[31000-32000]
> I0109 16:36:08.656008 24140 slave.cpp:1423] Forwarding agent update {"operations":{},"resource_version_uuid":{"value":"icuAKyO6TymMt2Y9vyF6Jg=="},"slave_id":{"value":"046b3f21-6e97-4a56-9a13-773f7d481efd-S0"},"update_oversubscribed_resources":true}
> I0109 16:36:08.656121 24149 hierarchical.cpp:754] Agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
reactivated
> W0109 16:36:08.656481 24113 master.cpp:7277] !!!! update slave message: slave_id {
>   value: "046b3f21-6e97-4a56-9a13-773f7d481efd-S0"
> }
> update_oversubscribed_resources: true
> operations {
> }
> resource_version_uuid {
>   value: "\211\313\200+#\272O)\214\267f=\277!z&"
> }
> I0109 16:36:08.656637 24113 master.cpp:7320] Received update of agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
at slave(3)@10.0.49.2:40681 (core-dev) with total oversubscribed resources {}
> W0109 16:36:08.657387 24113 master.cpp:7704] Performing explicit reconciliation with
agent for known operation 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 since it was not present in
original reconciliation message from agent
> I0109 16:36:08.657917 24133 hierarchical.cpp:669] Agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
(core-dev) updated with total resources cpus:2; mem:1024; disk:1024; ports:[31000-32000]
> W0109 16:36:08.658048 24125 manager.cpp:472] Dropping operation reconciliation message
with operation_uuid 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 because resource provider 605b22f5-e39d-4d9f-950a-e7f44d202c01
is not subscribed
> I0109 16:36:08.658609 24143 container_daemon.cpp:119] Launching container 'org-apache-mesos-rp-local-storage-test--org-apache-mesos-csi-test-slrp_test--CONTROLLER_SERVICE-NODE_SERVICE'
> [...]
> I0109 16:36:08.689859 24130 provider.cpp:3066] Sending UPDATE_STATE call with resources
'disk(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096'
and 1 operations to agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
> I0109 16:36:08.690449 24130 provider.cpp:1042] Resource provider 605b22f5-e39d-4d9f-950a-e7f44d202c01
is in READY state
> I0109 16:36:08.690491 24105 status_update_manager_process.hpp:385] Resuming operation
status update manager
> I0109 16:36:08.690640 24105 status_update_manager_process.hpp:394] Sending operation
status update OPERATION_FINISHED (Status UUID: 0c79cdf2-b89d-453b-bb62-57766e968dd0) for operation
UUID 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408 of framework '046b3f21-6e97-4a56-9a13-773f7d481efd-0000'
on agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
> I0109 16:36:08.693244 24131 http.cpp:1185] HTTP POST for /slave(3)/api/v1/resource_provider
from 10.0.49.2:53606
> I0109 16:36:08.693912 24140 http.cpp:1185] HTTP POST for /slave(3)/api/v1/resource_provider
from 10.0.49.2:53606
> I0109 16:36:08.693974 24115 manager.cpp:677] Received UPDATE_STATE call with resources
'[{"disk":{"source":{"id":"18b4c4a5-d162-4dcf-bb21-a13c6ee0f408","metadata":{"labels":[{"key":"path","value":"\/tmp\/n5thZ3\/test\/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"}]},"mount":{"root":".\/csi\/org.apache.mesos.csi.test\/slrp_test\/mounts\/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"},"profile":"volume-default","type":"MOUNT"}},"name":"disk","provider_id":{"value":"605b22f5-e39d-4d9f-950a-e7f44d202c01"},"reservations":[{"role":"storage","type":"DYNAMIC"}],"scalar":{"value":4096.0},"type":"SCALAR"}]'
and 1 operations from resource provider 605b22f5-e39d-4d9f-950a-e7f44d202c01
> I0109 16:36:08.694897 24144 slave.cpp:7065] Handling resource provider message 'UPDATE_STATE:
605b22f5-e39d-4d9f-950a-e7f44d202c01 disk(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096'
> I0109 16:36:08.695184 24144 slave.cpp:7182] Forwarding new total resources cpus:2; mem:1024;
disk:1024; ports:[31000-32000]; disk(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096
> I0109 16:36:08.696467 24144 slave.cpp:7065] Handling resource provider message 'UPDATE_OPERATION_STATUS:
(uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000
(latest state: OPERATION_FINISHED, status update state: OPERATION_FINISHED)'
> I0109 16:36:08.696594 24144 slave.cpp:7409] Updating the state of operation ' with no
ID (uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000
(latest state: OPERATION_FINISHED, status update state: OPERATION_FINISHED)
> I0109 16:36:08.696666 24144 slave.cpp:7249] Forwarding status update of operation with
no ID (operation_uuid: 18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) for framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000
> W0109 16:36:08.697093 24142 master.cpp:7277] !!!! update slave message: slave_id {
>   value: "046b3f21-6e97-4a56-9a13-773f7d481efd-S0"
> }
> update_oversubscribed_resources: false
> operations {
> }
> resource_version_uuid {
>   value: "\211\313\200+#\272O)\214\267f=\277!z&"
> }
> resource_providers {
>   providers {
>     info {
>       id {
>         value: "605b22f5-e39d-4d9f-950a-e7f44d202c01"
>       }
>       type: "org.apache.mesos.rp.local.storage"
>       name: "test"
>       default_reservations {
>         role: "storage"
>         type: DYNAMIC
>       }
>       storage {
>         plugin {
>           type: "org.apache.mesos.csi.test"
>           name: "slrp_test"
>           containers {
>             [...]
>           }
>         }
>       }
>     }
>     total_resources {
>       name: "disk"
>       type: SCALAR
>       scalar {
>         value: 4096
>       }
>       disk {
>         source {
>           type: MOUNT
>           mount {
>             root: "./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
>           }
>           id: "18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
>           metadata {
>             labels {
>               key: "path"
>               value: "/tmp/n5thZ3/test/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
>             }
>           }
>           profile: "volume-default"
>         }
>       }
>       provider_id {
>         value: "605b22f5-e39d-4d9f-950a-e7f44d202c01"
>       }
>       reservations {
>         role: "storage"
>         type: DYNAMIC
>       }
>     }
>     operations {
>       operations {
>         framework_id {
>           value: "046b3f21-6e97-4a56-9a13-773f7d481efd-0000"
>         }
>         slave_id {
>           value: "046b3f21-6e97-4a56-9a13-773f7d481efd-S0"
>         }
>         info {
>           type: CREATE_VOLUME
>           create_volume {
>             source {
>               name: "disk"
>               type: SCALAR
>               scalar {
>                 value: 4096
>               }
>               disk {
>                 source {
>                   type: RAW
>                   profile: "volume-default"
>                 }
>               }
>               allocation_info {
>                 role: "storage"
>               }
>               provider_id {
>                 value: "605b22f5-e39d-4d9f-950a-e7f44d202c01"
>               }
>               reservations {
>                 role: "storage"
>                 type: DYNAMIC
>               }
>             }
>             target_type: MOUNT
>           }
>         }
>         latest_status {
>           state: OPERATION_FINISHED
>           converted_resources {
>             name: "disk"
>             type: SCALAR
>             scalar {
>               value: 4096
>             }
>             disk {
>               source {
>                 type: MOUNT
>                 mount {
>                   root: "./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
>                 }
>                 id: "18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
>                 metadata {
>                   labels {
>                     key: "path"
>                     value: "/tmp/n5thZ3/test/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
>                   }
>                 }
>                 profile: "volume-default"
>               }
>             }
>             allocation_info {
>               role: "storage"
>             }
>             provider_id {
>               value: "605b22f5-e39d-4d9f-950a-e7f44d202c01"
>             }
>             reservations {
>               role: "storage"
>               type: DYNAMIC
>             }
>           }
>           uuid {
>             value: "\014y\315\362\270\235E;\273bWvn\226\215\320"
>           }
>         }
>         statuses {
>           state: OPERATION_FINISHED
>           converted_resources {
>             name: "disk"
>             type: SCALAR
>             scalar {
>               value: 4096
>             }
>             disk {
>               source {
>                 type: MOUNT
>                 mount {
>                   root: "./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
>                 }
>                 id: "18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
>                 metadata {
>                   labels {
>                     key: "path"
>                     value: "/tmp/n5thZ3/test/4GB-18b4c4a5-d162-4dcf-bb21-a13c6ee0f408"
>                   }
>                 }
>                 profile: "volume-default"
>               }
>             }
>             allocation_info {
>               role: "storage"
>             }
>             provider_id {
>               value: "605b22f5-e39d-4d9f-950a-e7f44d202c01"
>             }
>             reservations {
>               role: "storage"
>               type: DYNAMIC
>             }
>           }
>           uuid {
>             value: "\014y\315\362\270\235E;\273bWvn\226\215\320"
>           }
>         }
>         uuid {
>           value: "\030\264\304\245\321bM\317\273!\241<n\340\364\010"
>         }
>       }
>     }
>     resource_version_uuid {
>       value: "M\250\313j\320\301IG\262\0164e\004\367\304\333"
>     }
>   }
> }
> I0109 16:36:08.700137 24142 master.cpp:10411] Updating the state of operation '' (uuid:
18b4c4a5-d162-4dcf-bb21-a13c6ee0f408) of framework 046b3f21-6e97-4a56-9a13-773f7d481efd-0000
(latest state: OPERATION_FINISHED, status update state: OPERATION_FINISHED)
> I0109 16:36:08.700417 24146 hierarchical.cpp:669] Agent 046b3f21-6e97-4a56-9a13-773f7d481efd-S0
(core-dev) updated with total resources disk(reservations: [(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096;
cpus:2; mem:1024; disk:1024; ports:[31000-32000]
> F0109 16:36:08.700610 24142 master.cpp:11687] CHECK_SOME(resources): disk(reservations:
[(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096;
cpus:2; mem:1024; disk:1024; ports:[31000-32000] does not contain disk(reservations: [(DYNAMIC,storage)])[RAW(,volume-default)]:4096
> *** Check failure stack trace: ***
> F0109 16:36:08.700896 24146 hierarchical.cpp:908] CHECK_SOME(updatedTotal): disk(reservations:
[(DYNAMIC,storage)])[MOUNT(18b4c4a5-d162-4dcf-bb21-a13c6ee0f408,volume-default):./csi/org.apache.mesos.csi.test/slrp_test/mounts/18b4c4a5-d162-4dcf-bb21-a13c6ee0f408]:4096;
cpus:2; mem:1024; disk:1024; ports:[31000-32000] does not contain disk(reservations: [(DYNAMIC,storage)])[RAW(,volume-default)]:4096
> *** Check failure stack trace: ***
>     @     0x7ff06d3bbe7e  (unknown)
>     @     0x7ff06d3bbe7e  (unknown)
>     @     0x7ff06d3bbddd  (unknown)
>     @     0x7ff06d3bbddd  (unknown)
>     @     0x7ff06d3bb7ee  (unknown)
>     @     0x7ff06d3bb7ee  (unknown)
>     @     0x7ff06d3be522  (unknown)
>     @     0x55c1c6c2be77  _ZTSN6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureISt4listIN5mesos5slave13QoSCorrectionESaISF_EEEEEclINS0_IFSI_vEEEEESI_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseISH_EESt14default_deleteISU_EEOSM_S3_E_ISX_SM_St12_PlaceholderILi1EEEEEEE
>     @     0x7ff06d3be522  (unknown)
>     @     0x55c1c6c2be77  _ZTSN6lambda12CallableOnceIFvPN7process11ProcessBaseEEE10CallableFnINS_8internal7PartialIZNS1_8internal8DispatchINS1_6FutureISt4listIN5mesos5slave13QoSCorrectionESaISF_EEEEEclINS0_IFSI_vEEEEESI_RKNS1_4UPIDEOT_EUlSt10unique_ptrINS1_7PromiseISH_EESt14default_deleteISU_EEOSM_S3_E_ISX_SM_St12_PlaceholderILi1EEEEEEE
>     @     0x7ff06b729277  (unknown)
>     @     0x55c1c6f3be8a  _ZTSN6lambda12CallableOnceIFvRK6ResultIN5mesos2v117resource_provider5EventEEEE10CallableFnINS_8internal7PartialIZNK7process6FutureIS6_E7onReadyISt5_BindIFSt7_Mem_fnIMSG_FbS8_EESG_St12_PlaceholderILi1EEEEbEERKSG_OT_NSG_6PreferEEUlOSQ_S8_E_ISQ_SO_EEEEE
> {noformat}
> We can see that once the SLRP reregisters with the agent, the following happens:
> # The agent will send an {{UpdateSlave}} message to the master including the converted
resources and the {{CREATE_VOLUME}} operation with the status {{OPERATION_FINISHED}}.
> # The master will update the agent's resources, including the volume created by the operation.
> # The agent will resend the operation status update.
> # The master will try to apply the operation and crash, because it updated the agent's
resources on step #2.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


Mime
View raw message