mesos-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Vinod Kone (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MESOS-2627) ExamplesTest.PersistentVolumeFramework is flaky
Date Fri, 17 Apr 2015 22:59:00 GMT

     [ https://issues.apache.org/jira/browse/MESOS-2627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Vinod Kone updated MESOS-2627:
------------------------------
    Summary: ExamplesTest.PersistentVolumeFramework is flaky  (was: ExamplesTest.PersistentVolumeFramework
is flaky on OS X)

Happened on CentOS too.

https://builds.apache.org/job/Mesos/COMPILER=clang,LABEL=docker%7C%7CHadoop,OS=ubuntu:14.10/154/consoleFull

{code}
[ RUN      ] ExamplesTest.PersistentVolumeFramework
Using temporary directory '/tmp/ExamplesTest_PersistentVolumeFramework_Sfw1QM'
I0417 22:48:04.794379 24154 logging.cpp:177] Logging to STDERR
I0417 22:48:04.806035 24154 process.cpp:912] libprocess is initialized on 172.17.0.62:55041
for 16 cpus
I0417 22:48:04.814960 24154 leveldb.cpp:176] Opened db in 3.882231ms
I0417 22:48:04.816679 24154 leveldb.cpp:183] Compacted db in 1.689813ms
I0417 22:48:04.816825 24154 leveldb.cpp:198] Created db iterator in 105015ns
I0417 22:48:04.816859 24154 leveldb.cpp:204] Seeked to beginning of db in 14744ns
I0417 22:48:04.816872 24154 leveldb.cpp:273] Iterated through 0 keys in the db in 8275ns
I0417 22:48:04.817123 24154 replica.cpp:744] Replica recovered with log positions 0 ->
0 with 1 holes and 0 unlearned
I0417 22:48:04.819927 24182 recover.cpp:449] Starting replica recovery
I0417 22:48:04.821547 24182 recover.cpp:475] Replica is in EMPTY status
I0417 22:48:04.824378 24191 master.cpp:361] Master 20150417-224804-1040191916-55041-24154
(7630717d513f) started on 172.17.0.62:55041
I0417 22:48:04.824446 24191 master.cpp:363] Flags at startup: --acls="permissive: true
register_frameworks {
  principals {
    type: ANY
  }
  roles {
    type: SOME
    values: "test"
  }
}
" --allocation_interval="1secs" --authenticate="false" --authenticate_slaves="false" --authenticators="crammd5"
--credentials="/tmp/ExamplesTest_PersistentVolumeFramework_Sfw1QM/credentials" --framework_sorter="drf"
--initialize_driver_logging="true" --log_auto_initialize="true" --logbufsecs="0" --logging_level="INFO"
--quiet="false" --recovery_slave_removal_limit="100%" --registry="replicated_log" --registry_fetch_timeout="1mins"
--registry_store_timeout="5secs" --registry_strict="false" --roles="test" --root_submissions="true"
--slave_reregister_timeout="10mins" --user_sorter="drf" --version="false" --webui_dir="/mesos/mesos-0.23.0/src/webui"
--work_dir="/tmp/mesos-tmgVmD" --zk_session_timeout="10secs"
I0417 22:48:04.825870 24191 master.cpp:410] Master allowing unauthenticated frameworks to
register
I0417 22:48:04.825896 24191 master.cpp:415] Master allowing unauthenticated slaves to register
I0417 22:48:04.825912 24191 credentials.hpp:37] Loading credentials for authentication from
'/tmp/ExamplesTest_PersistentVolumeFramework_Sfw1QM/credentials'
I0417 22:48:04.825928 24187 replica.cpp:641] Replica in EMPTY status received a broadcasted
recover request
W0417 22:48:04.826073 24191 credentials.hpp:52] Permissions on credentials file '/tmp/ExamplesTest_PersistentVolumeFramework_Sfw1QM/credentials'
are too open. It is recommended that your credentials file is NOT accessible by others.
I0417 22:48:04.826552 24191 master.cpp:452] Using default 'crammd5' authenticator
I0417 22:48:04.826997 24191 authenticator.cpp:511] Initializing server SASL
I0417 22:48:04.827725 24154 containerizer.cpp:113] Using isolation: posix/cpu,posix/mem
I0417 22:48:04.828730 24192 recover.cpp:195] Received a recover response from a replica in
EMPTY status
I0417 22:48:04.829848 24179 recover.cpp:566] Updating replica status to STARTING
I0417 22:48:04.830972 24189 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took
908555ns
I0417 22:48:04.831010 24191 auxprop.cpp:65] Initialized in-memory auxiliary property plugin
I0417 22:48:04.831030 24189 replica.cpp:323] Persisted replica status to STARTING
I0417 22:48:04.831171 24191 master.cpp:489] Authorization enabled
I0417 22:48:04.831217 24180 slave.cpp:176] Slave started on 1)@172.17.0.62:55041
I0417 22:48:04.831496 24186 recover.cpp:475] Replica is in STARTING status
I0417 22:48:04.831269 24180 slave.cpp:177] Flags at startup: --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false"
--cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
--cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="test"
--disk_watch_interval="1mins" --docker="docker" --docker_remove_delay="6hrs" --docker_sandbox_directory="/mnt/mesos/sandbox"
--docker_stop_timeout="0ns" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
--hadoop_home="" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/mesos/mesos-0.23.0/_build/src/.libs"
--logbufsecs="0" --logging_level="INFO" --perf_duration="10secs" --perf_interval="1mins" --quiet="false"
--recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --resource_monitoring_interval="1secs"
--strict="true" --switch_user="true" --version="false" --work_dir="/tmp/mesos-tmgVmD/0"
I0417 22:48:04.832581 24180 slave.cpp:316] Slave resources: cpus(test):16; mem(test):47270;
disk(test):3.70122e+06; ports(test):[31000-32000]
I0417 22:48:04.832710 24154 containerizer.cpp:113] Using isolation: posix/cpu,posix/mem
I0417 22:48:04.832805 24180 slave.cpp:346] Slave hostname: 7630717d513f
I0417 22:48:04.832826 24180 slave.cpp:351] Slave checkpoint: true
I0417 22:48:04.832828 24186 replica.cpp:641] Replica in STARTING status received a broadcasted
recover request
I0417 22:48:04.832908 24190 whitelist_watcher.cpp:79] No whitelist given
I0417 22:48:04.833047 24187 recover.cpp:195] Received a recover response from a replica in
STARTING status
I0417 22:48:04.833511 24181 recover.cpp:566] Updating replica status to VOTING
I0417 22:48:04.833739 24177 hierarchical.hpp:287] Initialized hierarchical allocator process
I0417 22:48:04.834203 24182 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took
550421ns
I0417 22:48:04.834228 24182 replica.cpp:323] Persisted replica status to VOTING
I0417 22:48:04.834250 24183 slave.cpp:176] Slave started on 2)@172.17.0.62:55041
I0417 22:48:04.834328 24181 recover.cpp:580] Successfully joined the Paxos group
I0417 22:48:04.834290 24183 slave.cpp:177] Flags at startup: --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false"
--cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
--cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="test"
--disk_watch_interval="1mins" --docker="docker" --docker_remove_delay="6hrs" --docker_sandbox_directory="/mnt/mesos/sandbox"
--docker_stop_timeout="0ns" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
--hadoop_home="" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/mesos/mesos-0.23.0/_build/src/.libs"
--logbufsecs="0" --logging_level="INFO" --perf_duration="10secs" --perf_interval="1mins" --quiet="false"
--recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --resource_monitoring_interval="1secs"
--strict="true" --switch_user="true" --version="false" --work_dir="/tmp/mesos-tmgVmD/1"
I0417 22:48:04.834744 24181 recover.cpp:464] Recover process terminated
I0417 22:48:04.835010 24183 slave.cpp:316] Slave resources: cpus(test):16; mem(test):47270;
disk(test):3.70122e+06; ports(test):[31000-32000]
I0417 22:48:04.835120 24183 slave.cpp:346] Slave hostname: 7630717d513f
I0417 22:48:04.835136 24183 slave.cpp:351] Slave checkpoint: true
I0417 22:48:04.836086 24154 containerizer.cpp:113] Using isolation: posix/cpu,posix/mem
I0417 22:48:04.837262 24178 slave.cpp:176] Slave started on 3)@172.17.0.62:55041
I0417 22:48:04.837357 24178 slave.cpp:177] Flags at startup: --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false"
--cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false"
--cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="test"
--disk_watch_interval="1mins" --docker="docker" --docker_remove_delay="6hrs" --docker_sandbox_directory="/mnt/mesos/sandbox"
--docker_stop_timeout="0ns" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins"
--executor_shutdown_grace_period="5secs" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1"
--hadoop_home="" --initialize_driver_logging="true" --isolation="posix/cpu,posix/mem" --launcher_dir="/mesos/mesos-0.23.0/_build/src/.libs"
--logbufsecs="0" --logging_level="INFO" --perf_duration="10secs" --perf_interval="1mins" --quiet="false"
--recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --resource_monitoring_interval="1secs"
--strict="true" --switch_user="true" --version="false" --work_dir="/tmp/mesos-tmgVmD/2"
I0417 22:48:04.838080 24178 slave.cpp:316] Slave resources: cpus(test):16; mem(test):47270;
disk(test):3.70122e+06; ports(test):[31000-32000]
I0417 22:48:04.838181 24178 slave.cpp:346] Slave hostname: 7630717d513f
I0417 22:48:04.838196 24178 slave.cpp:351] Slave checkpoint: true
*** Error in `/mesos/mesos-0.23.0/_build/src/.libs/lt-persistent-volume-framework': double
free or corruption (fasttop): 0x00007f90ec00d460 ***
*** Aborted at 1429310884 (unix time) try "date -d @1429310884" if you are using GNU date
***
I0417 22:48:04.839388 24185 state.cpp:35] Recovering state from '/tmp/mesos-tmgVmD/0/meta'
I0417 22:48:04.839416 24187 state.cpp:35] Recovering state from '/tmp/mesos-tmgVmD/2/meta'
I0417 22:48:04.840162 24154 sched.cpp:157] Version: 0.23.0
I0417 22:48:04.840227 24186 status_update_manager.cpp:197] Recovering status update manager
I0417 22:48:04.840237 24184 status_update_manager.cpp:197] Recovering status update manager
PC: @     0x7f9125dc6e37 (unknown)
I0417 22:48:04.840636 24177 containerizer.cpp:310] Recovering containerizer
I0417 22:48:04.840630 24184 containerizer.cpp:310] Recovering containerizer
I0417 22:48:04.840785 24188 master.cpp:1422] The newly elected leader is master@172.17.0.62:55041
with id 20150417-224804-1040191916-55041-24154
I0417 22:48:04.840895 24188 master.cpp:1435] Elected as the leading master!
I0417 22:48:04.840919 24188 master.cpp:1207] Recovering from registrar
I0417 22:48:04.841101 24185 registrar.cpp:313] Recovering registrar
I0417 22:48:04.842108 24178 sched.cpp:254] New master detected at master@172.17.0.62:55041
I0417 22:48:04.842408 24189 slave.cpp:3806] Finished recovery
I0417 22:48:04.842409 24190 slave.cpp:3806] Finished recovery
I0417 22:48:04.842586 24178 sched.cpp:264] No credentials provided. Attempting to register
without authentication
I0417 22:48:04.842603 24181 log.cpp:660] Attempting to start the writer
I0417 22:48:04.842613 24178 sched.cpp:521] Sending registration request to master@172.17.0.62:55041
I0417 22:48:04.842722 24178 sched.cpp:554] Will retry registration in 32.354951ms if necessary
I0417 22:48:04.842785 24177 master.cpp:1034] Dropping 'mesos.internal.RegisterFrameworkMessage'
message since not recovered yet
I0417 22:48:04.843227 24188 status_update_manager.cpp:171] Pausing sending status updates
I0417 22:48:04.843228 24189 slave.cpp:627] New master detected at master@172.17.0.62:55041
I0417 22:48:04.843364 24189 slave.cpp:652] No credentials provided. Attempting to register
without authentication
I0417 22:48:04.843446 24189 slave.cpp:663] Detecting new master
I0417 22:48:04.843808 24185 status_update_manager.cpp:171] Pausing sending status updates
I0417 22:48:04.843821 24190 slave.cpp:627] New master detected at master@172.17.0.62:55041
I0417 22:48:04.843879 24190 slave.cpp:652] No credentials provided. Attempting to register
without authentication
I0417 22:48:04.843911 24190 slave.cpp:663] Detecting new master
I0417 22:48:04.845938 24182 replica.cpp:477] Replica received implicit promise request with
proposal 1
I0417 22:48:04.846547 24182 leveldb.cpp:306] Persisting metadata (8 bytes) to leveldb took
591136ns
I0417 22:48:04.846586 24182 replica.cpp:345] Persisted promised to 1
I0417 22:48:04.847851 24177 coordinator.cpp:230] Coordinator attemping to fill missing position
I0417 22:48:04.849622 24185 replica.cpp:378] Replica received explicit promise request for
position 0 with proposal 2
I0417 22:48:04.850231 24185 leveldb.cpp:343] Persisting action (8 bytes) to leveldb took 568859ns
I0417 22:48:04.850258 24185 replica.cpp:679] Persisted action at 0
I0417 22:48:04.851675 24191 replica.cpp:511] Replica received write request for position 0
I0417 22:48:04.851797 24191 leveldb.cpp:438] Reading position from leveldb took 96390ns
I0417 22:48:04.852370 24191 leveldb.cpp:343] Persisting action (14 bytes) to leveldb took
518618ns
I0417 22:48:04.852393 24191 replica.cpp:679] Persisted action at 0
I0417 22:48:04.853102 24190 replica.cpp:658] Replica received learned notice for position
0
I0417 22:48:04.853556 24190 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took
442980ns
I0417 22:48:04.853580 24190 replica.cpp:679] Persisted action at 0
I0417 22:48:04.853600 24190 replica.cpp:664] Replica learned NOP action at position 0
I0417 22:48:04.854176 24182 log.cpp:676] Writer started with ending position 0
I0417 22:48:04.857132 24185 leveldb.cpp:438] Reading position from leveldb took 82598ns
*** SIGABRT (@0x3e800005e5a) received by PID 24154 (TID 0x7f911c141700) from PID 24154; stack
trace: ***
I0417 22:48:04.861847 24190 registrar.cpp:346] Successfully fetched the registry (0B) in 20.683264ms
I0417 22:48:04.862339 24190 registrar.cpp:445] Applied 1 operations in 190192ns; attempting
to update the 'registry'
    @     0x7f9126163c90 (unknown)
    @     0x7f9125dc6e37 (unknown)
    @     0x7f9125dc8528 (unknown)
I0417 22:48:04.866438 24188 log.cpp:684] Attempting to append 130 bytes to the log
I0417 22:48:04.866677 24184 coordinator.cpp:340] Coordinator attempting to write APPEND action
at position 1
    @     0x7f9125e07f74 (unknown)
I0417 22:48:04.867557 24182 replica.cpp:511] Replica received write request for position 1
    @     0x7f9125e0f586 (unknown)
I0417 22:48:04.868192 24182 leveldb.cpp:343] Persisting action (149 bytes) to leveldb took
604955ns
I0417 22:48:04.868221 24182 replica.cpp:679] Persisted action at 1
I0417 22:48:04.868923 24192 replica.cpp:658] Replica received learned notice for position
1
I0417 22:48:04.869328 24192 leveldb.cpp:343] Persisting action (151 bytes) to leveldb took
389604ns
I0417 22:48:04.869360 24192 replica.cpp:679] Persisted action at 1
I0417 22:48:04.869400 24192 replica.cpp:664] Replica learned APPEND action at position 1
I0417 22:48:04.871562 24188 registrar.cpp:490] Successfully updated the 'registry' in 9.007872ms
I0417 22:48:04.871757 24188 registrar.cpp:376] Successfully recovered registrar
I0417 22:48:04.872179 24178 master.cpp:1234] Recovered 0 slaves from the Registry (94B) ;
allowing 10mins for slaves to re-register
I0417 22:48:04.872267 24180 log.cpp:703] Attempting to truncate the log to 1
I0417 22:48:04.872423 24188 coordinator.cpp:340] Coordinator attempting to write TRUNCATE
action at position 2
I0417 22:48:04.873039 24185 replica.cpp:511] Replica received write request for position 2
I0417 22:48:04.873473 24185 leveldb.cpp:343] Persisting action (16 bytes) to leveldb took
416055ns
I0417 22:48:04.873505 24185 replica.cpp:679] Persisted action at 2
I0417 22:48:04.873989 24190 replica.cpp:658] Replica received learned notice for position
2
I0417 22:48:04.874287 24190 leveldb.cpp:343] Persisting action (18 bytes) to leveldb took
282809ns
I0417 22:48:04.874344 24190 leveldb.cpp:401] Deleting ~1 keys from leveldb took 41834ns
I0417 22:48:04.874366 24190 replica.cpp:679] Persisted action at 2
I0417 22:48:04.874398 24190 replica.cpp:664] Replica learned TRUNCATE action at position 2
    @     0x7f9128e10ffa _ZNSt14_Function_base13_Base_managerIZN7process5deferIN5mesos8internal5slave5SlaveEiiSt12_PlaceholderILi1EES7_ILi2EEEENS1_9_DeferredIDTclsr3stdE4bindadsr3std8functionIFvT0_T1_EEEclcvSt8functionISD_E_Efp1_fp2_EEEERKNS1_3PIDIT_EEMSJ_FvSB_SC_ET2_T3_EUliiE_E10_M_destroyERSt9_Any_dataSt17integral_constantIbLb0EE
I0417 22:48:04.876715 24187 sched.cpp:521] Sending registration request to master@172.17.0.62:55041
I0417 22:48:04.876821 24187 sched.cpp:554] Will retry registration in 846.417151ms if necessary
I0417 22:48:04.876999 24182 master.cpp:1640] Received registration request for framework 'Persistent
Volume Framework (C++)' at scheduler-7b7e1b5f-6c27-450c-b117-ab313a3cf8bd@172.17.0.62:55041
I0417 22:48:04.877054 24182 master.cpp:1501] Authorizing framework principal 'test' to receive
offers for role 'test'
I0417 22:48:04.878262 24184 master.cpp:1707] Registering framework 20150417-224804-1040191916-55041-24154-0000
(Persistent Volume Framework (C++)) at scheduler-7b7e1b5f-6c27-450c-b117-ab313a3cf8bd@172.17.0.62:55041
I0417 22:48:04.879447 24184 sched.cpp:448] Framework registered with 20150417-224804-1040191916-55041-24154-0000
I0417 22:48:04.879449 24188 hierarchical.hpp:321] Added framework 20150417-224804-1040191916-55041-24154-0000
I0417 22:48:04.879508 24184 persistent_volume_framework.cpp:129] Registered with master id:
"20150417-224804-1040191916-55041-24154"
ip: 1040191916
port: 55041
pid: "master@172.17.0.62:55041"
hostname: "7630717d513f"
 and got framework ID 20150417-224804-1040191916-55041-24154-0000
I0417 22:48:04.879643 24184 sched.cpp:462] Scheduler::registered took 148804ns
I0417 22:48:04.879751 24188 hierarchical.hpp:818] No resources available to allocate!
I0417 22:48:04.879829 24188 hierarchical.hpp:725] Performed allocation for 0 slaves in 321247ns
    @     0x7f9128e10ec6 _ZNSt14_Function_base13_Base_managerIZN7process5deferIN5mesos8internal5slave5SlaveEiiSt12_PlaceholderILi1EES7_ILi2EEEENS1_9_DeferredIDTclsr3stdE4bindadsr3std8functionIFvT0_T1_EEEclcvSt8functionISD_E_Efp1_fp2_EEEERKNS1_3PIDIT_EEMSJ_FvSB_SC_ET2_T3_EUliiE_E10_M_managerERSt9_Any_dataRKST_St18_Manager_operation
    @           0x465447 std::_Function_base::~_Function_base()
    @     0x7f9128dab1b5 std::function<>::~function()
    @     0x7f9128e0fec5 std::_Head_base<>::~_Head_base()
    @     0x7f9128e0fea5 _ZNSt11_Tuple_implILm0EJSt8functionIFviiEESt12_PlaceholderILi1EES3_ILi2EEEED2Ev
    @     0x7f9128e0fe85 _ZNSt5tupleIJSt8functionIFviiEESt12_PlaceholderILi1EES3_ILi2EEEED2Ev
    @     0x7f9128e0ecac std::_Bind<>::~_Bind()
    @     0x7f9128e10151 std::_Function_base::_Base_manager<>::_M_destroy()
    @     0x7f9128e10026 std::_Function_base::_Base_manager<>::_M_manager()
    @           0x465447 std::_Function_base::~_Function_base()
    @     0x7f9128dab1b5 std::function<>::~function()
    @     0x7f9128dae98e std::function<>::operator=()
    @     0x7f9128d77499 mesos::internal::slave::Slave::initialize()
    @     0x7f9128d864b2 mesos::internal::slave::Slave::initialize()
    @     0x7f912955de1f process::ProcessManager::resume()
    @     0x7f912955d9ee process::schedule()
    @     0x7f912615c0a5 start_thread
    @     0x7f9125e89cfd (unknown)
../../src/tests/script.cpp:83: Failure
Failed
persistent_volume_framework_test.sh terminated with signal Aborted
[  FAILED  ] ExamplesTest.PersistentVolumeFramework (1007 ms)

{code}

> ExamplesTest.PersistentVolumeFramework is flaky
> -----------------------------------------------
>
>                 Key: MESOS-2627
>                 URL: https://issues.apache.org/jira/browse/MESOS-2627
>             Project: Mesos
>          Issue Type: Bug
>         Environment: OS X Yosemite
>            Reporter: Cody Maloney
>              Labels: flaky, flaky-test
>
> This just failed for the first time on our OS X Bot (Far less frequent flaky than the
other ExamplesTest, but still flaky) while compiling master at commit f6620f851f635b3346c6ebf878152f38b3932ad9.
There weren't any commits which touched / changed anything in the test in the set.
> {code}
> [ RUN      ] ExamplesTest.PersistentVolumeFramework ../../src/tests/script.cpp:83: Failure
Failed persistent_volume_framework_test.sh terminated with signal Abort trap: 6 
> [  FAILED  ] ExamplesTest.PersistentVolumeFramework (7865 ms)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message