harmony-commits mailing list archives

From "Salikh Zakirov (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HARMONY-3002) [drlvm] Race condition in threading code leads to VM crash on SMP systems on HARMONY-2386
Date Wed, 07 Feb 2007 11:41:06 GMT

    [ https://issues.apache.org/jira/browse/HARMONY-3002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12470925 ]

Salikh Zakirov commented on HARMONY-3002:
-----------------------------------------

I've been trying to catch the root cause of this race condition by adding more asserts, and
finally got an assertion failure in the following state:

 Thread D:

	hythread_thin_monitor_exit() { 

		lockword = *lockword_ptr;	// 000D0800 = owned by D, reserved, recursion=1
		...
		assert(*lockword_ptr == lockword); // this was added by me and failed:
		                                   // *lockword_ptr = 000D0400
		                                   // = owned by D, unreserved, recursion=1
								
		RECURSION_DEC(lockword_ptr, lockword);

By the time hythread_thin_monitor_exit() wanted to decrease the recursion count by rewriting the lockword,
the lockword value in memory had changed from 000D0800 to 000D0400, thus violating the assumption
that the only thread modifying the lockword is the thread that installed its thread id into
the lockword.

The change 000D0800 -> 000D0400 corresponds to the unreservation procedure
(owned by D, reserved, recursion = 1 -> owned by D, unreserved, recursion = 1).
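
To make the hazard concrete, here is a minimal single-threaded replay of the interleaving, using the two lockword values quoted above. It is only a sketch: the constant RECURSION_UNIT_ASSUMED and all function names are hypothetical and are not taken from the DRLVM sources, and the compare-and-swap variant is shown only to illustrate how a stale snapshot can be detected, not as the fix for this issue.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: one recursion step is 0x0800 in the lockword. This constant is
 * hypothetical and chosen only to fit the values quoted in the comment. */
#define RECURSION_UNIT_ASSUMED 0x0800u

/* Unchecked update: stores a value derived from a possibly stale snapshot. */
static void recursion_dec_unchecked(_Atomic uint32_t *lockword_ptr,
                                    uint32_t snapshot) {
    atomic_store(lockword_ptr, snapshot - RECURSION_UNIT_ASSUMED);
}

/* Checked update: stores only if memory still equals the snapshot. */
static bool recursion_dec_checked(_Atomic uint32_t *lockword_ptr,
                                  uint32_t snapshot) {
    uint32_t expected = snapshot;
    return atomic_compare_exchange_strong(lockword_ptr, &expected,
                                          snapshot - RECURSION_UNIT_ASSUMED);
}

/* Replays: thread D reads the lockword, unreservation rewrites it, then
 * thread D stores its update without checking. Returns the final lockword. */
static uint32_t replay_unchecked(void) {
    _Atomic uint32_t lockword = 0x000D0800u;
    uint32_t snapshot = atomic_load(&lockword);   /* thread D reads          */
    atomic_store(&lockword, 0x000D0400u);         /* concurrent unreservation */
    recursion_dec_unchecked(&lockword, snapshot); /* stale write-back         */
    return atomic_load(&lockword);
}

/* Same replay with the checked update. Returns whether the update was
 * applied and reports the final lockword through final_value. */
static bool replay_checked(uint32_t *final_value) {
    _Atomic uint32_t lockword = 0x000D0800u;
    uint32_t snapshot = atomic_load(&lockword);
    atomic_store(&lockword, 0x000D0400u);
    bool applied = recursion_dec_checked(&lockword, snapshot);
    *final_value = atomic_load(&lockword);
    return applied;
}
```

In this model the unchecked replay ends with a lockword computed from the old value, so the change made by unreservation is silently lost, while the checked replay refuses the update and leaves 000D0400 in place. The added assert in hythread_thin_monitor_exit() catches the same condition.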

The unreservation procedure is supposed to suspend the owner of the lock being unreserved, using the safe
suspension model, which must guarantee that no unsafe code is running on the target thread during unreservation.
hythread_thin_monitor_exit() is an example of unsafe code, which must not run during unreservation.

Looking at unreserve_lock(), it does suspend the lock owner thread first:

	status=hythread_suspend_other(owner);

And obviously the lock owner thread wasn't really suspended in this case, because it was running
hythread_thin_monitor_exit() at the same time.

Looking at hythread_suspend_other(), it can return immediately without waiting for the thread
to be really suspended, if the suspension was already requested for that thread:

hythread_suspend_other():
        send_suspend_request(thread);
        while(wait_safe_region_event(thread)!=TM_ERROR_NONE) {
...
    return TM_ERROR_NONE;
}

and 

wait_safe_region_event():
static IDATA wait_safe_region_event(hythread_t thread) {
...
    if(thread->suspend_request > 1 || thread == tm_self_tls) {
...
        return TM_ERROR_NONE;
    }

The problem looks like an incorrect assumption in hythread_suspend_other():
 "if suspension was requested more than once, we do not need to wait, because the thread is already
suspended",
because in reality it is not guaranteed that a thread which already had suspend_request == 1
before our request has actually been suspended.

In this particular test, GC happens fairly often, and the first suspension request was probably
posted by a thread trying to start garbage collection.
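
The suspected sequence can be replayed with a small single-threaded model. This is a sketch under assumptions, not DRLVM code: the model_thread struct, its parked flag and all function names are hypothetical, and blocking is modeled by simply setting the flag. The second variant only illustrates deciding on the target's actual state instead of on the request counter; it is not a tested fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-thread suspension state. */
typedef struct {
    int  suspend_request;  /* outstanding suspend requests, as in the excerpt */
    bool parked;           /* ASSUMED flag: thread has reached a safe region  */
} model_thread;

/* Same test as the quoted early return, minus the self-thread case. */
static bool skips_wait(const model_thread *t) {
    return t->suspend_request > 1;
}

/* Model of the suspending side as quoted. Returns whether the target is
 * really parked at the moment the caller proceeds. */
static bool suspend_other_model(model_thread *t) {
    t->suspend_request++;      /* send_suspend_request()                  */
    if (skips_wait(t)) {
        return t->parked;      /* early return: nothing guarantees parking */
    }
    t->parked = true;          /* models blocking until the target parks   */
    return true;
}

/* Hypothetical variant that decides on the parked flag, not the counter. */
static bool suspend_other_model_waiting(model_thread *t) {
    t->suspend_request++;
    if (!t->parked) {
        t->parked = true;      /* models blocking until the target parks   */
    }
    return true;
}

/* Scenario: another thread (for example one starting GC) has posted the
 * first request but the target has not parked yet; then unreserve_lock()
 * asks for suspension. Returns whether the target is parked afterwards. */
static bool replay_gc_then_unreserve(bool use_waiting_variant) {
    model_thread d = { 0, false };
    d.suspend_request++;       /* first request, target still running */
    return use_waiting_variant ? suspend_other_model_waiting(&d)
                               : suspend_other_model(&d);
}
```

With the counter-based check, the second requester proceeds while the modeled thread D is still running, which matches the state seen in hythread_thin_monitor_exit(). With the flag-based check it would wait.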


> [drlvm] Race condition in threading code leads to VM crash on SMP systems on HARMONY-2386
> -----------------------------------------------------------------------------------------
>
>                 Key: HARMONY-3002
>                 URL: https://issues.apache.org/jira/browse/HARMONY-3002
>             Project: Harmony
>          Issue Type: Bug
>          Components: DRLVM
>         Environment: Linux ia32, windows ia32
>            Reporter: Gregory Shimansky
>
> I am not sure this is a class loader bug, but the crash happens on class loader code
> in class_initialize. When running the test in HARMONY-2386 many times in a loop it crashes
> after some time. The fat monitor which is used for synchronization in Class::initialize appears
> to be uninitialized or corrupted. It could be a thread manager bug or enumeration problem
> as well.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

