harmony-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Evgueni Brevnov (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HARMONY-4215) [drlvm][reliability] assert in JIT while unwinding stack (during enumeration)
Date Mon, 03 Sep 2007 07:53:19 GMT

     [ https://issues.apache.org/jira/browse/HARMONY-4215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Evgueni Brevnov updated HARMONY-4215:

    Attachment: H4215_tm_races.patch

I want to share with you my findings during HARMONY-4215 investigation. In total I found 8
issues in TM component. By fixing/workarounding these issues I managed to pass HARMONY-4215.
I'm not sure if all of them are unknown. 

1)       Problem state: New threads may appear and switch to suspend disabled state (so performing
dangerous operations) during GC cycle. That may happen because hythread_suspend_all() releases
the global lock which guards attachment of new threads. 

Possible solution: Keep global lock until hythread_resume_all() is called.


2)       Problem state: Race condition in vm_attach(). Last action of jthread_allocate_vm_thread
is to associate native with vm_thread. From that moment this thread becomes available for
GC enumeration. But by that time it's not yet fully initialized. For example M2N frame is
not allocated and initialized. Possible solution: Native and VM thread association should
be last action with atomic effect.


3)       Problem state: Race condition in hythread_suspend_disable(). This one was mentioned
by Weldon on the list. But only one side of the problem was discussed. To guarantee correct
work of suspension machinery two conditions should be met. The first one is that suspender
thread MUST observe exactly the same value of disable_count as it is assigned before suspendee
thread reads suspend_request flag. The second one is that suspendee thread MUST increment
disable_count before it reads suspend_request flag.

Possible solution: Add memory read/write barrier between write operation to disable_count
and read operation from suspend_request.


4)       Problem state: Potentially illegal transition from suspend disabled to suspend enabled
mode in hythread_global_lock. When some thread is trying to acquire TM's global lock in disabled
state it will be quietly switched in enabled state if GC wants to happen. But such transition
is illegal if thread already performed "unsafe" operations and is not ready for GC to happen.
This issue affects overall stability in a negative way and can be hard to identify.

Possible solution: I think the best choice here to follow the rule that TM's global lock should
be grabbed before switching to disabled state.


5)       Problem state: Race condition in hythread_global_lock. At the very end of this function
disable_count is assigned to its original value. Another thread may send suspend request to
the current and check disable_count (which is zero) just after current thread have read suspend_request
(read value is zero) flag and changed disable_count. Thus suspender thread thinks that it
successfully suspended target thread and target thread in the same time enters unsafe region.

Possible solution: Use hythread_set_suspend_disable to change the value of disable_count which
takes care about described situation. Unfortunately, this solution is not perfect because
it opens small window when deadlock can happened. This is exactly the same type of deadlock
which current procedure is trying to avoid. So the best solution will be the same as described
in item 4.


6)       Problem state: Race condition in thread_safe_point_impl(). The problem is exactly
the same as described in item 5. The very last assignment to disable_count can be noted by
suspender thread to late and the thread will enter unsafe region while suspender thinks it's
ok to procced.

Possible solution: Use hythread_set_suspend_disable to change the value of disable_count.


7)       Problem state: Race condition in hythread_set_suspend_disable. See item 2 for description
and solution.

8)       Problem state: Minor concern regarding thread attach/detach procedure. It seems to
be reasonable from stability POV to hold TM's global lock during the whole attach/detach period.
That can help to avoid nasty race conditions like mentioned in item 2. So in the attached
patch the problem 2. is fixed/workarounded this way.


I didn't test attached patch thoroughly. I've only checked that it fixes HARMONY-4215. I also
didn't check performance impact. So don't take it as final rather just as a demonstration
of problems.

> [drlvm][reliability] assert in JIT while unwinding stack (during enumeration)
> -----------------------------------------------------------------------------
>                 Key: HARMONY-4215
>                 URL: https://issues.apache.org/jira/browse/HARMONY-4215
>             Project: Harmony
>          Issue Type: Bug
>          Components: DRLVM
>         Environment: Win 2003
>            Reporter: Aleksey Ignatenko
>         Attachments: 2rel_tests_fail.dmp, H4215_tm_races.patch, test.zip, util.Date_int.out.err
> 2 reliablity tests failed on assertion in jit:
> 1. CharsetSyncCacheTest
> 2. ZlibTest
> svn rev 548320
> Call stack:
>  	jitrino.dll!_assert(const char * expr=0x01ba9f08, const char * filename=0x01ba9eb0,
unsigned int lineno=0x000000ac)  Line 295	C
> >	jitrino.dll!Jitrino::Ia32::StackInfo::read(Jitrino::MethodDesc * pMethodDesc=0x03bef720,
unsigned int eip=0x00000000, bool isFirst=false)  Line 172 + 0x1a	C++
>  	jitrino.dll!Jitrino::Ia32::RuntimeInterface::unwindStack(Jitrino::MethodDesc * methodDesc=0x03bef720,
JitFrameContext * context=0x02de90d4, bool isFirst=false)  Line 40	C++
>  	jitrino.dll!Jitrino::Jitrino::UnwindStack(Jitrino::MethodDesc * methodDesc=0x03bef720,
JitFrameContext * context=0x02de90d4, bool isFirst=false)  Line 280 + 0x1e	C++
>  	jitrino.dll!JIT_unwind_stack_frame(void * jit=0x012a6a48, Method * method=0x02cb3668,
JitFrameContext * context=0x02de90d4)  Line 362 + 0x18	C++
>  	harmonyvm.dll!Dll_JIT::unwind_stack_frame(Method * method=0x02cb3668, JitFrameContext
* context=0x02de90d4)  Line 94 + 0x14	C++
>  	harmonyvm.dll!si_goto_previous(StackIterator * si=0x02de90d0, bool over_popped=false)
 Line 315 + 0x32	C++
>  	harmonyvm.dll!vm_enumerate_root_set_single_thread_on_stack(StackIterator * si=0x02de90d0)
 Line 339 + 0xb	C++
>  	harmonyvm.dll!vm_enumerate_thread(VM_thread * thread=0x02d7e088)  Line 224 + 0x9	C++
>  	harmonyvm.dll!stop_the_world_root_set_enumeration()  Line 110 + 0xc	C++
>  	harmonyvm.dll!vm_enumerate_root_set_all_threads()  Line 150	C++
>  	gc_gen.dll!gc_reclaim_heap(GC * gc=0x014d5de8, unsigned int gc_cause=0x00000003)  Line
295 + 0x8	C++
>  	gc_gen.dll!gc_force_gc()  Line 138 + 0xd	C++
>  	harmonyvm.dll!Java_java_lang_VMMemoryManager_runGC(JNIEnv_External * __formal=0x02d9e240,
JNIEnv_External * __formal=0x02d9e240)  Line 138 + 0x8	C++
>  	03bf33ef()	
>  	harmonyvm.dll!get_vm_thread(HyThread * thr=0x03bef9ec)  Line 193 + 0xb	C++
>  	harmonyvm.dll!get_vm_thread(HyThread * thr=0x00000001)  Line 194 + 0x7	C++
> To reproduce onee needs to run these tests with 30 min cycle (set     <property name="test.duration"
value="1800" />  parameter in build.xml of reliablity tests).

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.

View raw message