harmony-dev mailing list archives

From "Geir Magnusson Jr." <g...@pobox.com>
Subject Re: [classlib][luni] signalis interruptus in hysock
Date Thu, 26 Oct 2006 14:19:47 GMT
Happy to look at that other patch, but no one has convinced me that
handling the EINTR the way I already did the first time is a bad idea...

and I'm convinced that just swallowing it in the lowest level of the 
library is a bad idea.

geir
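
To make the two positions concrete, here is a minimal sketch in C of the
"report it upward" approach, as opposed to swallowing EINTR in the lowest
layer. All names here (wait_readable, read_byte, the WAIT_* codes) are
hypothetical and invented for illustration; they are not the real hysock
functions or constants. The low-level wrapper makes a single select() call
and reports an interruption as its own result code, and the caller decides
whether to retry.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not the real hysock constants. */
enum { WAIT_READY = 0, WAIT_TIMEOUT = -1, WAIT_INTERRUPTED = -2, WAIT_FAILED = -3 };

/* Lowest layer: exactly one select() call. EINTR is reported, not swallowed. */
int wait_readable(int fd, long timeout_ms)
{
    fd_set readfds;
    struct timeval tv;
    int rc;

    assert(timeout_ms >= 0);
    FD_ZERO(&readfds);
    FD_SET(fd, &readfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &readfds, NULL, NULL, &tv);
    if (rc > 0) return WAIT_READY;
    if (rc == 0) return WAIT_TIMEOUT;
    return (errno == EINTR) ? WAIT_INTERRUPTED : WAIT_FAILED;
}

/* Upper layer: owns the policy. Here it retries unless *cancelled is set. */
int read_byte(int fd, long timeout_ms, const volatile int *cancelled, char *out)
{
    for (;;) {
        int rc = wait_readable(fd, timeout_ms);
        if (rc == WAIT_INTERRUPTED) {
            if (*cancelled) return WAIT_INTERRUPTED;
            continue; /* retry; note the full timeout restarts in this sketch */
        }
        if (rc != WAIT_READY) return rc;
        /* EOF or a read error is reported as a plain failure here. */
        return (read(fd, out, 1) == 1) ? WAIT_READY : WAIT_FAILED;
    }
}
```

The point of the split is that only the upper layer knows whether an
interrupted wait should be retried, abandoned, or turned into an exception.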


Ivan Volosyuk wrote:
> On 10/25/06, Geir Magnusson Jr. <geir@pobox.com> wrote:
>>
>>
>> Fedotov, Alexei A wrote:
>> > Guys,
>> >
>> > Could you please help me to understand the following?
>> >
>> > 1. Is HARMONY-1904 actually a duplicate of my HARMONY-1879?
>>
>> scanning quickly, I don't think so.
>>
>> > 2. Ivan, do I remember correctly that you've already fixed that bug once
>> > when debugging Eclipse long run failures? Where is that patch?
>>
>> this bug arose when the new TM was added, which uses signals much more
>> aggressively.
>>
>> geir
> 
> Well, the bug has existed for quite a long time and it was reproducible
> before. The older TM also used signals for stopping threads for GC. The
> patch I created was not integrated back then, and it was almost the same
> as the currently suggested patch. The only difference was that it handled
> the timeout correctly (for other unixes).
> -- 
> Ivan
> 
>>
>> >
>> > Thank you in advance.
>> >
>> > With best regards,
>> > Alexei Fedotov,
>> > Intel Java & XML Engineering
>> >
>> >> -----Original Message-----
>> >> From: Weldon Washburn [mailto:weldonwjw@gmail.com]
>> >> Sent: Wednesday, October 25, 2006 5:36 PM
>> >> To: harmony-dev@incubator.apache.org; geir@pobox.com
>> >> Subject: Re: [classlib][luni] signalis interruptus in hysock
>> >>
>> >> On 10/24/06, Geir Magnusson Jr. <geir@pobox.com> wrote:
>> >>>
>> >>>
>> >>> Weldon Washburn wrote:
>> >>>> It seems JIRA is down for maintenance.  If HARMONY-1904 is still
>> >>>> open perhaps it makes sense to put a counter in the
>> >>>> while (...) { select...} loop. And after every N loops, print a
>> >>>> warning/diagnostic message.
>> >>> For whom and to what end?  Why not just return EINTR (in hysock
>> >>> speak)?
>> >>>> The value for N would have to be tuned.  I don't know what the best
>> >>>> number would be. Given that 1904 patch is not the final solution,
>> >>>> at least a diagnostic that hints at where the system hangs would be
>> >>>> useful.  It might make sense to even print a stack trace.   Also, I
>> >>>> agree with Ivan below.  Signals bugs are very hard to debug.  And
>> >>>> diagnostics can help us all understand the corner cases better.
>> >>> But so far, no one has shown that the system hangs, or can hang,
>> >>> simply because we return EINTR....
>> >>
>> >> Sorry for not being clear.  I was reacting to the patch in 1904
>> >> itself.  Not the bigger issue of fixing the upper layers to
>> >> comprehend EINTR.  My understanding is that this patch does not fix
>> >> the problem.  This patch does not return EINTR.  If for whatever
>> >> reason this patch is committed, I recommend adding the above
>> >> diagnostic code so that we don't dig ourselves an even deeper hole.
>> >>
>> >> If it is decided 1904 should not be committed, it might make sense to
>> >> close it with  "won't fix".
>> >>
>> >> geir
>> >>>> On 10/20/06, Ivan Volosyuk <ivan.volosyuk@gmail.com> wrote:
>> >>>>> On 10/20/06, Geir Magnusson Jr. <geir@pobox.com> wrote:
>> >>>>>>
>> >>>>>> Ivan Volosyuk wrote:
>> >>>>>>> Well, I think that the solution is what Geir suggests. One
>> >>>>>>> thing which bothers me is the following. EINTR can happen in
>> >>>>>>> different places and the situations can be quite rare in some
>> >>>>>>> circumstances. It can lead to hard to reproduce stability bugs
>> >>>>>>> (race conditions).
>> >>>>>> Can you give an example?
>> >>>>> Half a year ago, I was working on the problem. Socket operations
>> >>>>> sometimes get interrupted. We found out that it occurs some time
>> >>>>> after GC. It was not easy to find, as the application was quite
>> >>>>> big and the situation quite rare.
>> >>>>>
>> >>>>> Given the fact that the current implementation of the monitor
>> >>>>> reservation code can stop another thread in a quite random fashion,
>> >>>>> we should have rock solid support of EINTR handling everywhere the
>> >>>>> select() and poll() calls are used.
>> >>>>>
>> >>>>> --
>> >>>>> Ivan
>> >>>>> Intel Enterprise Solutions Software Division
>> >>>>>
>> >>>>>>> We should find a
>> >>>>>>> way how to test the implementation.
>> >>>>>> +1!
>> >>>>>>
>> >>>>>> :)
>> >>>>>>
>> >>>>>> geir
> 
