asterixdb-dev mailing list archives

From Ildar Absalyamov <ildar.absalya...@gmail.com>
Subject Re: Tasks remaining for release
Date Wed, 08 Jul 2015 06:47:25 GMT
Shouldn’t those threads get reused in a fixed size pool?
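
For illustration, here is a minimal sketch of the fixed-size pool idea. The class and method names (FixedPoolSketch, distinctWorkerThreads) are hypothetical and not taken from the AsterixDB or Hyracks code base, and it assumes the work in question can be submitted to an executor at all, which may not hold for long-lived operator threads.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not AsterixDB code: shows that a fixed-size pool
// reuses its worker threads instead of creating one thread per task.
public class FixedPoolSketch {

    // Runs `tasks` short jobs on a pool capped at `poolSize` threads and
    // returns how many distinct worker threads actually executed them.
    public static int distinctWorkerThreads(int poolSize, int tasks)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < tasks; i++) {
            // Each task only records the name of the thread it ran on.
            pool.execute(() -> workerNames.add(Thread.currentThread().getName()));
        }
        // Stop accepting work, then wait for the queued tasks to drain.
        pool.shutdown();
        if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return workerNames.size();
    }

    public static void main(String[] args) throws InterruptedException {
        int used = distinctWorkerThreads(8, 500);
        System.out.println("500 tasks ran on " + used + " worker threads");
    }
}
```

With a pool like this the number of worker threads never exceeds the pool size, no matter how many tasks are submitted. If the thread count on the build server still climbs toward ~500, that would point to threads being created outside any pool, or to pools that are created and never shut down.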

> On Jul 7, 2015, at 23:40, Mike Carey <dtabass@gmail.com> wrote:
> 
> Weird....  That many threads seems wrong.....
> 
> On 7/7/15 8:35 PM, Ian Maxon wrote:
>> I think I have at least a workaround to the thread starvation nailed
>> down. We'll have to see, but basically I think the latest few patches
>> cause us to use more threads for whatever reason- and this pushed us
>> over the default thread cap in many circumstances (not always). Going
>> ahead and setting the number of processes to be unlimited within the
>> build server and containers seems to have put out the fire, so to
>> speak. Another confounding factor is the issue that docker containers
>> run within the same host and hence also have their own shared thread
>> limit, in addition to the host's thread limit. It's not clear to me
>> however whether we intend to use that many threads (~500), or if
>> there's a subtle resource leak somewhere.
>> 
>> - Ian
>> 
>> On Tue, Jul 7, 2015 at 5:44 PM, Eldon Carman <ecarm002@ucr.edu> wrote:
>>> In my branch ("ecarm002/introspection_alternate"), I have adapted some code
>>> I received from Ildar to repeatedly test a set of runtime tests. I am not
>>> sure this testing process will be related to your issue or not. I found
>>> this class very helpful in finding the error that was causing my problem
>>> for introspection. You could add the feeds test to the
>>> repeatedtestsuite.xml and try running it. The process might help you cause
>>> the error locally.
>>> 
>>> https://github.com/ecarm002/incubator-asterixdb/tree/ecarm002/introspection_alternate
>>> 
>>> edu.uci.ics.asterix.test.runtime.RepteatedTest
>>> 
>>> 
>>> 
>>> 
>>> On Mon, Jul 6, 2015 at 8:25 PM, Ian Maxon <imaxon@uci.edu> wrote:
>>> 
>>>> Raman and I worked on getting to the root of what is causing the build
>>>> instability for a while today. The investigation is still ongoing but
>>>> so far we've discovered the following things:
>>>> 
>>>> - The OOM error specifically is running out of threads to create on
>>>> the machine, which is odd. We aren't creating more than 500 threads
>>>> per JVM during testing so this is especially puzzling. The heap size
>>>> or permgen size are not the issue.
>>>> 
>>>> - The OOM error can be observed at the point where only feeds was
>>>> merged (and not YARN or the managix scripting fix)
>>>> 
>>>> - Neither of us can reproduce this locally on our development
>>>> machines. It seems that the environment is a variable in this issue
>>>> (hitting the thread limit on the machine), somehow.
>>>> 
>>>> - Where or if the tests run out of threads is not deterministic. It
>>>> tends to fail around the feeds portion of the execution tests, but
>>>> this is only a loose pattern. They can all pass, or the OOM can be hit
>>>> during integration tests, or other totally unrelated execution tests.
>>>> 
>>>> - There are a few feeds tests which sometimes fail (namely issue_711
>>>> and feeds_10) but this is totally unrelated to the more major issues
>>>> of running out of threads on the build machine.
>>>> 
>>>> Given all the above, it looks like there is at least a degree of
>>>> configuration/environmental influence on this issue.
>>>> 
>>>> - Ian
>>>> 
>>>> 
>>>> 
>>>> On Mon, Jul 6, 2015 at 2:14 PM, Raman Grover <ramangrover29@gmail.com>
>>>> wrote:
>>>>> Hi
>>>>> 
>>>>> a) The two big commits to the master (YARN integration and feeds) happened
>>>>> as atomic units that makes it easier to
>>>>> reset the master to the version prior to each feature and verify if the
>>>>> build began showing OOM after each of the suspected commits. We have a
>>>>> pretty deterministic way of nailing down the commit that introduced the
>>>>> problem. I would suggest, instead of disabling the feeds tests, can we
>>>>> revert to the earlier commit and confirm if the feeds commit did introduce
>>>>> the behavior and repeat the test with the YARN commit that followed. We
>>>>> should be able to see sudden increase/drop in build stability by running
>>>>> sufficient number of iterations.
>>>>> 
>>>>> b) I have not been able to reproduce the OOM at my setup where I have been
>>>>> running the build repeatedly.
>>>>> @Ian are you able to reproduce it at your system? Maybe I am not running
>>>>> the build a sufficient number of times?
>>>>> I am still not able to understand how removal of test cases still causes
>>>>> the OOM? I can go back and look at the precise changes made during the
>>>>> feeds commit that could introduce OOM even if feeds are not involved at
>>>>> all, but as I see it, the changes made do not play a role if feeds are not
>>>>> being ingested.
>>>>> 
>>>>> 
>>>>> Regards,
>>>>> Raman
>>>>> 
>>>>> 
>>>>> On Thu, Jul 2, 2015 at 6:42 PM, Ian Maxon <imaxon@uci.edu> wrote:
>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> We are close to having a release ready, but there's a few things left
>>>>>> on the checklist before we can cut the first Apache release. I think
>>>>>> most things on this list are underway, but I'll put them here just for
>>>>>> reference/visibility. Comments and thoughts are welcomed.
>>>>>> 
>>>>>> - Build stability after merging YARN and Feeds seems to have seriously
>>>>>> declined. It's hard to get a build to go through to the end without
>>>>>> going OOM at all now honestly, so this is a Problem. I think it may be
>>>>>> related to Feeds, but even after disabling the tests
>>>>>> (https://asterix-gerrit.ics.uci.edu/#/c/312/), I still see it.
>>>>>> Therefore I am not precisely sure what is going on, but it only
>>>>>> started to happen after we merged those two features. It's not exactly
>>>>>> obvious to me where the memory leak is coming from. @Raman, it would
>>>>>> be great to get your advice/thoughts on this.
>>>>>> 
>>>>>> - Metadata name changes and Metadata caching consistency fixes are
>>>>>> underway by Ildar.
>>>>>> 
>>>>>> - The repackaging and license checker patches still need to be merged
>>>>>> in, but this should happen after the above two features are merged.
>>>>>> They are otherwise ready for review though.
>>>>>> 
>>>>>> - Now that Feeds is merged, the Apache website should be changed to
>>>>>> the new version that has been in draft form for a few weeks now.
>>>>>> Before it may have been a little premature, but now it should be
>>>>>> accurate. The documentation site should also be reverted to its prior
>>>>>> state, before it was quickly patched to serve as an interim website.
>>>>>> 
>>>>>> 
>>>>>> If there's anything else I am missing that should be in this list,
>>>>>> please feel free to add it into this thread.
>>>>>> 
>>>>>> Thanks,
>>>>>> -Ian
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> Raman
> 

Best regards,
Ildar

