asterixdb-dev mailing list archives

From Ildar Absalyamov <>
Subject Re: Tasks remaining for release
Date Wed, 08 Jul 2015 06:47:25 GMT
Shouldn’t those threads get reused in a fixed-size pool?
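
To make the question concrete, here is a minimal sketch of thread reuse in a fixed-size pool, using only the standard java.util.concurrent API. The class and method names (FixedPoolSketch, distinctWorkers) are hypothetical and are not taken from the AsterixDB code; the pool size and task count are arbitrary illustration values.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration, not AsterixDB code.
public class FixedPoolSketch {

    /**
     * Runs taskCount short tasks on a pool capped at poolSize threads and
     * returns how many distinct worker threads actually executed them.
     */
    public static int distinctWorkers(int poolSize, int taskCount) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        Set<String> workerNames = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < taskCount; i++) {
            // Each task records the name of the worker thread it ran on.
            pool.execute(() -> workerNames.add(Thread.currentThread().getName()));
        }
        pool.shutdown(); // stop accepting work; queued tasks still run
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("tasks did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return workerNames.size();
    }

    public static void main(String[] args) {
        // 500 tasks, but the pool never creates more than 4 worker threads.
        System.out.println("distinct worker threads: " + distinctWorkers(4, 500));
    }
}
```

With a pool like this, the number of worker threads is bounded by the pool size no matter how many tasks are submitted, so a count that grows toward the OS limit would point at threads created outside such a pool, or at an unbounded pool.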

> On Jul 7, 2015, at 23:40, Mike Carey <> wrote:
> Weird....  That many threads seems wrong.....
> On 7/7/15 8:35 PM, Ian Maxon wrote:
>> I think I have at least a workaround to the thread starvation nailed
>> down. We'll have to see, but basically I think the latest few patches
>> cause us to use more threads for whatever reason, and this pushed us
>> over the default thread cap in many circumstances (not always). Going
>> ahead and setting the number of processes to be unlimited within the
>> build server and containers seems to have put out the fire, so to
>> speak. Another confounding factor is the issue that docker containers
>> run within the same host and hence also have their own shared thread
>> limit, in addition to the host's thread limit. It's not clear to me
>> however whether we intend to use that many threads (~500), or if
>> there's a subtle resource leak somewhere.
>> - Ian
>> On Tue, Jul 7, 2015 at 5:44 PM, Eldon Carman <> wrote:
>>> In my branch ("ecarm002/introspection_alternate"), I have adapted some code
>>> I received from Ildar to repeatedly test a set of runtime tests. I am not
>>> sure this testing process will be related to your issue or not. I found
>>> this class very helpful in finding the error that was causing my problem
>>> for introspection. You could add the feeds test to the
>>> repeatedtestsuite.xml and try running it. The process might help you cause
>>> the error locally.
>>> edu.uci.ics.asterix.test.runtime.RepeatedTest
>>> On Mon, Jul 6, 2015 at 8:25 PM, Ian Maxon <> wrote:
>>>> Raman and I worked on getting to the root of what is causing the build
>>>> instability for a while today. The investigation is still ongoing but
>>>> so far we've discovered the following things:
>>>> - The OOM error specifically is running out of threads to create on
>>>> the machine, which is odd. We aren't creating more than 500 threads
>>>> per JVM during testing so this is especially puzzling. Neither the
>>>> heap size nor the permgen size is the issue.
>>>> - The OOM error can be observed at the point where only feeds was
>>>> merged (and not YARN or the managix scripting fix)
>>>> - Neither of us can reproduce this locally on our development
>>>> machines. It seems that the environment is a variable in this issue
>>>> (hitting the thread limit on the machine), somehow.
>>>> - Where or if the tests run out of threads is not deterministic. It
>>>> tends to fail around the feeds portion of the execution tests, but
>>>> this is only a loose pattern. They can all pass, or the OOM can be hit
>>>> during integration tests, or other totally unrelated execution tests.
>>>> - There are a few feeds tests which sometimes fail (namely issue_711
>>>> and feeds_10) but this is totally unrelated to the more major issues
>>>> of running out of threads on the build machine.
>>>> Given all the above, it looks like there is at least a degree of
>>>> configuration/environmental influence on this issue.
>>>> - Ian
>>>> On Mon, Jul 6, 2015 at 2:14 PM, Raman Grover <> wrote:
>>>>> Hi
>>>>> a) The two big commits to the master (YARN integration and feeds)
>>>>> happened as atomic units, which makes it easier to reset the master
>>>>> to the version prior to each feature and verify whether the build
>>>>> began showing OOM after each of the suspected commits. We have a
>>>>> pretty deterministic way of nailing down the commit that introduced
>>>>> the problem. I would suggest that, instead of disabling the feeds
>>>>> tests, we revert to the earlier commit and confirm whether the feeds
>>>>> commit did introduce the behavior, then repeat the test with the YARN
>>>>> commit that followed. We should be able to see a sudden increase/drop
>>>>> in build stability by running a sufficient number of iterations.
>>>>> b) I have not been able to reproduce the OOM on my setup, where I
>>>>> have been running the build repeatedly.
>>>>> @Ian are you able to reproduce it on your system? Maybe I am not
>>>>> running the build a sufficient number of times?
>>>>> I still do not understand how the OOM can occur even after the test
>>>>> cases are removed. I can go back and look at the precise changes made
>>>>> during the feeds commit that could introduce OOM even if feeds are
>>>>> not involved at all, but as I see it, the changes made do not play a
>>>>> role if feeds are not being ingested.
>>>>> Regards,
>>>>> Raman
>>>>> On Thu, Jul 2, 2015 at 6:42 PM, Ian Maxon <> wrote:
>>>>>> Hi all,
>>>>>> We are close to having a release ready, but there are a few things
>>>>>> on the checklist before we can cut the first Apache release. I think
>>>>>> most things on this list are underway, but I'll put them here just
>>>>>> for reference/visibility. Comments and thoughts are welcomed.
>>>>>> - Build stability after merging YARN and Feeds seems to have seriously
>>>>>> declined. It's hard to get a build to go through to the end without
>>>>>> going OOM at all now honestly, so this is a Problem. I think it may
>>>>>> be related to Feeds, but even after disabling the tests
>>>>>> (, I still see it.
>>>>>> Therefore I am not precisely sure what is going on, but it only
>>>>>> started to happen after we merged those two features. It's not exactly
>>>>>> obvious to me where the memory leak is coming from. @Raman, it would
>>>>>> be great to get your advice/thoughts on this.
>>>>>> - Metadata name changes and Metadata caching consistency fixes are
>>>>>> underway by Ildar.
>>>>>> - The repackaging and license checker patches still need to be merged
>>>>>> in, but this should happen after the above two features are merged.
>>>>>> They are otherwise ready for review though.
>>>>>> - Now that Feeds is merged, the Apache website should be changed to
>>>>>> the new version that has been in draft form for a few weeks now.
>>>>>> Before it may have been a little premature, but now it should be
>>>>>> accurate. The documentation site should also be reverted to its prior
>>>>>> state, before it was quickly patched to serve as an interim website.
>>>>>> If there's anything else I am missing that should be in this list,
>>>>>> please feel free to add it into this thread.
>>>>>> Thanks,
>>>>>> -Ian
>>>>> --
>>>>> Raman
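
One way to tell intended thread usage (~500) apart from a slow leak is to log the JVM's own thread counters while the tests run. The following is a minimal sketch using the standard java.lang.management API; the class name ThreadCountSketch and where it would be called from are assumptions, not part of the existing test harness.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical illustration, not AsterixDB code.
public class ThreadCountSketch {

    /** Number of live threads (daemon and non-daemon) in this JVM right now. */
    public static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    /** One-line summary of live, peak, and total-started thread counts. */
    public static String snapshot() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return "live=" + threads.getThreadCount()
                + " peak=" + threads.getPeakThreadCount()
                + " started=" + threads.getTotalStartedThreadCount();
    }

    public static void main(String[] args) {
        // Print a snapshot; in a test run this could be logged between suites.
        System.out.println(snapshot());
    }
}
```

If the live count keeps climbing from one test suite to the next instead of falling back after each suite, that would suggest a leak rather than a legitimately high peak.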

Best regards,
