asterixdb-dev mailing list archives

From: Mike Carey <dtab...@gmail.com>
Subject: Re: Tasks remaining for release
Date: Wed, 08 Jul 2015 06:40:16 GMT

Weird.... That many threads seems wrong.....

On 7/7/15 8:35 PM, Ian Maxon wrote:
> I think I have at least a workaround to the thread starvation nailed
> down. We'll have to see, but basically I think the latest few patches
> cause us to use more threads for whatever reason, and this pushed us
> over the default thread cap in many circumstances (not always). Going
> ahead and setting the number of processes to be unlimited within the
> build server and containers seems to have put out the fire, so to
> speak. Another confounding factor is the issue that docker containers
> run within the same host and hence also have their own shared thread
> limit, in addition to the host's thread limit. It's not clear to me
> however whether we intend to use that many threads (~500), or if
> there's a subtle resource leak somewhere.
>
> - Ian
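
A minimal sketch for checking which limit a given build host or container is actually running under. It assumes Linux, where the per-user "Max processes" limit also counts threads, and the usual column layout of /proc/self/limits. ThreadLimitCheck and parseMaxProcesses are hypothetical names, not part of AsterixDB.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper, not part of AsterixDB.
final class ThreadLimitCheck {

    // Returns {softLimit, hardLimit} from the "Max processes" row of
    // /proc/<pid>/limits-style text, or null if the row is absent.
    // Values stay strings because a limit may read "unlimited".
    static String[] parseMaxProcesses(List<String> lines) {
        final String key = "Max processes";
        for (String line : lines) {
            if (line.startsWith(key)) {
                String[] cols = line.substring(key.length()).trim().split("\\s+");
                if (cols.length >= 2) {
                    return new String[] { cols[0], cols[1] };
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        Path limits = Paths.get("/proc/self/limits"); // assumption: Linux only
        if (!Files.isReadable(limits)) {
            System.out.println("No readable /proc/self/limits on this machine");
            return;
        }
        String[] max = parseMaxProcesses(
                Files.readAllLines(limits, StandardCharsets.UTF_8));
        System.out.println(max == null
                ? "Max processes row not found"
                : "Max processes: soft=" + max[0] + " hard=" + max[1]);
    }
}
```

Running this once on the build server and once inside a container would show whether the two really see different caps.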
>
> On Tue, Jul 7, 2015 at 5:44 PM, Eldon Carman <ecarm002@ucr.edu> wrote:
>> In my branch ("ecarm002/introspection_alternate"), I have adapted some code
>> I received from Ildar to repeatedly test a set of runtime tests. I am not
>> sure this testing process will be related to your issue or not. I found
>> this class very helpful in finding the error that was causing my problem
>> for introspection. You could add the feeds test to the
>> repeatedtestsuite.xml and try running it. The process might help you cause
>> the error locally.
>>
>> https://github.com/ecarm002/incubator-asterixdb/tree/ecarm002/introspection_alternate
>>
>> edu.uci.ics.asterix.test.runtime.RepeatedTest
>>
>>
>>
>>
>> On Mon, Jul 6, 2015 at 8:25 PM, Ian Maxon <imaxon@uci.edu> wrote:
>>
>>> Raman and I worked on getting to the root of what is causing the build
>>> instability for a while today. The investigation is still ongoing but
>>> so far we've discovered the following things:
>>>
>>> - The OOM error specifically is running out of threads to create on
>>> the machine, which is odd. We aren't creating more than 500 threads
>>> per JVM during testing so this is especially puzzling. The heap size
>>> or permgen size are not the issue.
>>>
>>> - The OOM error can be observed at the point where only feeds was
>>> merged (and not YARN or the managix scripting fix)
>>>
>>> - Neither of us can reproduce this locally on our development
>>> machines. It seems that the environment is a variable in this issue
>>> (hitting the thread limit on the machine), somehow.
>>>
>>> - Where or if the tests run out of threads is not deterministic. It
>>> tends to fail around the feeds portion of the execution tests, but
>>> this is only a loose pattern. They can all pass, or the OOM can be hit
>>> during integration tests, or other totally unrelated execution tests.
>>>
>>> - There are a few feeds tests which sometimes fail (namely issue_711
>>> and feeds_10) but this is totally unrelated to the more major issues
>>> of running out of threads on the build machine.
>>>
>>> Given all the above, it looks like there is at least a degree of
>>> configuration/environmental influence on this issue.
>>>
>>> - Ian
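
A minimal sketch of how the per-JVM thread count could be checked from inside a test run, to help tell intended thread usage from a leak. It uses only the standard java.lang.management API. ThreadCensus and its methods are hypothetical names, not part of AsterixDB, and grouping threads by name with the digits collapsed is just one assumed way to spot a pool that keeps growing.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical diagnostic, not part of AsterixDB.
final class ThreadCensus {

    // Collapses digit runs so "Executor-12" and "Executor-7" share a bucket.
    static String family(String threadName) {
        return threadName.replaceAll("\\d+", "#");
    }

    // Counts the live threads of this JVM per name family.
    static Map<String, Integer> countByFamily() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(family(t.getName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        System.out.println("live=" + mx.getThreadCount()
                + " peak=" + mx.getPeakThreadCount()
                + " startedTotal=" + mx.getTotalStartedThreadCount());
        for (Map.Entry<String, Integer> e : countByFamily().entrySet()) {
            System.out.println(e.getValue() + "\t" + e.getKey());
        }
    }
}
```

Calling countByFamily() between test groups and comparing the results would show which thread family, if any, keeps growing.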
>>>
>>>
>>>
>>> On Mon, Jul 6, 2015 at 2:14 PM, Raman Grover <ramangrover29@gmail.com>
>>> wrote:
>>>> Hi
>>>>
>>>> a) The two big commits to master (YARN integration and feeds) landed
>>>> as atomic units, which makes it easy to reset master to the version
>>>> prior to each feature and verify whether the build began showing OOM
>>>> after each of the suspected commits. We have a pretty deterministic
>>>> way of nailing down the commit that introduced the problem. Instead
>>>> of disabling the feeds tests, I would suggest we revert to the
>>>> earlier commit, confirm whether the feeds commit did introduce the
>>>> behavior, and repeat the test with the YARN commit that followed. We
>>>> should be able to see a sudden increase/drop in build stability by
>>>> running a sufficient number of iterations.
>>>>
>>>> b) I have not been able to reproduce the OOM on my setup, where I have
>>>> been running the build repeatedly.
>>>> @Ian, are you able to reproduce it on your system? Maybe I am not
>>>> running the build a sufficient number of times?
>>>> I still do not understand how the OOM persists after the test cases
>>>> are removed. I can go back and look at the precise changes made
>>>> during the feeds commit that could introduce OOM even if feeds are
>>>> not involved at all, but as I see it, the changes made do not play a
>>>> role if feeds are not being ingested.
>>>>
>>>>
>>>> Regards,
>>>> Raman
>>>>
>>>>
>>>> On Thu, Jul 2, 2015 at 6:42 PM, Ian Maxon <imaxon@uci.edu> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We are close to having a release ready, but there's a few things left
>>>>> on the checklist before we can cut the first Apache release. I think
>>>>> most things on this list are underway, but I'll put them here just for
>>>>> reference/visibility. Comments and thoughts are welcomed.
>>>>>
>>>>> - Build stability after merging YARN and Feeds seems to have seriously
>>>>> declined. It's hard to get a build to go through to the end without
>>>>> going OOM at all now honestly, so this is a Problem. I think it may be
>>>>> related to Feeds, but even after disabling the tests
>>>>> (https://asterix-gerrit.ics.uci.edu/#/c/312/), I still see it.
>>>>> Therefore I am not precisely sure what is going on, but it only
>>>>> started to happen after we merged those two features. It's not exactly
>>>>> obvious to me where the memory leak is coming from. @Raman, it would
>>>>> be great to get your advice/thoughts on this.
>>>>>
>>>>> - Metadata name changes and Metadata caching consistency fixes are
>>>>> underway by Ildar.
>>>>>
>>>>> - The repackaging and license checker patches still need to be merged
>>>>> in, but this should happen after the above two features are merged.
>>>>> They are otherwise ready for review though.
>>>>>
>>>>> - Now that Feeds is merged, the Apache website should be changed to
>>>>> the new version that has been in draft form for a few weeks now.
>>>>> Before it may have been a little premature, but now it should be
>>>>> accurate. The documentation site should also be reverted to its prior
>>>>> state, before it was quickly patched to serve as an interim website.
>>>>>
>>>>>
>>>>> If there's anything else I am missing that should be in this list,
>>>>> please feel free to add it into this thread.
>>>>>
>>>>> Thanks,
>>>>> -Ian
>>>>>
>>>>
>>>>
>>>> --
>>>> Raman

