Message-ID: <559CC5D0.2030602@gmail.com>
Date: Tue, 07 Jul 2015 23:40:16 -0700
From: Mike Carey
To: dev@asterixdb.incubator.apache.org
Subject: Re: Tasks remaining for release

Weird.... That many threads seems wrong.....

On 7/7/15 8:35 PM, Ian Maxon wrote:
> I think I have at least a workaround for the thread starvation nailed
> down. We'll have to see, but basically the latest few patches seem to
> make us use more threads for whatever reason, and this pushed us over
> the default thread cap in many circumstances (though not always).
> Going ahead and setting the number of processes to unlimited on the
> build server and in the containers seems to have put out the fire, so
> to speak. Another confounding factor is that Docker containers running
> on the same host share a thread limit among themselves, in addition to
> the host's own thread limit. It's not clear to me, however, whether we
> actually intend to use that many threads (~500), or whether there is a
> subtle resource leak somewhere.
>
> - Ian
>
> On Tue, Jul 7, 2015 at 5:44 PM, Eldon Carman wrote:
>> In my branch ("ecarm002/introspection_alternate"), I have adapted some
>> code I received from Ildar to run a set of runtime tests repeatedly. I
>> am not sure whether this testing process relates to your issue, but I
>> found the class very helpful in tracking down the error behind my
>> introspection problem. You could add the feeds tests to
>> repeatedtestsuite.xml and try running it; the process might help you
>> reproduce the error locally.
>>
>> https://github.com/ecarm002/incubator-asterixdb/tree/ecarm002/introspection_alternate
>>
>> edu.uci.ics.asterix.test.runtime.RepteatedTest
>>
>> On Mon, Jul 6, 2015 at 8:25 PM, Ian Maxon wrote:
>>
>>> Raman and I spent a while today working on getting to the root of the
>>> build instability. The investigation is still ongoing, but so far
>>> we've discovered the following:
>>>
>>> - The OOM error is specifically the machine running out of threads to
>>> create, which is odd. We aren't creating more than 500 threads per
>>> JVM during testing, so this is especially puzzling. Neither the heap
>>> size nor the permgen size is the issue.
>>>
>>> - The OOM error can be observed at the point where only feeds was
>>> merged (and not YARN or the managix scripting fix).
>>>
>>> - Neither of us can reproduce this locally on our development
>>> machines. It seems the environment is somehow a variable in this
>>> issue (hitting the machine's thread limit).
>>>
>>> - Where, or whether, the tests run out of threads is not
>>> deterministic. Failure tends to occur around the feeds portion of the
>>> execution tests, but this is only a loose pattern. They can all pass,
>>> or the OOM can be hit during the integration tests or other totally
>>> unrelated execution tests.
>>>
>>> - There are a few feeds tests that sometimes fail (namely issue_711
>>> and feeds_10), but this is totally unrelated to the larger issue of
>>> running out of threads on the build machine.
>>>
>>> Given all of the above, it looks like there is at least a degree of
>>> configuration/environmental influence on this issue.
>>>
>>> - Ian
>>>
>>> On Mon, Jul 6, 2015 at 2:14 PM, Raman Grover wrote:
>>>> Hi,
>>>>
>>>> a) The two big commits to master (YARN integration and feeds)
>>>> happened as atomic units, which makes it easy to reset master to the
>>>> version prior to each feature and verify whether the build began
>>>> showing OOM after each of the suspected commits. That gives us a
>>>> fairly deterministic way of nailing down the commit that introduced
>>>> the problem. Instead of disabling the feeds tests, I would suggest
>>>> we revert to the earlier commit, confirm whether the feeds commit
>>>> introduced the behavior, and then repeat the test with the YARN
>>>> commit that followed. We should see a sharp increase or drop in
>>>> build stability if we run a sufficient number of iterations.
>>>>
>>>> b) I have not been able to reproduce the OOM in my setup, where I
>>>> have been running the build repeatedly. @Ian, are you able to
>>>> reproduce it on your system? Maybe I am not running the build enough
>>>> times? I also still don't understand how removing test cases can
>>>> leave the OOM in place. I can go back and look at the precise
>>>> changes made in the feeds commit that could introduce an OOM even
>>>> when feeds are not involved, but as I see it, those changes should
>>>> not play a role if no feeds are being ingested.
>>>>
>>>> Regards,
>>>> Raman
>>>>
>>>> On Thu, Jul 2, 2015 at 6:42 PM, Ian Maxon wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> We are close to having a release ready, but there are a few things
>>>>> left on the checklist before we can cut the first Apache release.
>>>>> I think most things on this list are underway, but I'll put them
>>>>> here for reference/visibility. Comments and thoughts are welcome.
>>>>>
>>>>> - Build stability after merging YARN and Feeds seems to have
>>>>> seriously declined. Honestly, it's now hard to get a build to run
>>>>> through to the end without going OOM, so this is a Problem. I think
>>>>> it may be related to Feeds, but I still see it even after disabling
>>>>> the tests (https://asterix-gerrit.ics.uci.edu/#/c/312/). So I am
>>>>> not precisely sure what is going on, but it only started to happen
>>>>> after we merged those two features, and it's not obvious to me
>>>>> where the memory leak is coming from. @Raman, it would be great to
>>>>> get your advice/thoughts on this.
>>>>>
>>>>> - Metadata name changes and metadata caching consistency fixes are
>>>>> underway by Ildar.
>>>>>
>>>>> - The repackaging and license checker patches still need to be
>>>>> merged, but that should happen after the two features above are
>>>>> merged. They are otherwise ready for review.
>>>>>
>>>>> - Now that Feeds is merged, the Apache website should be switched
>>>>> to the new version that has been in draft form for a few weeks.
>>>>> Before, that may have been a little premature, but now it should be
>>>>> accurate. The documentation site should also be reverted to its
>>>>> prior state, before it was quickly patched to serve as an interim
>>>>> website.
>>>>>
>>>>> If there's anything else I am missing that should be on this list,
>>>>> please feel free to add it to this thread.
>>>>>
>>>>> Thanks,
>>>>> -Ian
>>>>
>>>> --
>>>> Raman
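[Editor's note] The "~500 threads, leak or intended?" question in this thread can be watched from inside the JVM using the standard ThreadMXBean API. This is a minimal illustrative probe, not anything from the AsterixDB codebase; the class name is made up:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustrative diagnostic: dump the JVM's thread counts. Run between test
// iterations, a leak shows up as "live" and "total started" climbing
// together instead of "live" flattening out.
public class ThreadCountProbe {
    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        System.out.println("live threads:  " + mx.getThreadCount());
        System.out.println("peak threads:  " + mx.getPeakThreadCount());
        System.out.println("total started: " + mx.getTotalStartedThreadCount());
        // Listing thread names (no stack traces) is cheap and shows which
        // component owns the extras; getThreadInfo may return null for a
        // thread that exited between the two calls.
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            if (info != null) {
                System.out.println("  " + info.getThreadName());
            }
        }
    }
}
```

Thread names usually make the owner obvious (pool prefixes, feed adapter names, etc.), which would separate a genuine leak from a legitimately large pool.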
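[Editor's note] On the limit side of Ian's workaround: on Linux, each JVM thread counts against the per-user "max user processes" limit, which is what the build server and containers were hitting. A quick check, with the PID below as a placeholder:

```shell
# Current per-user process limit; JVM threads count against this on Linux.
ulimit -u
# Raising it, roughly what was done on the build server (requires a
# sufficient hard limit, or root):
#   ulimit -u unlimited
# Counting the threads of a running JVM (substitute a real PID for 1234):
#   grep Threads /proc/1234/status
```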