Date: Mon, 24 Aug 2015 15:03:09 +0300
Subject: Re: The solution to the sporadic connection refused exceptions
From: abdullah alamoudi
To: dev@asterixdb.incubator.apache.org

Now that I think about it, maybe we should provide multiple ways to do
this: a polling mechanism that can be used at an arbitrary time, and a
pushing mechanism on startup. I am going to start implementing this and
will probably use RMI for this task in both directions (CC to
InstallerDriver and InstallerDriver to CC).

Cheers,
Abdullah.

On Mon, Aug 24, 2015 at 2:19 PM, abdullah alamoudi wrote:

> So after further investigation, it turned out that our startup process
> just starts the CC and NC processes and then makes sure the processes
> are running. If they are found to be running, it reports the state of
> the cluster as active, and the subsequent test commands can start
> immediately.
>
> This means that the CC could have started but not yet be ready when we
> try to process the next command. To address this, we need a better way
> to tell when the startup procedure has completed. We can do this by
> pushing (the CC informs the installer driver when startup is complete)
> or polling (the installer driver actually queries the CC for the state
> of the cluster).
>
> I can do it either way, so let's vote. My vote goes to the pushing
> mechanism. Thoughts?
>
> On Mon, Aug 24, 2015 at 10:15 AM, abdullah alamoudi wrote:
>
>> This solution turned out to be incorrect. Actually, the test cases
>> never fail when I build after using the join method, but running an
>> actual asterix instance never succeeds, which is quite confusing.
>>
>> I also think that the startup script has a major bug where it might
>> return before the startup is complete. More on this later...
>>
>> On Mon, Aug 24, 2015 at 7:48 AM, abdullah alamoudi wrote:
>>
>>> It is highly unlikely that it is related.
>>>
>>> Cheers,
>>> Abdullah.
>>>
>>> On Mon, Aug 24, 2015 at 5:45 AM, Chen Li wrote:
>>>
>>>> @Abdullah: Is this issue related to
>>>> https://issues.apache.org/jira/browse/ASTERIXDB-1074? Ian and I
>>>> plan to look into the details on Monday.
>>>>
>>>> On Sun, Aug 23, 2015 at 10:08 AM, abdullah alamoudi wrote:
>>>>
>>>> > About 3-4 days ago, I was working on the addition of the
>>>> > filesystem based feed adapter and it didn't take any time to
>>>> > complete. However, when I wanted to build and make sure all tests
>>>> > pass, I kept getting ConnectionRefused errors, which caused the
>>>> > installer tests to fail every now and then.
>>>> >
>>>> > I knew the new change had nothing to do with this failure, yet I
>>>> > couldn't direct my attention away from this bug (it just bothered
>>>> > me so much and I knew it needed to be resolved ASAP). After
>>>> > wasting countless hours, I was finally able to figure out what
>>>> > was happening :-)
>>>> >
>>>> > In the startup routine, we start three Jetty web servers (Web
>>>> > interface server, JSON API server, and Feed server). Some time
>>>> > ago, we used to end the startup call before making sure the
>>>> > server.isStarted() method returns true on all servers. At that
>>>> > time, I introduced the waitUntilServerStarts method to make sure
>>>> > we don't return before the servers are ready. It turned out that
>>>> > this was an incorrect way to handle it (we can blame
>>>> > stackoverflow for this one!), and it is not enough that the
>>>> > server's isStarted() returns true. The correct way to do this is
>>>> > to call the server.join() method after server.start().
>>>> >
>>>> > See:
>>>> > http://stackoverflow.com/questions/15924874/embedded-jetty-why-to-use-join
>>>> >
>>>> > This was equally satisfying as it was frustrating, and you are
>>>> > welcome for the future time I saved each of you :)
>>>> > --
>>>> > Amoudi, Abdullah.
>>>
>>> --
>>> Amoudi, Abdullah.
>>
>> --
>> Amoudi, Abdullah.
>
> --
> Amoudi, Abdullah.

--
Amoudi, Abdullah.
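
For reference, the polling option discussed in this thread (the installer driver keeps asking whether the CC is ready, rather than only checking that the process is running) could look roughly like the Java sketch below. This is only an illustration under stated assumptions: the names `ReadinessPoller` and `waitUntilReady` are hypothetical and not part of AsterixDB, and the readiness probe is left abstract because the thread does not say how the CC would report its state.

```java
import java.util.function.BooleanSupplier;

/**
 * Hypothetical helper (not AsterixDB code): repeatedly runs a readiness
 * probe until it reports success or a timeout expires.
 */
final class ReadinessPoller {

    private ReadinessPoller() {
    }

    /**
     * @param probe          assumed check that returns true once the service is ready
     * @param timeoutMillis  maximum total time to keep polling
     * @param intervalMillis pause between two consecutive probes
     * @return true if the probe succeeded before the timeout, false otherwise
     */
    static boolean waitUntilReady(BooleanSupplier probe, long timeoutMillis, long intervalMillis) {
        final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            if (probe.getAsBoolean()) {
                return true;
            }
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up polling.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A caller might use it as `ReadinessPoller.waitUntilReady(() -> clusterIsActive(), 30_000, 500)`, where `clusterIsActive()` is a hypothetical stand-in for whatever query the installer driver ends up sending to the CC. The timeout keeps a failed startup from blocking the tests forever, which a pure "wait until ready" loop would not guarantee.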