From: Andrew Chung
Date: Wed, 18 May 2016 10:18:24 -0700
Subject: Re: Test failure in master
To: dev@reef.apache.org

Hi,

Yes, that is the main issue. Right now, when checking for idleness, we only check the `ThreadPool` queue; we never check whether the EventHandler is still *active*. Thus, if we call `Thread.sleep` within the `FailedEvaluatorHandler`, the `EvaluatorManager` will check the `ThreadPool`, which will report that it is idle (as the `ThreadPool` queue is now empty), and shut down. This may be fine in Java, where we don't go through the InterOp layer and the EventHandler always finishes before an idleness check, but the problem shows up when going through the C# code.

There are trickier issues going on, namely when the following events occur:

1. We call close on an Evaluator, in *any* part of the code.
2. close triggers an idleness check, but with the fix for REEF-1393, the EventHandler is still active, so the check reports that we are not idle.
3. No more calls to check idleness are made for a while.
4. We don't exit before the Test times out.

I'm still running tests and checking different scenarios, but one fix (albeit ugly and potentially resource-consuming) *should* work: run a Thread at the end of `close` that repeatedly checks whether all Evaluator messages have been handled, and only then triggers an idleness check.

Please let me know if a better fix exists.

Thanks,
Andrew

On Tue, May 17, 2016 at 5:56 PM, Markus Weimer wrote:

> Andrew, is REEF-1393 the root cause / fix for this?
>
> Markus
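
To make the two ideas above concrete, here is a minimal Java sketch, not REEF's actual code. `TrackedHandlerPool`, `submit`, `isIdle`, and `checkIdlenessWhenDrained` are hypothetical names invented for illustration. The sketch assumes that "idle" should mean "no handler queued and no handler running", and that the proposed fix amounts to a polling thread that fires the idleness check once that condition holds.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a handler thread pool; not a REEF class. */
final class TrackedHandlerPool {
  private final ExecutorService pool = Executors.newFixedThreadPool(2);
  // Counts handlers that are queued OR currently running, so a handler
  // sleeping inside its body still counts as "not idle".
  private final AtomicInteger pending = new AtomicInteger();

  void submit(final Runnable handler) {
    pending.incrementAndGet();
    try {
      pool.execute(() -> {
        try {
          handler.run();
        } finally {
          pending.decrementAndGet(); // decrement only after the handler returns
        }
      });
    } catch (final RejectedExecutionException e) {
      pending.decrementAndGet(); // the handler never ran, so undo the count
      throw e;
    }
  }

  /** Idle only when nothing is queued and nothing is running. */
  boolean isIdle() {
    return pending.get() == 0;
  }

  /**
   * Sketch of the proposed fix: poll until all handlers are done, then
   * trigger the (assumed) idleness check exactly once.
   */
  Thread checkIdlenessWhenDrained(final Runnable idlenessCheck, final long pollMillis) {
    final Thread poller = new Thread(() -> {
      try {
        while (!isIdle()) {
          Thread.sleep(pollMillis);
        }
        idlenessCheck.run();
      } catch (final InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    poller.setDaemon(true); // do not keep the JVM alive just for polling
    poller.start();
    return poller;
  }

  void shutdown() {
    pool.shutdown();
  }
}
```

Under these assumptions, `close` would end with a call such as `pool.checkIdlenessWhenDrained(idlenessCheck, 100)`. Polling costs one extra thread per call, which is the resource concern mentioned above. A callback fired when the counter reaches zero would avoid the polling, at the price of more invasive changes.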