From: Jeff
Date: Wed, 09 May 2018 08:04:50 +0000
Subject: Re: Cluster behavior when heavily loaded
To: dev@nifi.apache.org

Mark,

You could try increasing the "tickTime" setting
in zookeeper.properties to give ZK a little more time before timing
something out. tickTime is defined [1] as "the length of a single tick,
which is the basic time unit used by ZooKeeper, as measured in
milliseconds. It is used to regulate heartbeats, and timeouts. For example,
the minimum session timeout will be two ticks". You might want to try a
value of 10000 or 15000 in there, making a tick 10 or 15 seconds
respectively.

There are other settings you might want to look into as well, but when I
was testing on heavily loaded clusters, increasing the timeouts for NiFi as
you've done above along with increasing tickTime had good results.

[1] http://zookeeper.apache.org/doc/r3.4.6/zookeeperAdmin.html

On Tue, May 8, 2018 at 10:43 AM Mark Bean wrote:

> We have a 6-node cluster using external ZooKeeper. It is heavily loaded,
> and we are attempting to tune some of the properties to alleviate some
> observed issues. By "heavily loaded" I mean the graph is large (approx.
> 3,000 processors) and there is a lot of data in process (approx. 2M
> flowfiles/120GB queued)
>
> One symptom we see is that changes to the graph are not replicated to other
> nodes, and the Node(s) are subsequently disconnected from the cluster. In
> one example, we see in the nifi-app.log that the node is disconnected due
> to "failed to process request PUT
> /nifi-api/connection/976a60b5d-3c4e-3bbb-8fbe-4790f3ecb147"
>
> The following properties are set in nifi.properties:
>
> nifi.cluster.node.protocol.threads=30
> nifi.cluster.node.protocol.max.threads=50
> nifi.cluster.node.event.history.size=25
> nifi.cluster.node.connection.timeout=60 sec
> nifi.cluster.node.read.timeout=60 sec
> nifi.cluster.node.max.concurrent.requests=500
> nifi.cluster.node.request.replication.claim.timeout=20 secs
>
> nifi.zookeeper.connect.timeout=30 secs
> nifi.zookeeper.session.timeout=30 secs
>
> Some of the (timeout) values are set fairly high due to the heavily loaded
> system; we allow a longer time to complete tasks. Are there interrelated
> properties which a long timeout might actually become detrimental? Are
> there other properties we should look at more closely?
>
> Thanks,
> Mark
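
A note on how the tickTime suggestion interacts with
nifi.zookeeper.session.timeout: per the ZooKeeper admin guide, the server
only grants session timeouts between minSessionTimeout (default 2 ticks)
and maxSessionTimeout (default 20 ticks). The sketch below just does that
arithmetic. The function names are hypothetical, the 2000 ms figure is an
assumed starting tickTime rather than something stated in this thread, and
it assumes the min/max settings are left at their defaults.

```python
def session_timeout_bounds(tick_time_ms):
    """Return (min, max) session timeout in ms for a given tickTime.

    Assumes minSessionTimeout and maxSessionTimeout are not overridden,
    so the documented defaults of 2 and 20 ticks apply.
    """
    return 2 * tick_time_ms, 20 * tick_time_ms


def effective_session_timeout(requested_ms, tick_time_ms):
    """Clamp the client's requested session timeout to the server bounds."""
    low, high = session_timeout_bounds(tick_time_ms)
    return max(low, min(requested_ms, high))


if __name__ == "__main__":
    requested = 30_000  # nifi.zookeeper.session.timeout=30 secs
    # 2000 is an assumed current value; 10000 and 15000 are the values
    # suggested in this message.
    for tick in (2000, 10000, 15000):
        low, high = session_timeout_bounds(tick)
        granted = effective_session_timeout(requested, tick)
        print(f"tickTime={tick} ms: allowed {low}-{high} ms, granted {granted} ms")
```

With these defaults, a tickTime of 15000 puts the minimum session timeout
at 30 seconds, which is exactly the value currently requested in
nifi.properties, so raising tickTime any further would also raise the
session timeout the server actually grants.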