From: Kevin Burton
Date: Mon, 19 Oct 2015 19:15:40 -0700
Subject: Dealing with the "over-prefetch" problem with large numbers of workers and many queue servers
To: users@activemq.apache.org

We have a problem caused by a LARGE number of workers: right now about 50k worker threads on about 45 bare metal boxes. About 10 ActiveMQ servers/daemons service these workers.

The problem is that my current design has a session per queue server per thread, so I have about 500k sessions (50k threads x 10 servers), each trying to prefetch 1 message at a time. Since my tasks take about 30 seconds on average to execute, a thread can be holding up to 10 prefetched messages, and the last of them can wait about 5 minutes (10 x 30 seconds) before it is processed. That's a BIG problem because I want to keep my latencies low!
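For reference, here is a back-of-the-envelope sketch of those numbers in Java. The class and method names are hypothetical, and the worst case assumes a thread works through its prefetched messages serially, one per session.

```java
/** Rough arithmetic for the session count and worst-case wait (illustrative names only). */
final class PrefetchMath {

    /** Sessions when every worker thread opens one session per queue server. */
    static long sessions(long workerThreads, long servers) {
        return workerThreads * servers;
    }

    /**
     * Worst-case seconds before a thread finishes the last message it has prefetched,
     * assuming one session per server and serial processing on that thread.
     */
    static long worstCaseSeconds(long servers, long prefetchPerSession, long avgTaskSeconds) {
        return servers * prefetchPerSession * avgTaskSeconds;
    }

    public static void main(String[] args) {
        System.out.println(sessions(50_000, 10));         // 500000 sessions
        System.out.println(worstCaseSeconds(10, 1, 30));  // 300 seconds, i.e. 5 minutes
    }
}
```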
The other BIG downside is that a lot of my workers get their prefetch buffers filled first, starving out other workers, which then do nothing. This leads to massive starvation where some of my boxes are at 100% CPU and others sit at 10-20%, starved for work.

So I'm working on a new design whereby I use a message listener, allow it to prefetch, and use a countdown latch inside the listener to wait for the worker thread to process the message. Then I commit the message. This solves the over-prefetch problem because we don't attempt to prefetch again until the message has been processed.

Since I can't commit JMS messages one at a time, I'm left only with options that commit the whole session. This forces me to set prefetch=1; otherwise a commit() could also commit a message that is actually still being processed.

This leaves me in a situation where I need to be clever about how I fetch from the queue servers. If I prefetch on ALL queue servers, I'm more or less back where I started.

I was thinking of implementing the following solution, which should work and minimize the downsides. I'd like feedback on it.

If I have, say, 1000 worker threads, I allow up to 10% of the number of worker threads to be prefetched and stored in a local queue (ArrayBlockingQueue). In this example that would be 100 messages.

The problem now is how to read in parallel from each server. I think the answer is to allow each queue server to contribute 10% of the buffered messages, so in this case 10 from each. We end up being allowed to prefetch 10 messages from each queue server, and the local queue can grow to hold 100 messages.

The latency for processing a message would then be the minimum average time per task/thread being indexed, which I think will keep the latencies low.

Also, I think this could be a common anti-pattern, with this as a solution to the over-prefetch problem.
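To make the buffering scheme concrete, here is a minimal sketch in plain Java. It is not tied to JMS or ActiveMQ, and every name in it (Buffered, PrefetchBuffer, tryOffer, poll) is hypothetical. It assumes one fetcher per queue server calls tryOffer, and that a server's slot is freed as soon as a worker takes the message out of the local queue. How this maps onto a transacted session and commit() is left open.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Semaphore;

/** A message tagged with the index of the queue server it came from (hypothetical type). */
final class Buffered {
    final int server;
    final String body;
    Buffered(int server, String body) { this.server = server; this.body = body; }
}

/** Bounded local buffer with a per-server quota (illustrative sketch). */
final class PrefetchBuffer {
    private final BlockingQueue<Buffered> buffer;
    private final Semaphore[] quota;
    private final int capacity;
    private final int perServer;

    PrefetchBuffer(int workerThreads, int servers) {
        this.capacity = Math.max(1, workerThreads / 10);  // 10% of the worker threads
        this.perServer = Math.max(1, capacity / 10);      // 10% of the buffer per server
        this.buffer = new ArrayBlockingQueue<>(capacity);
        this.quota = new Semaphore[servers];
        for (int i = 0; i < servers; i++) {
            quota[i] = new Semaphore(perServer);
        }
    }

    int capacity() { return capacity; }
    int perServerQuota() { return perServer; }

    /** Called by the fetcher for one server; false if that server's quota or the buffer is full. */
    boolean tryOffer(int server, String body) {
        if (!quota[server].tryAcquire()) {
            return false;
        }
        if (!buffer.offer(new Buffered(server, body))) {
            quota[server].release();  // buffer full: hand the permit back
            return false;
        }
        return true;
    }

    /** Called by a worker; returns null if nothing is buffered. Frees the source server's slot. */
    Buffered poll() {
        Buffered next = buffer.poll();
        if (next != null) {
            quota[next.server].release();
        }
        return next;
    }
}

final class PrefetchBufferDemo {
    public static void main(String[] args) {
        PrefetchBuffer b = new PrefetchBuffer(1000, 10);
        int accepted = 0;
        for (int i = 0; i < 11; i++) {
            if (b.tryOffer(0, "m" + i)) accepted++;
        }
        System.out.println(accepted);               // 10: server 0 has used its quota
        b.poll();                                   // a worker takes one message
        System.out.println(b.tryOffer(0, "next"));  // true: a slot for server 0 is free again
    }
}
```

A real worker would probably use a blocking or timed take rather than the non-blocking poll() shown here. The sketch only shows how the 10% and 10-per-server limits interact.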
If you agree, I'm willing to document the problem.

Additionally, I think this comes close to the ideal multi-headed solution from queuing theory, which uses multiple worker heads. It just becomes more interesting because we have imperfect information from the queue servers, so we have to make educated guesses about their behavior.

--
We’re hiring if you know of any awesome Java Devops or Linux Operations Engineers!

Founder/CEO Spinn3r.com
Location: *San Francisco, CA*
blog: http://burtonator.wordpress.com
… or check out my Google+ profile