From: sebb
To: Commons Users List
Cc: Phil Steitz
Date: Tue, 15 Mar 2011 00:17:54 +0000
Subject: Re: [math] Re: running average of a rate
In-Reply-To: <4D7E7DBD.1090803@gmail.com>
References: <4D7E6DAA.50306@free.fr> <4D7E7DBD.1090803@gmail.com>

On 14 March 2011 20:42, Phil Steitz wrote:
> On 3/14/11 12:34 PM, Luc Maisonobe wrote:
>> On 14/03/2011 15:33, Benson Margulies wrote:
>>> Please excuse the following ignorant question.
>>>
>>> I want to maintain summary statistics of a rate. At each 'event', I
>>> know the number of characters and the time it took to process them,
>>> and I want to maintain summary statistics for the rate of
>>> chars/second. I imagine that I'm missing something basic, but I don't
>>> see how to do this.
>> You should define some window width, either in terms of a time span
>> (all events in the last n seconds) or in terms of number of events (last
>> n events).
>>
>> In [math], we do not provide (yet) anything for maintaining such a data
>> structure, you'll have to maintain the events in this slot by yourself,
>> with something similar to a FIFO.
>
> I am not sure I understand what the problem is exactly, but if what
> you need is simply "rolling" statistics, where a dataset of 0,...,n
> values are maintained with the newest values replacing the oldest,
> we do in fact support that in
> o.a.c.math.stat.descriptive.DescriptiveStatistics.

In JMeter we needed to display long-running percentiles without using
excess memory, and someone came up with the idea of using buckets for
ranges of values. So instead of keeping details of each sample's elapsed
time, we increment the count for the appropriate bucket.

If the range of values is too large to use a single bucket for each
value, each bucket can represent a range of values. These ranges can
potentially be non-uniform, though that does complicate the calculations.

JMeter actually uses a TreeMap for the values and counts - the values
need to be sorted in order to calculate percentiles. Depending on the
data set, it might be possible to use fixed arrays instead of the
TreeMap.

> Phil
>> When you have your data available, each time a new event is added to or
>> removed from the ones that belong to the window, you can compute
>> the statistics you want on this data (min, max, mean, median, standard
>> deviation ...) and wait for the next addition/removal to recompute them.
>>
>> Another thing we discussed some months ago (but did not implement yet)
>> is a way to compute an approximation of percentiles in a flow of data
>> without storing them. There is an interesting algorithm for it that was
>> developed for the needs of telecommunication companies, I think it may
>> be of interest to you. This would provide results like: currently 95%
>> of the characters are processed in n milliseconds. Would you be
>> interested in us implementing this feature?
>>
>> Luc
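
The bucket-count approach described in the reply above can be sketched in
Java roughly as follows. This is only an illustration under stated
assumptions, not JMeter's actual code: the class name BucketPercentiles,
its methods, the uniform bucket width, and the nearest-rank percentile
rule are all hypothetical choices made for the example. The result is
approximate, since it reports the lower bound of the bucket that contains
the requested percentile.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: approximate percentiles from per-bucket counts instead of raw samples. */
final class BucketPercentiles {
    // Width of each value range; uniform widths are an assumption of this sketch.
    private final long bucketWidth;
    // Bucket lower bound -> number of samples in that bucket; TreeMap keeps keys sorted.
    private final TreeMap<Long, Long> counts = new TreeMap<>();
    private long total;

    BucketPercentiles(long bucketWidth) {
        if (bucketWidth <= 0) {
            throw new IllegalArgumentException("bucketWidth must be positive");
        }
        this.bucketWidth = bucketWidth;
    }

    /** Records one sample by incrementing the count of its bucket. */
    void add(long value) {
        long bucket = Math.floorDiv(value, bucketWidth) * bucketWidth;
        counts.merge(bucket, 1L, Long::sum);
        total++;
    }

    /** Returns the lower bound of the bucket holding the p-th percentile (0 < p <= 100). */
    long percentile(double p) {
        if (total == 0) {
            throw new IllegalStateException("no samples recorded");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        // Nearest-rank rule: the smallest rank covering at least p percent of the samples.
        long rank = (long) Math.ceil(p * total / 100.0);
        long seen = 0;
        for (Map.Entry<Long, Long> e : counts.entrySet()) {
            seen += e.getValue();
            if (seen >= rank) {
                return e.getKey();
            }
        }
        return counts.lastKey();
    }
}
```

Memory use grows with the number of distinct buckets rather than the
number of samples, which is the point of the technique. A wider bucket
saves more memory at the cost of a coarser answer.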