Date: Sat, 27 Jun 2009 18:32:47 -0700 (PDT)
From: "Paul Joseph Davis (JIRA)"
To: dev@couchdb.apache.org
Subject: [jira] Updated: (COUCHDB-396) Fixing weirdness in couch_stats_aggregator.erl

     [ https://issues.apache.org/jira/browse/COUCHDB-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Joseph Davis updated COUCHDB-396:
--------------------------------------

    Attachment: couchdb_stats_aggregator.patch

> Fixing weirdness in couch_stats_aggregator.erl
> ----------------------------------------------
>
>                 Key: COUCHDB-396
>                 URL: https://issues.apache.org/jira/browse/COUCHDB-396
>             Project: CouchDB
>          Issue Type: Improvement
>          Components: Database Core, HTTP Interface
>    Affects Versions: 0.10
>         Environment: trunk
>            Reporter: Paul Joseph Davis
>            Assignee: Paul Joseph Davis
>             Fix For: 0.10
>
>         Attachments: couchdb_stats_aggregator.patch
>
>
> While looking at adding unit tests to the couch_stats_aggregator module the other day, I realized it was doing some odd calculations. This is a fairly non-trivial patch, so I figured I'd put it in JIRA and get feedback before applying it. As far as I can tell, this patch does everything the old version does, but I'll be adding tests before I consider it complete.
>
> List of major changes:
>
> * The old behavior for stats was to integrate incoming values over a time period, then reset the values and start integrating again. That seemed a bit odd, so I rewrote things to keep the average and standard deviation for the last N seconds, with approximately 1 sample per second.
> * Changed request timing calculations [note below].
> * Sample periods are configurable in the .ini file. A sample period of 0 is a special case that integrates all values since CouchDB boot.
> * Sample descriptions are now in the configuration files.
> * You can request different time periods from the root stats endpoint.
> * Added a sum to the list of statistics.
> * Simplified some of the external API.
>
> The biggest change is in how request times are calculated. AFAICT, the old code accumulated request timings in the stats collector and simply added new values as clock ticks went by, just as it does for every other stat. That makes sense when the counters are reset every time period.
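The first item in the change list (keeping the average and standard deviation over the last N seconds, at roughly one sample per second) can be illustrated with a minimal Python sketch. It assumes each clock tick delivers one sample as a (timestamp, value) pair and treats a period of 0 as the cumulative "since boot" case. The class `WindowedStat` and its methods are hypothetical and do not reflect the actual Erlang API of couch_stats_aggregator.

```python
import math
from collections import deque


class WindowedStat:
    """Hypothetical sketch: sum, mean and stddev over the last `period` seconds.

    A period of 0 is treated as "everything since boot", mirroring the
    special case described in the patch notes.
    """

    def __init__(self, period):
        self.period = period
        self.samples = deque()  # (timestamp, value) pairs inside the window
        # Running totals, used only for the period == 0 (cumulative) case.
        self.count = 0
        self.total = 0.0
        self.total_sq = 0.0

    def tick(self, now, value):
        """Record one sample for this clock tick and expire stale samples."""
        if self.period == 0:
            self.count += 1
            self.total += value
            self.total_sq += value * value
            return
        self.samples.append((now, value))
        # Remove samples that have passed out of the time period.
        while self.samples and self.samples[0][0] <= now - self.period:
            self.samples.popleft()

    def stats(self):
        if self.period == 0:
            n, s, sq = self.count, self.total, self.total_sq
        else:
            values = [v for _, v in self.samples]
            n, s, sq = len(values), sum(values), sum(v * v for v in values)
        if n == 0:
            return {"sum": 0.0, "mean": 0.0, "stddev": 0.0}
        mean = s / n
        # Population variance; clamp small negative values caused by rounding.
        var = max(sq / n - mean * mean, 0.0)
        return {"sum": s, "mean": mean, "stddev": math.sqrt(var)}


w = WindowedStat(period=3)
for t, v in enumerate([1, 2, 3, 4, 5]):
    w.tick(t, v)
print(w.stats())  # only the samples from t=2, 3, 4 remain in the window
```

Expiring samples on each tick keeps the window's contents tied to the period length. The cumulative case only needs running totals, because nothing ever leaves its window.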
> In the new approach, I keep a list of the samples in the last time period, and on each clock tick part of the update is removing the samples that have fallen out of that period. For a variable like request_time, keeping every individual request as a sample this way would lead to unbounded storage.
>
> Instead, the new method calculates the average time of all requests within a single clock tick (1s). One thing this loses is visibility into variability within a single clock tick. For example, your average request time might be 100ms while 10% of your requests take 500ms. I've read about people using the averaging trick while also storing quantile information [1]. There are also algorithms for single-pass quantile estimation [2], so it's possible to do those things in O(N) time. The issue with quantiles is that they would start breaking the logic of how the collector and aggregators are set up. As it is now, there's basically a one event -> one stat constraint. For the time being, I went without quantiles to minimize the impact of the patch.
>
> This code will also be on GitHub [3] as I add patches.
>
> [1] http://code.flickr.com/blog/2008/10/27/counting-timing/
> [2] http://www.slamb.org/svn/repos/trunk/projects/loadtest/benchtools/stats.py (see the QuantileEstimator class)
> [3] http://github.com/davisp/couchdb/tree/stats-patch
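The per-tick averaging of request times described above can be sketched in Python as follows. Raw durations are buffered only until the next clock tick and then collapsed into one averaged sample. Storage is therefore bounded by the number of ticks in the window, not by the number of requests. The class `RequestTimeTicker`, the `window_ticks` parameter, and the choice to skip ticks with no requests are all hypothetical assumptions; the patch may handle these details differently.

```python
from collections import deque
from statistics import mean


class RequestTimeTicker:
    """Hypothetical sketch: collapse each 1 s tick of request timings into one sample."""

    def __init__(self, window_ticks):
        self.pending = []  # durations (ms) recorded during the current tick
        # One averaged sample per tick; maxlen drops the oldest automatically.
        self.window = deque(maxlen=window_ticks)

    def record(self, duration_ms):
        self.pending.append(duration_ms)

    def tick(self):
        """Called once per clock tick: average this tick's requests, then reset.

        Assumption: a tick with no requests adds no sample.
        """
        if self.pending:
            self.window.append(mean(self.pending))
            self.pending = []
        return list(self.window)


ticker = RequestTimeTicker(window_ticks=60)
for d in [50] * 9 + [550]:  # 10% of requests are slow
    ticker.record(d)
print(ticker.tick())  # a single 100 ms sample; the 550 ms outlier is not visible
```

The usage shows the trade-off noted in the issue. Nine 50ms requests plus one 550ms request average to 100ms, so the slow 10% leaves no trace in the stored sample.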