From: Ted Dunning
Date: Wed, 7 Aug 2013 14:48:20 -0700
Subject: Re: Is OnlineSummarizer mergeable?
To: "user@mahout.apache.org", Otis Gospodnetic

Otis,

What statistics do you need? What guarantees?

On Wed, Aug 7, 2013 at 1:26 PM, Otis Gospodnetic wrote:

> Hi Ted,
>
> I'm actually trying to find an alternative to QDigest (the stream-lib impl
> specifically) because even though it seems good, we have to deal with crazy
> volumes of data in SPM (performance monitoring service, see signature)...
> I'm hoping we can find something that has both a lower memory footprint
> than QDigest AND that is mergeable a la QDigest. Utopia?
>
> Thanks,
> Otis
> ----
> Performance Monitoring for Solr / ElasticSearch / Hadoop / HBase -
> http://sematext.com/spm
>
> >________________________________
> > From: Ted Dunning
> >To: "user@mahout.apache.org"
> >Sent: Wednesday, August 7, 2013 4:51 PM
> >Subject: Re: Is OnlineSummarizer mergeable?
> >
> >It isn't as mergeable as I would like. If you have randomized record
> >selection, it should be possible, but perverse ordering can cause serious
> >errors.
> >
> >It would be better to use something like a Q-digest.
> >
> >http://www.cs.virginia.edu/~son/cs851/papers/ucsb.sensys04.pdf
> >
> >On Wed, Aug 7, 2013 at 4:21 AM, Otis Gospodnetic <otis.gospodnetic@gmail.com> wrote:
> >
> >> Hi,
> >>
> >> Is OnlineSummarizer algo "mergeable"?
> >>
> >> Say that we compute a percentile for some metric for time 12:00-12:01
> >> and store that somewhere, then we compute it for 12:01-12:02 and store
> >> that separately, and so on.
> >>
> >> Can we then later merge these computed and previously stored
> >> percentile "instances" and get an accurate value?
> >>
> >> Thanks,
> >> Otis
> >> --
> >> Performance Monitoring -- http://sematext.com/spm
> >> Solr & ElasticSearch Support -- http://sematext.com/
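
To make the merge question in this thread concrete, here is a minimal sketch of why stored per-minute percentile values cannot simply be combined, while a summary whose state is additive can. It does not use OnlineSummarizer or Q-digest; a plain fixed-width histogram stands in as the simplest mergeable summary, and the latencies, bucket width, and function names are all hypothetical.

```python
from collections import Counter

def nearest_rank(sorted_values, pct):
    """Exact percentile (integer pct, 1..100) by the nearest-rank rule."""
    rank = max(1, -(-pct * len(sorted_values) // 100))  # integer ceiling
    return sorted_values[rank - 1]

def histogram(values, width):
    """Count values per fixed-width bucket; the key is the bucket index."""
    return Counter(int(v // width) for v in values)

def histogram_percentile(hist, width, pct):
    """Upper edge of the bucket that holds the nearest-rank percentile."""
    total = sum(hist.values())
    rank = max(1, -(-pct * total // 100))
    seen = 0
    for bucket in sorted(hist):
        seen += hist[bucket]
        if seen >= rank:
            return (bucket + 1) * width
    raise ValueError("empty histogram")

# Hypothetical latencies in ms for two consecutive one-minute windows.
window_a = [10] * 1000   # 12:00-12:01: many fast requests
window_b = [400] * 10    # 12:01-12:02: a few slow requests

pct = 90
p_a = nearest_rank(sorted(window_a), pct)
p_b = nearest_rank(sorted(window_b), pct)

# Naive "merge": average the two stored percentile values.
averaged = (p_a + p_b) / 2

# Ground truth: percentile over all raw values from both windows.
exact = nearest_rank(sorted(window_a + window_b), pct)

# Mergeable summary: bucket counts add, so the merged histogram is
# identical to the histogram of the combined raw data.
width = 50
merged = histogram(window_a, width) + histogram(window_b, width)
from_merged = histogram_percentile(merged, width, pct)

print(averaged, exact, from_merged)  # 205.0 10 50
```

Averaging gives each window equal weight regardless of how many records it saw, so the result lands far from the true value. The merged histogram answers to within one bucket width because its state carries the counts. The trade-off is that the bucket layout has to be fixed in advance, which is the kind of constraint that adaptive summaries such as Q-digest are meant to relax.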