From: Andrey Kuprianov
Date: Tue, 16 Apr 2013 10:04:43 +0800
Subject: Re: Distinct values with range
To: user@couchdb.apache.org

@Keith your method will not give me distinct countries, and even with reduce
and after being fed to a list function it's still slow.

On Tue, Apr 16, 2013 at 2:27 AM, Wendall Cada wrote:

> I agree with this approach. I do something similar using _sum:
>
> emit([doc.country_name, toDay(doc.timestamp)], 1);
>
> The toDay() method is basically a floor of the day value. Since I don't
> store ts in UTC (because of an idiotic error some years back) I also do a
> tz offset to correct the day value in my toDay() method.
>
> Using reduce is by far the fastest method for this. I don't see any issue
> with getting this to scale.
>
> Overall, I think I rather prefer the method Keith shows, as it would
> depend on the values returned in the date object versus other possibly
> inaccurate means using math.
>
> Wendall
>
> On 04/15/2013 07:18 AM, Keith Gable wrote:
>
>> Output keys like so:
>>
>> [2010, 7, 10, "Australia"]
>>
>> Reduce function would be _count.
>>
>> startkey=[year,month,day,null]
>> endkey=[year,month,day,{}]
>>
>> ---
>> Keith Gable
>> A+, Network+, and Storage+ Certified Professional
>> Apple Certified Technical Coordinator
>> Mobile Application Developer / Web Developer
>>
>> On Sun, Apr 14, 2013 at 8:37 PM, Andrey Kuprianov
>> <andrey.kouprianov@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> Just for the sake of a debate. Here's the question. There are
>>> transactions. Among all other attributes there's timestamp (when
>>> transaction was made; in seconds) and a country name (from where the
>>> transaction was made). So, for instance,
>>>
>>> {
>>>   . . . .
>>>   "timestamp": 1332806400,
>>>   "country_name": "Australia",
>>>   . . . .
>>> }
>>>
>>> Question is: how does one get unique / distinct country names in between
>>> dates? For example, give me all country names in between 10-Jul-2010 and
>>> 21-Jan-2013.
>>>
>>> My solution was to write a custom reduce function and set
>>> reduce_limit=false, so that I can enumerate all countries without hitting
>>> the overflow exception. It works great! However, such solutions are
>>> frowned upon by everyone around. Has anyone a better idea on how to
>>> tackle this efficiently?
>>>
>>> Andrey
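
For readers following the thread, here is a minimal sketch of the two key shapes discussed above. It is not taken from anyone's actual design document. The helper names (toDay, dateKey, compareDates, distinctCountries) and the sample documents are assumptions made up for illustration, and the range query is simulated in memory rather than run against CouchDB. toDay() floors a timestamp in seconds to a UTC day number and leaves out the timezone offset Wendall mentions. dateKey() builds a key in the [year, month, day, country] shape Keith suggests. The last function shows why a date range over such keys can return the same country more than once, so the names still have to be de-duplicated afterwards, which is the limitation raised in the reply at the top.

```javascript
// Hypothetical helper: floor a Unix timestamp (in seconds) to a UTC day number.
// Assumes timestamps are already in UTC; no timezone correction is applied.
function toDay(timestampSeconds) {
  return Math.floor(timestampSeconds / 86400);
}

// Hypothetical helper: build a key shaped like [year, month, day, country].
function dateKey(doc) {
  var d = new Date(doc.timestamp * 1000);
  return [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), doc.country_name];
}

// Compare two [year, month, day] arrays; returns -1, 0 or 1.
function compareDates(a, b) {
  for (var i = 0; i < 3; i++) {
    if (a[i] !== b[i]) {
      return a[i] < b[i] ? -1 : 1;
    }
  }
  return 0;
}

// In-memory stand-in for a range query over the date part of the key.
// Every matching row carries a country name, so repeats are collapsed here.
function distinctCountries(docs, startDate, endDate) {
  var seen = {};
  docs.forEach(function (doc) {
    var key = dateKey(doc);
    var datePart = key.slice(0, 3);
    if (compareDates(datePart, startDate) >= 0 &&
        compareDates(datePart, endDate) <= 0) {
      seen[key[3]] = true;
    }
  });
  return Object.keys(seen).sort();
}

// Sample documents invented for this sketch.
var docs = [
  { timestamp: 1332806400, country_name: "Australia" },
  { timestamp: 1332892800, country_name: "Australia" },
  { timestamp: 1332892800, country_name: "Singapore" }
];

console.log(distinctCountries(docs, [2010, 7, 10], [2013, 1, 21]));
```

With the sample documents, the range from 10-Jul-2010 to 21-Jan-2013 matches three rows but only two distinct country names. The de-duplication step happens outside the view, which is the part the thread is trying to make efficient.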