incubator-couchdb-user mailing list archives

From Keith Gable <zi...@ignition-project.com>
Subject Re: Distinct values with range
Date Tue, 16 Apr 2013 03:30:05 GMT
It gives you distinct countries per day. Is that not what you want? With
reduce, it should be really fast once the view is built.
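
For reference, a minimal sketch of the view described in this thread. It assumes `doc.timestamp` is seconds since the epoch in UTC and that the view uses the built-in `_count` reduce. The `emit` stub and the `distinctCountries` helper are hypothetical, added only so the sketch runs outside CouchDB; `rows` stands in for what a grouped query would return.

```javascript
// Stand-in for CouchDB's emit(), so the map function can run outside the server.
const emitted = [];
function emit(key, value) {
  emitted.push({ key, value });
}

// Map function: key is [year, month, day, country]; pair it with reduce "_count".
function map(doc) {
  if (typeof doc.timestamp !== "number" || !doc.country_name) return;
  const d = new Date(doc.timestamp * 1000); // seconds -> milliseconds
  emit(
    [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), doc.country_name],
    null
  );
}

// Hypothetical client-side step: collapse grouped rows (one per day and
// country) into a sorted list of distinct country names.
function distinctCountries(rows) {
  return [...new Set(rows.map((row) => row.key[3]))].sort();
}
```

Queried with `group=true`, `startkey=[2010,7,10]` and `endkey=[2013,1,21,{}]`, such a view should return one row per day and country in the range, so the same country can appear on many days. The final de-duplication across days then happens on the client or in a list function.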
On Apr 15, 2013 9:05 PM, "Andrey Kuprianov" <andrey.kouprianov@gmail.com>
wrote:

> @Keith your method will not give me distinct countries and even with reduce
> and after being fed to list function it's still slow
>
>
>
> On Tue, Apr 16, 2013 at 2:27 AM, Wendall Cada <wendallc@apache.org> wrote:
>
> > I agree with this approach. I do something similar using _sum:
> >
> > emit([doc.country_name, toDay(doc.timestamp)], 1);
> >
> > The toDay() method is basically a floor of the day value. Since I don't
> > store ts in UTC (because of an idiotic error some years back), I also
> > apply a tz offset to correct the day value in my toDay() method.
> >
> > Using reduce is by far the fastest method for this. I don't see any issue
> > with getting this to scale.
> >
> > Overall, I think I rather prefer the method Keith shows, as it would
> > depend on the values returned in the date object versus other possibly
> > inaccurate means using math.
> >
> > Wendall
> >
> >
> > On 04/15/2013 07:18 AM, Keith Gable wrote:
> >
> >> Output keys like so:
> >>
> >> [2010, 7, 10, "Australia"]
> >>
> >> Reduce function would be _count.
> >>
> >> startkey=[year,month,day,null]
> >> endkey=[year,month,day,{}]
> >>
> >> ---
> >> Keith Gable
> >> A+, Network+, and Storage+ Certified Professional
> >> Apple Certified Technical Coordinator
> >> Mobile Application Developer / Web Developer
> >>
> >>
> >> On Sun, Apr 14, 2013 at 8:37 PM, Andrey Kuprianov <
> >> andrey.kouprianov@gmail.com> wrote:
> >>
> >>  Hi guys,
> >>>
> >>> Just for the sake of a debate. Here's the question. There are
> >>> transactions. Among all other attributes there's timestamp (when
> >>> transaction was made; in seconds) and a country name (from where the
> >>> transaction was made). So, for instance,
> >>>
> >>> {
> >>>      . . . .
> >>>      "timestamp": 1332806400,
> >>>      "country_name": "Australia",
> >>>      . . . .
> >>> }
> >>>
> >>> Question is: how does one get unique / distinct country names in between
> >>> dates? For example, give me all country names in between 10-Jul-2010 and
> >>> 21-Jan-2013.
> >>>
> >>> My solution was to write a custom reduce function and set
> >>> reduce_limit=false, so that I can enumerate all countries without hitting
> >>> the overflow exception. It works great! However, such solutions are
> >>> frowned upon by everyone around. Has anyone a better idea on how to tackle
> >>> this efficiently?
> >>>
> >>>      Andrey
> >>>
> >>>
> >
>
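
The `toDay()` helper mentioned in Wendall's reply is not shown in the thread. One possible shape, assuming timestamps in seconds and a fixed correction expressed in seconds (`tzOffsetSeconds` is a hypothetical parameter, not something taken from the original code), might be:

```javascript
const SECONDS_PER_DAY = 86400;

// Hypothetical toDay(): shift a timestamp (in seconds) by a fixed offset,
// then floor it to the start of its day. The result is again in seconds.
function toDay(timestampSeconds, tzOffsetSeconds = 0) {
  const corrected = timestampSeconds + tzOffsetSeconds;
  return Math.floor(corrected / SECONDS_PER_DAY) * SECONDS_PER_DAY;
}
```

This is the plain arithmetic variant. The approach Wendall says he prefers reads year, month and day from a `Date` object instead.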
