couchdb-user mailing list archives

From: Andrey Kuprianov <andrey.koupria...@gmail.com>
Subject: Re: Distinct values with range
Date: Tue, 16 Apr 2013 03:46:23 GMT
Nope, I need distinct values over a period of time. Not per day.
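
Because the objection is about ranges rather than single days, here is a minimal sketch, under stated assumptions, of one way to stretch Keith's [year, month, day, country] view over a whole period. The idea is to query it once with startkey=[2010,7,10,null], endkey=[2013,1,21,{}] and group_level=4 (equivalent to group=true for a four-element key), then collapse the per-day rows into distinct countries on the client. The sketch runs outside CouchDB: emit() is a local stub, the grouped _count query is emulated in plain JavaScript, and groupedCounts() and distinctCountries() are hypothetical helper names, not CouchDB APIs.

```javascript
// Minimal sketch: emulates a CouchDB view with keys [year, month, day, country]
// and the built-in _count reduce, queried over a date range with group_level=4.
// ASSUMPTIONS: emit() is a local stub and the query is simulated in plain JS;
// groupedCounts() and distinctCountries() are hypothetical helper names.

const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function in the key shape Keith proposed. Months are 1-based to match
// his example key [2010, 7, 10, "Australia"]; timestamps are in seconds.
function map(doc) {
  const d = new Date(doc.timestamp * 1000);
  emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), doc.country_name], 1);
}

// Lexicographic comparison of [year, month, day] tuples.
function cmpDay(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// Emulates startkey=[...startDay, null], endkey=[...endDay, {}], group_level=4
// with _count: one row per distinct (day, country) pair inside the range.
function groupedCounts(startDay, endDay) {
  const groups = new Map();
  for (const { key } of rows) {
    const day = key.slice(0, 3);
    if (cmpDay(day, startDay) < 0 || cmpDay(day, endDay) > 0) continue;
    const id = JSON.stringify(key);
    groups.set(id, (groups.get(id) || 0) + 1);
  }
  return [...groups].map(([id, count]) => ({ key: JSON.parse(id), value: count }));
}

// Client-side step: collapse the per-day rows into distinct country names.
function distinctCountries(startDay, endDay) {
  const names = new Set(groupedCounts(startDay, endDay).map((row) => row.key[3]));
  return [...names].sort();
}

// Example documents (timestamps are midnight UTC on the noted dates).
const docs = [
  { timestamp: 1262304000, country_name: "Canada" },    // 2010-01-01, before range
  { timestamp: 1278720000, country_name: "Australia" }, // 2010-07-10
  { timestamp: 1332806400, country_name: "Australia" }, // 2012-03-27
  { timestamp: 1332806400, country_name: "Japan" },     // 2012-03-27
  { timestamp: 1358726400, country_name: "Brazil" },    // 2013-01-21
  { timestamp: 1358812800, country_name: "Germany" },   // 2013-01-22, after range
];
docs.forEach(map);

console.log(distinctCountries([2010, 7, 10], [2013, 1, 21]));
// expected value: ["Australia", "Brazil", "Japan"]
```

The catch, and arguably the point being made here, is that the grouped response contains one row per distinct (day, country) pair in the range, so over a span of roughly two and a half years the client may have to scan far more rows than there are countries.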


On Tue, Apr 16, 2013 at 11:30 AM, Keith Gable <ziggy@ignition-project.com> wrote:

> It gives you distinct countries per day. Is that not what you want? With
> reduce, it should be really fast once the view is built.
> On Apr 15, 2013 9:05 PM, "Andrey Kuprianov" <andrey.kouprianov@gmail.com>
> wrote:
>
> > @Keith, your method will not give me distinct countries, and even with
> > reduce and after being fed to a list function, it's still slow.
> >
> >
> >
> > On Tue, Apr 16, 2013 at 2:27 AM, Wendall Cada <wendallc@apache.org> wrote:
> >
> > > I agree with this approach. I do something similar using _sum:
> > >
> > > emit([doc.country_name, toDay(doc.timestamp)], 1);
> > >
> > > The toDay() method is basically a floor of the day value. Since I don't
> > > store ts in UTC (because of an idiotic error some years back), I also do
> > > a tz offset to correct the day value in my toDay() method.
> > >
> > > Using reduce is by far the fastest method for this. I don't see any issue
> > > with getting this to scale.
> > >
> > > Overall, I think I rather prefer the method Keith shows, as it would
> > > depend on the values returned in the date object versus other possibly
> > > inaccurate means using math.
> > >
> > > Wendall
> > >
> > >
> > > On 04/15/2013 07:18 AM, Keith Gable wrote:
> > >
> > >> Output keys like so:
> > >>
> > >> [2010, 7, 10, "Australia"]
> > >>
> > >> Reduce function would be _count.
> > >>
> > >> startkey=[year,month,day,null]
> > >> endkey=[year,month,day,{}]
> > >>
> > >> ---
> > >> Keith Gable
> > >> A+, Network+, and Storage+ Certified Professional
> > >> Apple Certified Technical Coordinator
> > >> Mobile Application Developer / Web Developer
> > >>
> > >>
> > >> On Sun, Apr 14, 2013 at 8:37 PM, Andrey Kuprianov <andrey.kouprianov@gmail.com> wrote:
> > >>
> > >>> Hi guys,
> > >>>
> > >>> Just for the sake of a debate, here's the question. There are
> > >>> transactions. Among all other attributes, there's a timestamp (when the
> > >>> transaction was made, in seconds) and a country name (where the
> > >>> transaction was made from). So, for instance:
> > >>>
> > >>> {
> > >>>      . . . .
> > >>>      "timestamp": 1332806400,
> > >>>      "country_name": "Australia",
> > >>>      . . . .
> > >>> }
> > >>>
> > >>> The question is: how does one get unique / distinct country names
> > >>> between dates? For example, give me all country names between
> > >>> 10-Jul-2010 and 21-Jan-2013.
> > >>>
> > >>> My solution was to write a custom reduce function and set
> > >>> reduce_limit=false, so that I can enumerate all countries without
> > >>> hitting the overflow exception. It works great! However, such solutions
> > >>> are frowned upon by everyone around. Does anyone have a better idea on
> > >>> how to tackle this efficiently?
> > >>>
> > >>>      Andrey
> > >>>
> > >>>
> > >
> >
>
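
The original question's approach, a custom reduce that returns the set of countries seen, can be sketched as below. This is a hypothetical reconstruction, since the actual function was not posted; it assumes a map function of the form emit(doc.timestamp, doc.country_name), so that a single reduced query over a timestamp range yields one list of countries. reduce(keys, values, rereduce) is the shape of a CouchDB JavaScript reduce function; the calls at the bottom only simulate CouchDB reducing two chunks and re-reducing them. The sketch uses modern JavaScript for running under Node; an older CouchDB JavaScript engine may need var and plain for loops instead.

```javascript
// Hypothetical reconstruction of a "distinct values" reduce; the function used
// in the original question was not posted. Assumes the map side is
//   emit(doc.timestamp, doc.country_name);
// so a reduced query over a timestamp range yields one list of countries.
function reduce(keys, values, rereduce) {
  const seen = {};
  if (rereduce) {
    // Re-reduce pass: each value is a list returned by an earlier call.
    for (const list of values) {
      for (const country of list) seen[country] = true;
    }
  } else {
    // First pass: each value is a country name emitted by the map function.
    for (const country of values) seen[country] = true;
  }
  // Note: the output grows with the number of distinct countries.
  return Object.keys(seen).sort();
}

// Simulate CouchDB reducing two chunks of rows, then re-reducing the results.
// keys are not used by this reduce, so null is passed for them here.
const chunkA = reduce(null, ["Australia", "Japan", "Australia"], false);
const chunkB = reduce(null, ["Brazil", "Japan"], false);
const merged = reduce(null, [chunkA, chunkB], true);
console.log(merged); // expected value: ["Australia", "Brazil", "Japan"]
```

Roughly speaking, the reduced value grows with the number of distinct countries instead of staying small, which is the pattern CouchDB's reduce_limit check is meant to reject, and why the replies steer toward the built-in _count and _sum reducers.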

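
Wendall's toDay() helper is described only as a floor of the day value plus a time zone correction. A hypothetical version, assuming Unix timestamps in seconds and a fixed offset in seconds that is added before flooring, might look like the following; the name tzOffsetSeconds and its sign convention are assumptions, and emit() is stubbed locally so the map function from his message can run outside CouchDB.

```javascript
// Hypothetical toDay(); Wendall's actual implementation was not posted.
const SECONDS_PER_DAY = 86400;

// Floors a Unix timestamp (seconds) to the start of its day, after adding a
// fixed offset. ASSUMPTION: the offset is added; flip its sign if your
// timestamps were stored the other way around.
function toDay(timestamp, tzOffsetSeconds = 0) {
  const shifted = timestamp + tzOffsetSeconds;
  return Math.floor(shifted / SECONDS_PER_DAY) * SECONDS_PER_DAY;
}

// Local stub for CouchDB's emit(), so the map function can run here.
const emitted = [];
function emit(key, value) {
  emitted.push([key, value]);
}

// Map function from Wendall's message, intended to pair with the _sum reduce.
function map(doc) {
  emit([doc.country_name, toDay(doc.timestamp)], 1);
}

// 01:00 UTC on 2012-03-27 floors to midnight UTC of the same day.
map({ timestamp: 1332806400 + 3600, country_name: "Australia" });
// emitted now holds one row: key ["Australia", 1332806400], value 1
```

A fixed offset like this does not account for daylight-saving changes, which is one way such arithmetic can drift from the values obtained by reading the fields of a Date object.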