couchdb-user mailing list archives

From muji <freeformsyst...@gmail.com>
Subject Re: Distinct values with range
Date Tue, 16 Apr 2013 09:42:14 GMT
I believe you need to query with startkey and endkey as complex keys
(assuming YYYY-MM-DD):

startkey=[startyear,startmonth,startday]
endkey=[endyear,endmonth,endday,{}]

Then you can extract the countries from the key returned with each row (it
will be the last element in the array). You will also need to set the group
view parameter (group_level=4?) for distinct values.
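
A rough sketch of that extraction step, assuming row keys shaped like
[year, month, day, country]. The helper name and the sample rows are made up
for illustration. Note that with group_level=4 the view still returns one row
per date and country, so a country seen on several days shows up several
times and the client has to deduplicate:

```javascript
// Hypothetical helper: collect distinct countries from grouped view rows.
// Assumes each row key looks like [year, month, day, country].
function distinctCountries(rows) {
  const seen = new Set();
  for (const row of rows) {
    seen.add(row.key[row.key.length - 1]); // country is the last key element
  }
  return Array.from(seen).sort();
}

// Invented rows, shaped like a grouped view response.
const sampleRows = [
  { key: [2010, 7, 10, "Australia"], value: 3 },
  { key: [2010, 7, 11, "Australia"], value: 1 },
  { key: [2010, 7, 11, "Japan"], value: 2 },
];
console.log(distinctCountries(sampleRows)); // [ 'Australia', 'Japan' ]
```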

Then you should not need to write a custom reduce function.
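
For reference, a minimal sketch of a view that would produce such keys,
using the key shape Keith suggested further down the thread and the built-in
_count reduce. The design document and view names are assumptions, and so is
the use of UTC for splitting the timestamp:

```javascript
// Map function sketch. UTC is an assumption; adjust if timestamps are local.
var mapFn = function (doc) {
  if (typeof doc.timestamp === "number" && doc.country_name) {
    var d = new Date(doc.timestamp * 1000); // timestamp is in seconds
    emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(),
          doc.country_name], 1);
  }
};

// Design document: the built-in _count reduce stands in for a custom reduce.
var designDoc = {
  _id: "_design/transactions",   // assumed name
  views: {
    countries_by_day: {          // assumed name
      map: mapFn.toString(),
      reduce: "_count"
    }
  }
};

// Local stand-in for CouchDB's emit(), only to try the map outside CouchDB.
var emitted = [];
function emit(key, value) { emitted.push({ key: key, value: value }); }

mapFn({ timestamp: 1332806400, country_name: "Australia" });
console.log(JSON.stringify(emitted[0].key)); // [2012,3,27,"Australia"]
```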

The startkey and endkey must be proper JSON (and URL) encoded values.
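
Something along these lines should produce a correctly encoded query string
(a sketch only; rangeQuery is a made-up helper name, and the group_level
value is the guess from above):

```javascript
// Hypothetical helper: build the view query string for a date range.
// start and end are [year, month, day] arrays.
function rangeQuery(start, end) {
  const params = {
    startkey: JSON.stringify(start),
    endkey: JSON.stringify(end.concat([{}])), // {} collates after any string
    group_level: "4",
  };
  return Object.keys(params)
    .map((k) => k + "=" + encodeURIComponent(params[k]))
    .join("&");
}

console.log(rangeQuery([2010, 7, 10], [2013, 1, 21]));
// startkey=%5B2010%2C7%2C10%5D&endkey=%5B2013%2C1%2C21%2C%7B%7D%5D&group_level=4
```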

My understanding is that this is the correct approach.

Cheers!


On 16 April 2013 05:46, Andrey Kuprianov <andrey.kouprianov@gmail.com> wrote:

> Nope, I need distinct values over a period of time. Not per day.
>
>
> On Tue, Apr 16, 2013 at 11:30 AM, Keith Gable <ziggy@ignition-project.com> wrote:
>
> > It gives you distinct countries per day. Is that not what you want? With
> > reduce, it should be really fast once the view is built.
> > On Apr 15, 2013 9:05 PM, "Andrey Kuprianov" <andrey.kouprianov@gmail.com> wrote:
> >
> > > @Keith your method will not give me distinct countries, and even with
> > > reduce and after being fed to a list function it's still slow.
> > >
> > >
> > >
> > > On Tue, Apr 16, 2013 at 2:27 AM, Wendall Cada <wendallc@apache.org> wrote:
> > >
> > > > I agree with this approach. I do something similar using _sum:
> > > >
> > > > emit([doc.country_name, toDay(doc.timestamp)], 1);
> > > >
> > > > The toDay() method is basically a floor of the day value. Since I don't
> > > > store ts in UTC (because of an idiotic error some years back) I also do a
> > > > tz offset to correct the day value in my toDay() method.
> > > >
> > > > Using reduce is by far the fastest method for this. I don't see any issue
> > > > with getting this to scale.
> > > >
> > > > Overall, I think I rather prefer the method Keith shows, as it would
> > > > depend on the values returned in the date object versus other possibly
> > > > inaccurate means using math.
> > > >
> > > > Wendall
> > > >
> > > >
> > > > On 04/15/2013 07:18 AM, Keith Gable wrote:
> > > >
> > > >> Output keys like so:
> > > >>
> > > >> [2010, 7, 10, "Australia"]
> > > >>
> > > >> Reduce function would be _count.
> > > >>
> > > >> startkey=[year,month,day,null]
> > > >> endkey=[year,month,day,{}]
> > > >>
> > > >> ---
> > > >> Keith Gable
> > > >> A+, Network+, and Storage+ Certified Professional
> > > >> Apple Certified Technical Coordinator
> > > >> Mobile Application Developer / Web Developer
> > > >>
> > > >>
> > > >> On Sun, Apr 14, 2013 at 8:37 PM, Andrey Kuprianov <andrey.kouprianov@gmail.com> wrote:
> > > >>
> > > >>> Hi guys,
> > > >>>
> > > >>> Just for the sake of a debate. Here's the question. There are transactions.
> > > >>> Among all other attributes there's timestamp (when transaction was made;
> > > >>> in seconds) and a country name (from where the transaction was made). So,
> > > >>> for instance,
> > > >>>
> > > >>> {
> > > >>>      . . . .
> > > >>>      "timestamp": 1332806400,
> > > >>>      "country_name": "Australia",
> > > >>>      . . . .
> > > >>> }
> > > >>>
> > > >>> Question is: how does one get unique / distinct country names in between
> > > >>> dates? For example, give me all country names in between 10-Jul-2010 and
> > > >>> 21-Jan-2013.
> > > >>>
> > > >>> My solution was to write a custom reduce function and set
> > > >>> reduce_limit=false, so that I can enumerate all countries without hitting
> > > >>> the overflow exception. It works great! However, such solutions are frowned
> > > >>> upon by everyone around. Has anyone a better idea on how to tackle this
> > > >>> efficiently?
> > > >>>
> > > >>>      Andrey
> > > >>>
> > > >>>
> > > >
> > >
> >
>



-- 
mischa (aka muji).
