couchdb-user mailing list archives

From Andrey Kuprianov <andrey.koupria...@gmail.com>
Subject Re: Distinct values with range
Date Tue, 16 Apr 2013 09:50:22 GMT
Muji, what happens if you have several hundred transactions per day in a
variety of different countries over several years? Then your view
processing is going to be very slow. We are looking for a near real-time
solution.
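
A minimal sketch of the client-side step that the grouped query quoted below leaves to the caller: with group_level=4 the view returns one row per (day, country) pair, so the distinct countries for a date range still have to be collected from every returned row. The row shape ({key: [...], value: n}) follows CouchDB's usual reduce output; the function name distinctCountries is made up for illustration.

```javascript
// Hypothetical helper (not from the thread): collect the distinct country
// names from grouped view rows whose keys look like [year, month, day, country].
function distinctCountries(rows) {
  var seen = Object.create(null); // plain dictionary without inherited keys
  rows.forEach(function (row) {
    // The country is the last element of the complex key.
    var country = row.key[row.key.length - 1];
    seen[country] = true;
  });
  return Object.keys(seen).sort();
}

// Example input shaped like the "rows" array of a grouped reduce response.
var sampleRows = [
  { key: [2010, 7, 10, "Australia"], value: 3 },
  { key: [2010, 7, 10, "Japan"], value: 1 },
  { key: [2010, 7, 11, "Australia"], value: 2 }
];
console.log(distinctCountries(sampleRows));
```

The work here grows with the number of (day, country) rows in the range, not with the number of distinct countries, which is the cost being questioned above.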


On Tue, Apr 16, 2013 at 5:42 PM, muji <freeformsystems@gmail.com> wrote:

> I believe you need to query with startkey and endkey as complex keys
> (assuming YYYY-MM-DD):
>
> startkey=[startyear,startmonth,startday]
> endkey=[endyear,endmonth,endday,{}]
>
> Then you can extract the countries from the key returned with each row (it
> will be the last element in the array). You will also need to set the group
> view parameter (group_level=4?) for distinct values.
>
> Then you should not need to write a custom reduce function.
>
> The startkey and endkey must be proper JSON (and URL) encoded values.
>
> My understanding is that is the correct approach.
>
> Cheers!
>
>
> On 16 April 2013 05:46, Andrey Kuprianov <andrey.kouprianov@gmail.com> wrote:
>
> > Nope, I need distinct values over a period of time. Not per day.
> >
> >
> > On Tue, Apr 16, 2013 at 11:30 AM, Keith Gable <ziggy@ignition-project.com> wrote:
> >
> > > It gives you distinct countries per day. Is that not what you want? With
> > > reduce, it should be really fast once the view is built.
> > > On Apr 15, 2013 9:05 PM, "Andrey Kuprianov" <andrey.kouprianov@gmail.com> wrote:
> > >
> > > > @Keith your method will not give me distinct countries and even with
> > > > reduce and after being fed to list function it's still slow
> > > >
> > > >
> > > >
> > > > On Tue, Apr 16, 2013 at 2:27 AM, Wendall Cada <wendallc@apache.org> wrote:
> > > >
> > > > > I agree with this approach. I do something similar using _sum:
> > > > >
> > > > > emit([doc.country_name, toDay(doc.timestamp)], 1);
> > > > >
> > > > > The toDay() method is basically a floor of the day value. Since I don't
> > > > > store ts in UTC (Because of an idiotic error some years back) I also do a
> > > > > tz offset to correct the day value in my toDay() method.
> > > > >
> > > > > Using reduce is by far the fastest method for this. I don't see any issue
> > > > > with getting this to scale.
> > > > >
> > > > > Overall, I think I rather prefer the method Keith shows, as it would
> > > > > depend on the values returned in the date object versus other possibly
> > > > > inaccurate means using math.
> > > > >
> > > > > Wendall
> > > > >
> > > > >
> > > > > On 04/15/2013 07:18 AM, Keith Gable wrote:
> > > > >
> > > > >> Output keys like so:
> > > > >>
> > > > >> [2010, 7, 10, "Australia"]
> > > > >>
> > > > >> Reduce function would be _count.
> > > > >>
> > > > >> startkey=[year,month,day,null]
> > > > >> endkey=[year,month,day,{}]
> > > > >>
> > > > >> ---
> > > > >> Keith Gable
> > > > >> A+, Network+, and Storage+ Certified Professional
> > > > >> Apple Certified Technical Coordinator
> > > > >> Mobile Application Developer / Web Developer
> > > > >>
> > > > >>
> > > > >> On Sun, Apr 14, 2013 at 8:37 PM, Andrey Kuprianov <
> > > > >> andrey.kouprianov@gmail.com> wrote:
> > > > >>
> > > > >>  Hi guys,
> > > > >>>
> > > > > >>> Just for the sake of a debate. Here's the question. There are
> > > > > >>> transactions. Among all other attributes there's a timestamp
> > > > > >>> (when the transaction was made; in seconds) and a country name
> > > > > >>> (from where the transaction was made). So, for instance,
> > > > >>>
> > > > >>> {
> > > > >>>      . . . .
> > > > > >>>      "timestamp": 1332806400,
> > > > >>>      "country_name": "Australia",
> > > > >>>      . . . .
> > > > >>> }
> > > > >>>
> > > > > >>> Question is: how does one get unique / distinct country names in
> > > > > >>> between dates? For example, give me all country names in between
> > > > > >>> 10-Jul-2010 and 21-Jan-2013.
> > > > > >>>
> > > > > >>> My solution was to write a custom reduce function and set
> > > > > >>> reduce_limit=false, so that I can enumerate all countries without
> > > > > >>> hitting the overflow exception. It works great! However, such
> > > > > >>> solutions are frowned upon by everyone around. Has anyone a better
> > > > > >>> idea on how to tackle this efficiently?
> > > > >>>
> > > > >>>      Andrey
> > > > >>>
> > > > >>>
> > > > >
> > > >
> > >
> >
>
>
>
> --
> mischa (aka muji).
>
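
For readers following the quoted suggestions, here is a rough sketch of the two pieces they describe: a map function that emits [year, month, day, country] keys (to be paired with the built-in _count reduce), and a query string with JSON- and URL-encoded startkey/endkey plus group_level=4. The names mapTransaction and rangeQuery are made up for illustration, and the sketch assumes timestamps are seconds since the epoch and that UTC calendar days are what is wanted. In a real design document the map body would sit inside function (doc) { ... } and call CouchDB's global emit(); it is passed in as a parameter here only so the snippet runs on its own.

```javascript
// Hypothetical map logic: emit [year, month, day, country] for each transaction.
// Assumes doc.timestamp is in seconds and that days are counted in UTC.
function mapTransaction(doc, emit) {
  if (typeof doc.timestamp !== "number" || !doc.country_name) {
    return; // skip documents that are not transactions
  }
  var d = new Date(doc.timestamp * 1000); // seconds -> milliseconds
  emit([d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), doc.country_name], 1);
}

// Hypothetical helper: build the query string for a date range.
// start and end are [year, month, day]; {} is appended to the end key so that
// every country on the last day sorts before it.
function rangeQuery(start, end) {
  var params = {
    startkey: JSON.stringify(start),
    endkey: JSON.stringify(end.concat([{}])),
    group_level: "4"
  };
  return Object.keys(params).map(function (name) {
    return name + "=" + encodeURIComponent(params[name]);
  }).join("&");
}

// Usage: the sample document from the original question, and the
// 10-Jul-2010 to 21-Jan-2013 range.
mapTransaction({ timestamp: 1332806400, country_name: "Australia" }, function (key, value) {
  console.log(JSON.stringify(key), value);
});
console.log(rangeQuery([2010, 7, 10], [2013, 1, 21]));
```

The resulting string would be appended after "?" to the view URL; the database, design document, and view names are left out because the thread does not give them.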
