couchdb-user mailing list archives

From Andrey Kuprianov <andrey.koupria...@gmail.com>
Subject Re: Distinct values with range
Date Tue, 16 Apr 2013 02:04:43 GMT
@Keith, your method will not give me distinct countries, and even with
reduce, after being fed to a list function, it's still slow.
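
A minimal sketch of the issue, using made-up rows shaped like the grouped
output of a view keyed on [year, month, day, country] with a _count reduce.
The rows, the function names and the in-memory filtering are assumptions for
illustration only, not CouchDB API calls. Because the date comes first in the
key, the same country comes back once per day, so something downstream (a
list function or the client) still has to deduplicate:

```javascript
// Hypothetical reduced rows, shaped like a grouped view result with
// [year, month, day, country] keys and _count values.
const rows = [
  { key: [2010, 7, 10, "Australia"], value: 3 },
  { key: [2010, 7, 11, "Australia"], value: 1 },
  { key: [2011, 2, 3, "Canada"], value: 2 },
  { key: [2013, 1, 21, "Australia"], value: 5 },
  { key: [2013, 1, 22, "Brazil"], value: 4 }, // falls outside the range below
];

// Compare two [year, month, day] prefixes element by element.
function compareDates(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Collect the distinct countries whose date lies in [start, end], inclusive.
function distinctCountries(rows, start, end) {
  const seen = new Set();
  for (const { key } of rows) {
    if (compareDates(key, start) >= 0 && compareDates(key, end) <= 0) {
      seen.add(key[3]);
    }
  }
  return [...seen].sort();
}

const result = distinctCountries(rows, [2010, 7, 10], [2013, 1, 21]);
console.log(result);
```

Four rows fall inside the range here but only two countries are distinct, and
the deduplication pass has to touch every row in the range.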



On Tue, Apr 16, 2013 at 2:27 AM, Wendall Cada <wendallc@apache.org> wrote:

> I agree with this approach. I do something similar using _sum:
>
> emit([doc.country_name, toDay(doc.timestamp)], 1);
>
> The toDay() method is basically a floor of the day value. Since I don't
> store ts in UTC (because of an idiotic error some years back), I also
> apply a tz offset to correct the day value in my toDay() method.
>
> Using reduce is by far the fastest method for this. I don't see any issue
> with getting this to scale.
>
> Overall, I think I rather prefer the method Keith shows, as it would
> depend on the values returned in the date object versus other possibly
> inaccurate means using math.
>
> Wendall
>
>
> On 04/15/2013 07:18 AM, Keith Gable wrote:
>
>> Output keys like so:
>>
>> [2010, 7, 10, "Australia"]
>>
>> Reduce function would be _count.
>>
>> startkey=[year,month,day,null]
>> endkey=[year,month,day,{}]
>>
>> ---
>> Keith Gable
>> A+, Network+, and Storage+ Certified Professional
>> Apple Certified Technical Coordinator
>> Mobile Application Developer / Web Developer
>>
>>
>> On Sun, Apr 14, 2013 at 8:37 PM, Andrey Kuprianov <
>> andrey.kouprianov@gmail.com> wrote:
>>
>>> Hi guys,
>>>
>>> Just for the sake of a debate. Here's the question. There are
>>> transactions. Among all other attributes there's timestamp (when
>>> transaction was made; in seconds) and a country name (from where the
>>> transaction was made). So, for instance,
>>>
>>> {
>>>      . . . .
>>>      "timestamp": 1332806400,
>>>      "country_name": "Australia",
>>>      . . . .
>>> }
>>>
>>> Question is: how does one get unique / distinct country names in between
>>> dates? For example, give me all country names in between 10-Jul-2010 and
>>> 21-Jan-2013.
>>>
>>> My solution was to write a custom reduce function and set
>>> reduce_limit=false, so that I can enumerate all countries without hitting
>>> the overflow exception. It works great! However, such solutions are
>>> frowned upon by everyone around. Does anyone have a better idea on how to
>>> tackle this efficiently?
>>>
>>>      Andrey
>>>
>>>
>
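
For reference, the two day-bucketing approaches mentioned in the quoted
messages can be sketched roughly as follows. This is only an illustration
under stated assumptions: the thread does not show the body of toDay(), so
the version here (returning the Unix timestamp of the start of the day, with
an assumed tzOffsetSeconds parameter) is a guess at its shape, and toDateKey()
is a hypothetical name for building keys in the [year, month, day, country]
layout from the date object.

```javascript
// Assumed shape of toDay(): floor a Unix timestamp (in seconds) to the
// start of its day. tzOffsetSeconds is a hypothetical parameter for
// timestamps that were not stored in UTC.
function toDay(timestampSeconds, tzOffsetSeconds = 0) {
  const SECONDS_PER_DAY = 86400;
  const shifted = timestampSeconds + tzOffsetSeconds;
  return Math.floor(shifted / SECONDS_PER_DAY) * SECONDS_PER_DAY;
}

// Hypothetical helper: build a [year, month, day, country] key from the
// values of a Date object (UTC), instead of using arithmetic.
function toDateKey(timestampSeconds, country) {
  const d = new Date(timestampSeconds * 1000);
  // getUTCMonth() is zero-based, so add 1 to get a calendar month.
  return [d.getUTCFullYear(), d.getUTCMonth() + 1, d.getUTCDate(), country];
}

// The sample document from the original question.
const doc = { timestamp: 1332806400, country_name: "Australia" };

console.log([doc.country_name, toDay(doc.timestamp)]);
console.log(toDateKey(doc.timestamp, doc.country_name));
```

Inside a CouchDB map function, either key would be passed to emit() with a
value of 1, as in the emit() line quoted above.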
