incubator-couchdb-user mailing list archives

From Andrey Kuprianov <andrey.koupria...@gmail.com>
Subject Re: Distinct values with range
Date Mon, 15 Apr 2013 08:29:32 GMT
I feel a little bit deceived here. I was led to believe that accumulation
of data in reduces will drastically slow things down, but now I am having
second thoughts.

I've tried Jim's approach with lists and ran it against my old approach,
a custom reduce with reduce_limit=false (over 65k documents were used in the
test). The reduce seems to run 20 times faster! I feel like lists are
actually slowing things down, not custom reduces.
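
For reference, a custom reduce of the kind described above might look roughly like the sketch below. It is only an illustration, not the exact function used in the test: the name distinctCountries is hypothetical, and it assumes the map function emits the country name as the value (with the timestamp as the key, so a date range can be selected with startkey/endkey).

```javascript
// Sketch of a reduce that accumulates distinct country names.
// Assumption: the map emits emit(doc.timestamp, doc.country_name),
// so each entry in `values` is a country name on the first pass.
function distinctCountries(keys, values, rereduce) {
  var seen = {};
  var i, j;
  if (rereduce) {
    // On rereduce, `values` holds arrays returned by earlier reduce calls.
    for (i = 0; i < values.length; i++) {
      for (j = 0; j < values[i].length; j++) {
        seen[values[i][j]] = true;
      }
    }
  } else {
    // First pass: `values` holds the raw country names emitted by the map.
    for (i = 0; i < values.length; i++) {
      seen[values[i]] = true;
    }
  }
  // Return a sorted array of the unique names seen so far.
  return Object.keys(seen).sort();
}
```

The result of a function like this grows with the number of distinct countries rather than with the number of documents, but it is still an array rather than a single scalar, which is why reduce_limit has to be turned off for it.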

Can anyone give me some good explanation regarding this?

Just FYI, I am using CouchDB 1.2.0.
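
A rough sketch of the list-based approach as I understood it from Jim's description follows. The function names are hypothetical, the use of UTC for the day boundary is an assumption, and emit, getRow and send are the globals that the CouchDB query server provides (they are not defined in this snippet). The view would use the built-in _count reduce and be queried with group=true over a [day, country] key range.

```javascript
// Hypothetical helper: turn a timestamp in seconds into a YYYYMMDD integer.
// Assumption: days are cut at midnight UTC.
function dayKey(timestampSeconds) {
  var d = new Date(timestampSeconds * 1000);
  return d.getUTCFullYear() * 10000 +
         (d.getUTCMonth() + 1) * 100 +
         d.getUTCDate();
}

// Map function: key each transaction by [day, country].
// In a real design document the helper would have to live inside this
// function; it is kept separate here only for readability.
function mapByDayAndCountry(doc) {
  if (doc.timestamp && doc.country_name) {
    emit([dayKey(doc.timestamp), doc.country_name], 1); // emit: CouchDB global
  }
}

// List function: with group=true each row is one [day, country] pair,
// so the only work left is dropping countries already seen on another day.
function listDistinctCountries(head, req) {
  var seen = {};
  var row;
  while ((row = getRow())) {        // getRow: CouchDB global
    seen[row.key[1]] = true;
  }
  send(JSON.stringify(Object.keys(seen).sort())); // send: CouchDB global
}
```

With this layout every grouped row in the requested range still passes through the list function, so the amount of data it handles is bounded by (countries x days in range) rather than by the number of documents.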


On Mon, Apr 15, 2013 at 2:52 PM, Andrey Kuprianov <
andrey.kouprianov@gmail.com> wrote:

> Btw, is the reduce function that you mentioned supposed to basically output
> de-duplicated keys?
>
>
> On Mon, Apr 15, 2013 at 1:10 PM, Andrey Kuprianov <
> andrey.kouprianov@gmail.com> wrote:
>
>> Thanks. I'll try the lists. Completely forgot about them actually
>>
>>
>>
>> On Mon, Apr 15, 2013 at 12:59 PM, Jim Klo <jim.klo@sri.com> wrote:
>>
>>> Not sure if it's ideal, but if you need dates in epoch millis, you could
>>> round the timestamp to the floor of the current day (say midnight) in a map
>>> function, use a built-in reduce... Then use a list function to filter
>>> unique countries.
>>>
>>> If you don't need a real timestamp value, use an integer like YYYYMMDD
>>> (i.e. 20130710 for 2013-Jul-10).
>>>
>>> Reduce = true will combine by day, leaving at most (196 countries x number
>>> of days in range) rows to filter in the list function.
>>>
>>> - JK
>>>
>>>
>>>
>>> Sent from my iPad
>>>
>>> On Apr 14, 2013, at 6:38 PM, "Andrey Kuprianov" <
>>> andrey.kouprianov@gmail.com> wrote:
>>>
>>> > Hi guys,
>>> >
>>> > Just for the sake of a debate. Here's the question. There are
>>> > transactions. Among all other attributes there's a timestamp (when the
>>> > transaction was made; in seconds) and a country name (from where the
>>> > transaction was made). So, for instance,
>>> >
>>> > {
>>> >    . . . .
>>> >    "timestamp": 1332806400,
>>> >    "country_name": "Australia",
>>> >    . . . .
>>> > }
>>> >
>>> > Question is: how does one get unique / distinct country names between
>>> > dates? For example, give me all country names between 10-Jul-2010 and
>>> > 21-Jan-2013.
>>> >
>>> > My solution was to write a custom reduce function and set
>>> > reduce_limit=false, so that I can enumerate all countries without hitting
>>> > the overflow exception. It works great! However, such solutions are frowned
>>> > upon by everyone around. Does anyone have a better idea on how to tackle
>>> > this efficiently?
>>> >
>>> >    Andrey
>>>
>>
>>
>
