couchdb-user mailing list archives

From Dave Cottlehuber <d...@muse.net.nz>
Subject Re: How to split the data over a period of time.
Date Mon, 11 Jun 2012 21:23:32 GMT
On 11 June 2012 21:42, Harikrishnan R <harikrishnan@inxsasia.com> wrote:
> On Mon, Jun 11, 2012 at 11:33 PM, Dave Cottlehuber <dave@muse.net.nz> wrote:
>
>> On 11 June 2012 17:42, Harikrishnan R <harikrishnan@inxsasia.com> wrote:
>> > Hi Dave,
>> >
>> >   Many thanks for your quick response.
>> >
>> >   I am not updating any documents, I am keep on appending docs to
>> > database with a specified timestamp.
>>
>> OK, so it's a continuing series of new docs going into the DB.
>>
>> >   One of my requirements is *unique login counts* between two specified
>> > dates.
>> >
>> >   my map function will emits like
>> >
>> >   {[2012, 6, 4], acc_id_1}
>> >   {[2012, 6, 4], acc_id_2}
>> >   {[2012, 6, 4], acc_id_3}
>> >   {[2012, 6, 5], acc_id_1}
>> >   {[2012, 6, 6], acc_id_4}
>> >   .....
>> >   ....
>> >
>> >   By using start key and end key I am able to get the unique counts.
>>
>> Are you aware you can use a reduce function and have couchdb manage
>> that for you?
>>
> Yes, I know and I am using it.
>
>>
>> >   Here the problem is when the dates range specified may fall into backup
>> > or in-between.
>>
>> I don't follow. Are you using separate DBs per month, or some rotation
>> scheme?
>>
>
> sorry for the incomplete description. Here what I meant was, Assume I took
> backup of old(six month) data from DB. Now DB has only last 3 months data.
> On that time If one query is coming like "NEED last 8 months unique login
> counts", How we will calculate this information, though the data is
> scattered in backup and current database.

There's no cross-DB query functionality built into CouchDB.

"I have removed all the eggs from my fridge. I now need the eggs back
in my fridge. What should I do?"

You'll need to move the data back into the active couch:

1. Restore the backup to a db with a different name.
2. Replicate that db back into your active couch.
3. Trigger a view update.
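
As a rough sketch of steps 2 and 3: CouchDB exposes replication as
POST /_replicate with a JSON body naming "source" and "target", and
querying a view is what makes CouchDB bring that view's index up to
date. The server URL and the db, design doc and view names below
(logins_restored, logins, stats, unique_logins) are assumptions for
illustration only. The functions only build the requests; nothing is
sent until you pass one to urllib.request.urlopen().

```python
import json
import urllib.request

COUCH = "http://localhost:5984"  # assumed server URL


def replicate_request(source, target, couch=COUCH):
    """Build a POST /_replicate request that copies `source` into `target`."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        couch + "/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def view_refresh_request(db, ddoc, view, couch=COUCH):
    """Build a GET on a view; reading the view triggers an index update.

    limit=0 asks for no rows back, since only the indexing side effect is wanted.
    """
    url = "{}/{}/_design/{}/_view/{}?limit=0".format(couch, db, ddoc, view)
    return urllib.request.Request(url, method="GET")


if __name__ == "__main__":
    # Hypothetical names; this only prints the requests that would be made.
    for req in (
        replicate_request("logins_restored", "logins"),
        view_refresh_request("logins", "stats", "unique_logins"),
    ):
        print(req.get_method(), req.full_url)
```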

The following approach might work for you if you only need counts by month.

Rotate login docs into a new db each month:

2012_01.couch
2012_02.couch
2012_03.couch
2012_04.couch
2012_05.couch

These are all of course nicely backed up for our lawyers and accountants
in case the raw data is needed.
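
One way to pick the db for an incoming login doc is to derive the name
from its timestamp. This is only a sketch: CouchDB requires db names to
start with a lowercase letter, so it adds a "logins_" prefix, which is
my own assumption and not part of the naming shown above.

```python
import re
from datetime import datetime, timezone

# Hypothetical prefix, needed because a CouchDB db name cannot start with a digit.
DB_PREFIX = "logins_"

# CouchDB's rule for db names: lowercase letter first, then a-z 0-9 _ $ ( ) + / -
VALID_DB_NAME = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")


def monthly_db_name(ts, prefix=DB_PREFIX):
    """Return the name of the monthly db that the datetime `ts` falls into."""
    name = "{}{:04d}_{:02d}".format(prefix, ts.year, ts.month)
    if not VALID_DB_NAME.match(name):
        raise ValueError("invalid CouchDB database name: " + name)
    return name


if __name__ == "__main__":
    print(monthly_db_name(datetime(2012, 6, 11, 21, 23, 32, tzinfo=timezone.utc)))
```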

Transfer the output of your reduce query for last month into the new month's db:

>> >   {[2012, 6, 4], acc_id_1}
>> >   {[2012, 6, 4], acc_id_2}
>> >   {[2012, 6, 4], acc_id_3}
>> >   {[2012, 6, 5], acc_id_1}
>> >   {[2012, 6, 6], acc_id_4}

The above data would end up something like this so it can feed
straight into your reduce:

{[2012,6, acc_id_1], 2}
{[2012,6, acc_id_2], 1}
{[2012,6, acc_id_3], 1}
{[2012,6, acc_id_4], 1}
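
The roll-up itself is just a count per (year, month, account). A
minimal sketch in plain Python, assuming the daily rows have already
been fetched from the view as (key, value) pairs; the function name is
made up for illustration:

```python
from collections import Counter


def rollup_by_month(daily_rows):
    """Collapse ([year, month, day], account_id) rows into
    {(year, month, account_id): login_count}."""
    counts = Counter()
    for (year, month, _day), account_id in daily_rows:
        counts[(year, month, account_id)] += 1
    return dict(counts)


if __name__ == "__main__":
    daily = [
        ([2012, 6, 4], "acc_id_1"),
        ([2012, 6, 4], "acc_id_2"),
        ([2012, 6, 4], "acc_id_3"),
        ([2012, 6, 5], "acc_id_1"),
        ([2012, 6, 6], "acc_id_4"),
    ]
    for key, n in sorted(rollup_by_month(daily).items()):
        print(list(key), n)
```

Each resulting entry could then be stored as one small summary doc in
the new month's db.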

and you could then group by year:

{[2011, acc_id_1], 23}
etc.
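
To illustrate how the rolled-up rows answer the original question, here
is a sketch that works on monthly counts keyed by
(year, month, account_id). It sums them per year and counts the
distinct accounts seen in a range of months. Both function names are
hypothetical, and the data is assumed to be in memory already.

```python
from collections import defaultdict


def group_by_year(monthly_counts):
    """Sum {(year, month, account_id): n} into {(year, account_id): n}."""
    yearly = defaultdict(int)
    for (year, _month, account_id), n in monthly_counts.items():
        yearly[(year, account_id)] += n
    return dict(yearly)


def unique_logins(monthly_counts, first, last):
    """Count distinct accounts with at least one login in months
    first..last inclusive; `first` and `last` are (year, month) tuples."""
    accounts = {
        account_id
        for (year, month, account_id) in monthly_counts
        if first <= (year, month) <= last
    }
    return len(accounts)


if __name__ == "__main__":
    monthly = {
        (2012, 5, "acc_id_1"): 3,
        (2012, 6, "acc_id_1"): 2,
        (2012, 6, "acc_id_2"): 1,
    }
    print(group_by_year(monthly))
    print(unique_logins(monthly, (2012, 5), (2012, 6)))
```

Note that an account active in several months is counted once, which a
plain sum of per-month unique counts would get wrong.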

If aggregation isn't going to help you (disk space, other query
constraints) then I don't see any *fast* method that would allow you
to pull in old data from backup, move it into couch, and then answer
the query in realtime. You need to store the active data where CouchDB
can get to it.

A+
Dave
