incubator-couchdb-user mailing list archives

From Harikrishnan R <harikrish...@inxsasia.com>
Subject Re: How to split the data over a period of time.
Date Wed, 13 Jun 2012 13:35:26 GMT
Thanks Dave for your detailed response, I will work on your ideas.

On Tue, Jun 12, 2012 at 2:53 AM, Dave Cottlehuber <dave@muse.net.nz> wrote:

> On 11 June 2012 21:42, Harikrishnan R <harikrishnan@inxsasia.com> wrote:
> > On Mon, Jun 11, 2012 at 11:33 PM, Dave Cottlehuber <dave@muse.net.nz> wrote:
> >
> >> On 11 June 2012 17:42, Harikrishnan R <harikrishnan@inxsasia.com> wrote:
> >> > Hi Dave,
> >> >
> >> >   Many thanks for your quick response.
> >> >
> >> >   I am not updating any documents; I just keep appending docs to the
> >> > database, each with a specified timestamp.
> >>
> >> OK, so it's a continuing series of new docs going into the DB.
> >>
> >> >   One of my requirements is *unique login counts* between two
> >> > specified dates.
> >> >
> >> >   my map function emits rows like
> >> >
> >> >   {[2012, 6, 4], acc_id_1}
> >> >   {[2012, 6, 4], acc_id_2}
> >> >   {[2012, 6, 4], acc_id_3}
> >> >   {[2012, 6, 5], acc_id_1}
> >> >   {[2012, 6, 6], acc_id_4}
> >> >   .....
> >> >   ....
> >> >
> >> >   By using startkey and endkey I am able to get the unique counts.
> >>
> >> Are you aware you can use a reduce function and have couchdb manage
> >> that for you?
> >>
> > Yes, I know and I am using it.
> >
> >>
> >> >   Here the problem is when the specified date range falls into the
> >> > backed-up data, or spans both the backup and the live DB.
> >>
> >> I don't follow. Are you using separate DBs per month, or some rotation
> >> scheme?
> >>
> >
> > Sorry for the incomplete description. What I meant was: assume I have
> > backed up the old (six months of) data from the DB, so the DB now holds
> > only the last 3 months of data. If a query then comes in like "NEED the
> > last 8 months' unique login counts", how do we calculate that, given
> > that the data is scattered across the backup and the current database?
>
> There's no cross-DB query functionality built into CouchDB.
>
> "I have removed all the eggs from my fridge. I now need the eggs back
> in my fridge.
> What should I do?"
>
> You'll need to move the data back into the active couch:
>
> 1. Restore the db to a different name.
> 2. Replicate the data back into your active couch.
> 3. Trigger a view update.
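>
> A rough sketch of steps 2 and 3 over the HTTP API (untested, in JS
> using node's built-in fetch; the db and design doc names below are
> only placeholders):
>
> // assumes CouchDB on localhost:5984, a restored backup named
> // "logins_restore" and an active db named "logins"
> const couch = "http://localhost:5984";
>
> // 2. replicate the restored backup back into the active db
> await fetch(`${couch}/_replicate`, {
>   method: "POST",
>   headers: { "Content-Type": "application/json" },
>   body: JSON.stringify({ source: "logins_restore", target: "logins" }),
> });
>
> // 3. query the view once so CouchDB rebuilds its index over the new docs
> await fetch(`${couch}/logins/_design/stats/_view/by_date?limit=0`);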
>
> The following approach might work for you, if you only need counts by
> month:
>
> rotate login docs into a new db each month:
>
> 2012_01.couch
> 2012_02.couch
> 2012_03.couch
> 2012_04.couch
> 2012_05.couch
>
> These are all of course nicely backed up for our lawyers and accountants
> in case the raw data is needed.
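>
> For the rotation itself, something small like this could do the job (a
> sketch only; the "logins_" naming and the doc shape are placeholders,
> chosen because db names must start with a letter):
>
> const couch = "http://localhost:5984";
>
> // write each login event into the db for its own month,
> // creating e.g. logins_2012_06 on first use
> async function recordLogin(accId, when = new Date()) {
>   const y = when.getUTCFullYear();
>   const m = String(when.getUTCMonth() + 1).padStart(2, "0");
>   const db = `logins_${y}_${m}`;
>   await fetch(`${couch}/${db}`, { method: "PUT" }); // 412 if it exists
>   await fetch(`${couch}/${db}`, {
>     method: "POST",
>     headers: { "Content-Type": "application/json" },
>     body: JSON.stringify({ acc_id: accId, ts: when.toISOString() }),
>   });
> }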
>
> Transfer the output of your reduce query for last month into the new
> month's db (a rough sketch follows the example rows below):
>
> >> >   {[2012, 6, 4], acc_id_1}
> >> >   {[2012, 6, 4], acc_id_2}
> >> >   {[2012, 6, 4], acc_id_3}
> >> >   {[2012, 6, 5], acc_id_1}
> >> >   {[2012, 6, 6], acc_id_4}
>
> The above data would end up something like this so it can feed
> straight into your reduce:
>
> {[2012,6, acc_id_1], 2}
> {[2012,6, acc_id_2], 1}
> {[2012,6, acc_id_3], 1}
> {[2012,6, acc_id_4], 1}
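>
> The transfer itself could be as simple as reading last month's reduce
> with group=true and bulk-writing one small summary doc per row. A
> sketch, assuming a view keyed on [year, month, acc_id] with a _count
> reduce, and with placeholder db/view names:
>
> const couch = "http://localhost:5984";
> const view = `${couch}/logins_2012_06/_design/stats/_view/by_month_acc`;
>
> // one row per [year, month, acc_id] with its login count
> const res = await fetch(`${view}?group=true`).then(r => r.json());
>
> const docs = res.rows.map(row => ({
>   type: "monthly_summary",
>   key: row.key,        // e.g. [2012, 6, "acc_id_1"]
>   logins: row.value,   // e.g. 2
> }));
>
> // write them all into the new month's db in one request
> await fetch(`${couch}/logins_2012_07/_bulk_docs`, {
>   method: "POST",
>   headers: { "Content-Type": "application/json" },
>   body: JSON.stringify({ docs }),
> });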
>
> and from those summary docs you could then group up by year:
>
> {[2011, acc_id_1], 23}
> etc.
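>
> e.g. with a small view over those summary docs (again only a sketch,
> assuming the doc shape above):
>
> // map: one row per summary doc, keyed by [year, account]
> function (doc) {
>   if (doc.type === "monthly_summary") {
>     emit([doc.key[0], doc.key[2]], doc.logins);
>   }
> }
> // reduce: _sum
> // then ?group=true returns rows like {[2011, "acc_id_1"], 23}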
>
> If aggregation isn't going to help you (disk space, other query
> constraints), then I don't see any *fast* method that would allow you
> to pull in old data from backup, move it into couch, and then answer
> the query in realtime. You need to store the active data where CouchDB
> can get to it.
>
> A+
> Dave
>



-- 
-Regards
 Harikrishnan R
