incubator-couchdb-user mailing list archives

From Scott Feinberg <feinberg.sc...@gmail.com>
Subject Re: Using couchdb for analytics
Date Mon, 12 Sep 2011 17:27:23 GMT
Unless you can ensure that only one process will be editing the document at
a time (so that you never end up holding an old revision), you're going
to have issues. I've never tried it, but I'd assume conflict resolution
wouldn't work at all.
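
To make the "holding an old revision" problem concrete, here is a toy
in-memory sketch in Python. It is not CouchDB code: ToyStore, Conflict and
increment are hypothetical names, and plain integers stand in for CouchDB's
real revision strings. It only shows the idea that a write carrying a stale
revision is rejected and the writer has to re-read and retry.

```python
class Conflict(Exception):
    """Raised when a write carries a stale revision (CouchDB answers 409)."""


class ToyStore:
    """In-memory stand-in for one database; not the real storage engine."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision_number, body)

    def get(self, doc_id):
        rev, body = self._docs.get(doc_id, (0, {}))
        return rev, dict(body)  # copy, so callers cannot mutate stored state

    def put(self, doc_id, rev, body):
        current_rev, _ = self._docs.get(doc_id, (0, {}))
        if rev != current_rev:
            raise Conflict(f"stale revision {rev}, current is {current_rev}")
        self._docs[doc_id] = (current_rev + 1, dict(body))
        return current_rev + 1


def increment(store, doc_id, field, retries=5):
    """Read-modify-write with a retry whenever the revision went stale."""
    for _ in range(retries):
        rev, body = store.get(doc_id)
        body[field] = body.get(field, 0) + 1
        try:
            return store.put(doc_id, rev, body)
        except Conflict:
            continue  # someone else wrote first: re-read and try again
    raise Conflict("gave up after repeated conflicts")


store = ToyStore()

# Two writers read the same (empty) document before either one writes.
rev_a, doc_a = store.get("page:/home")
rev_b, doc_b = store.get("page:/home")

doc_a["hits"] = 1
store.put("page:/home", rev_a, doc_a)  # first writer wins

doc_b["hits"] = 1
try:
    store.put("page:/home", rev_b, doc_b)  # second writer holds an old revision
    stale_write_rejected = False
except Conflict:
    stale_write_rejected = True

increment(store, "page:/home", "hits")  # the retrying helper succeeds
```

Without the revision check, the second writer would silently overwrite the
first and one hit would be lost.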

Revision history is a large part of what makes CouchDB tick. Throwing it
away would also keep you from ever having a cluster: without revision
history, the nodes would never be able to negotiate.

You're not going to end up with millions of revisions, as it says:
*_revs_limit* defines an upper bound of document revisions which CouchDB keeps
track of, even after Compaction <http://wiki.apache.org/couchdb/Compaction>.
The default is set to 1000 on CouchDB 0.11.

Not sure what it is set to now, but I assume it's probably the same. Plus,
even if it were 1 million revisions, you're talking about a million key-value
pairs, nothing significant. And compaction would remove your excess
revisions.
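
For reference, _revs_limit is read and set per database over HTTP
(GET or PUT on /{db}/_revs_limit, with the bare number as the PUT body). A
minimal Python sketch that only builds the requests; the host
http://localhost:5984, the database name "analytics" and the helper name
revs_limit_request are assumptions for illustration:

```python
import json
import urllib.request


def revs_limit_request(base_url, db, limit=None):
    """Build (but do not send) a request that reads or sets _revs_limit."""
    url = f"{base_url.rstrip('/')}/{db}/_revs_limit"
    if limit is None:
        # Reading the current limit.
        return urllib.request.Request(url, method="GET")
    # Setting the limit: the body is just the JSON number.
    return urllib.request.Request(
        url,
        data=json.dumps(limit).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


read_req = revs_limit_request("http://localhost:5984", "analytics")
write_req = revs_limit_request("http://localhost:5984", "analytics", 1000)
# To actually send one against a running server:
#   urllib.request.urlopen(write_req)
```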

Here's what you need: http://blog.couchbase.com/atomic-increments-couchdb
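
One way to do the increment on the server side is a document update handler,
as suggested further down in the quoted thread. The sketch below is my own
and may differ from the linked post. The design document name "counters", the
handler name "increment", the field name "count", and the host and database
name are all assumptions. Note that each increment still creates a new
revision of the document, and a busy counter may still see an occasional
409 that needs a retry; the handler mainly saves the GET/modify/PUT round
trip.

```python
import urllib.parse
import urllib.request

# JavaScript source of the update function, stored as a string in the
# design document. It creates the document on first use.
INCREMENT_JS = """
function (doc, req) {
  if (!doc) {
    doc = { _id: req.id, count: 0 };
  }
  doc.count = (doc.count || 0) + 1;
  return [doc, String(doc.count)];
}
""".strip()

# Design document to be saved once in the database.
design_doc = {
    "_id": "_design/counters",
    "updates": {"increment": INCREMENT_JS},
}


def increment_request(base_url, db, doc_id):
    """Build (but do not send) the request that runs the update handler."""
    quoted_id = urllib.parse.quote(doc_id, safe="")  # escape ':' and '/'
    url = (
        f"{base_url.rstrip('/')}/{db}"
        f"/_design/counters/_update/increment/{quoted_id}"
    )
    return urllib.request.Request(url, method="PUT")


req = increment_request("http://localhost:5984", "analytics", "page:/home")
```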

--Scott


On Mon, Sep 12, 2011 at 1:10 PM, maku@makuchaku.in <maku@makuchaku.in> wrote:

> Question
>
> If a database is configured with _revs_limit
> <http://wiki.apache.org/couchdb/HTTP_database_API#Accessing_Database-specific_options>
> = 1, will the following features still work?
> - Conflict resolution
> - Changes feed
>
> Hypothetically, to maintain an incrementing counter, we can have such a
> key/value pair in the document whose database is configured with
> _revs_limit = 1
>
> Thoughts?
>
> Thanks!
> --
> Mayank
> http://adomado.com
>
>
>
> On Mon, Sep 12, 2011 at 6:40 PM, maku@makuchaku.in <maku@makuchaku.in> wrote:
>
> > Thanks for the tip Scott.
> >
> > However, I have a feeling that compacting the database is not the correct
> > answer to this problem.
> >
> > I am going to test limiting revs on a document.
> >
> > Let's see how that fares...
> > But I have a hunch that if I do that, the conflict resolution strategy
> > will not work.
> > --
> > Mayank
> > http://adomado.com
> >
> >
> >
> > On Mon, Sep 12, 2011 at 6:14 PM, Scott Feinberg <feinberg.scott@gmail.com> wrote:
> >
> >> It wouldn't consume too much space as long as you're regularly compacting
> >> your database.
> >>
> >> As much as I love CouchDB and this is a CouchDB users mailing list, I
> >> tried to do something similar and I found MongoDB was better suited due to
> >> its support for partial updates.
> >>
> >> I based the project off some of the work from
> >> http://hummingbirdstats.com/.
> >>
> >> --Scott
> >>
> >> On Mon, Sep 12, 2011 at 8:34 AM, maku@makuchaku.in <maku@makuchaku.in> wrote:
> >>
> >> > Hi everyone,
> >> >
> >> > Considering that I've bypassed the problem of cross-domain communication
> >> > using proxy/iframes...
> >> >
> >> > I want to store counters in a document, incremented on each page view.
> >> > CouchDB will create a complete revision of this document for just 1
> >> > counter update.
> >> >
> >> > Wouldn't this consume too much space?
> >> > Considering that I have 1M hits in a day, I might be looking at 1M
> >> > revisions to the document in a day.
> >> >
> >> > Any thoughts on this...
> >> >
> >> > Thanks!
> >> > --
> >> > Mayank
> >> > http://adomado.com
> >> >
> >> >
> >> >
> >> > On Fri, Jun 3, 2011 at 12:45 PM, Stefan Matheis <matheis.stefan@googlemail.com> wrote:
> >> >
> >> > > What about proxying couch.foo.com through foo.com/couch? Maybe not the
> >> > > complete service, but at least one "special" URL which triggers the
> >> > > write on couch?
> >> > >
> >> > > Regards
> >> > > Stefan
> >> > >
> >> > > On Fri, Jun 3, 2011 at 8:56 AM, maku@makuchaku.in <maku@makuchaku.in> wrote:
> >> > > > Hi everyone,
> >> > > >
> >> > > > I think I had a fundamental flaw in my assumption - realized this
> >> > > > yesterday...
> >> > > > If the couchdb analytics server is hosted on couch.foo.com (foo.com
> >> > > > being the main site) - I would never be able to make write requests
> >> > > > via client side javascript as cross-domain policy would be a barrier.
> >> > > >
> >> > > > I thought about this - and came across a potential solution...
> >> > > > What if I host an html page as an attachment in couchdb & whenever I
> >> > > > have to make a write call, include this html in an iframe & pass on
> >> > > > the parameters in the query string of iframe URL.
> >> > > > The iframe will have javascript which understands the incoming query
> >> > > > string params & takes action (creates POST/PUT to couchdb).
> >> > > >
> >> > > > There would be no cross-domain barriers as the html page is being
> >> > > > served right out of couchdb itself - wherever it's hosted
> >> > > > (couch.foo.com)
> >> > > >
> >> > > > This might not be a performance hit - as etags will help in
> >> > > > client-side caching of the html page.
> >> > > > --
> >> > > > Mayank
> >> > > > http://adomado.com
> >> > > >
> >> > > >
> >> > > >
> >> > > > On Thu, Jun 2, 2011 at 8:34 PM, maku@makuchaku.in <maku@makuchaku.in> wrote:
> >> > > >
> >> > > >> It's 700 req/min :)
> >> > > >> --
> >> > > >> Mayank
> >> > > >> http://adomado.com
> >> > > >>
> >> > > >>
> >> > > >>
> >> > > >> On Thu, Jun 2, 2011 at 7:10 PM, Jan Lehnardt <jan@apache.org> wrote:
> >> > > >>
> >> > > >>>
> >> > > >>> On 2 Jun 2011, at 13:28, maku@makuchaku.in wrote:
> >> > > >>>
> >> > > >>> > Forgot to mention...
> >> > > >>> > All of these 700 req/sec are write requests (data logging) & no
> >> > > >>> > data crunching.
> >> > > >>> > Our current inhouse analytics solution (built on Rails, Mysql) gets
> >> > > >>> >>
> >> > > >>> >> about 700 req/min on an average day...
> >> > > >>>
> >> > > >>> min or sec? :)
> >> > > >>>
> >> > > >>> Cheers
> >> > > >>> Jan
> >> > > >>> --
> >> > > >>>
> >> > > >>>
> >> > > >>> >>
> >> > > >>> >> --
> >> > > >>> >> Mayank
> >> > > >>> >> http://adomado.com
> >> > > >>> >>
> >> > > >>> >>
> >> > > >>> >>
> >> > > >>> >>
> >> > > >>> >> On Thu, Jun 2, 2011 at 3:16 PM, Gabor Ratky <rgabo@rgabostyle.com> wrote:
> >> > > >>> >>> Take a look at update handlers [1]. It is a more lightweight way to
> >> > > >>> >>> create / update your visitor documents, without having to GET the
> >> > > >>> >>> document, modify and PUT back the whole thing. It also simplifies
> >> > > >>> >>> dealing with document revisions as my understanding is that you
> >> > > >>> >>> should not be running into conflicts.
> >> > > >>> >>>
> >> > > >>> >>> I wouldn't expect any problem handling the concurrent traffic and
> >> > > >>> >>> tracking the users, but the view indexer will take some time with
> >> > > >>> >>> the processing itself. You can always replicate the database (or
> >> > > >>> >>> parts of it using a replication filter) to another CouchDB instance
> >> > > >>> >>> and perform the crunching there.
> >> > > >>> >>>
> >> > > >>> >>> It's fairly vague how many updates / writes your 2k-5k traffic
> >> > > >>> >>> would cause. How many requests/sec on your site? How many property
> >> > > >>> >>> updates does that cause?
> >> > > >>> >>>
> >> > > >>> >>> Btw, CouchDB users, is there any way to perform bulk updates using
> >> > > >>> >>> update handlers, similar to _bulk_docs?
> >> > > >>> >>>
> >> > > >>> >>> Gabor
> >> > > >>> >>>
> >> > > >>> >>> [1] http://wiki.apache.org/couchdb/Document_Update_Handlers
> >> > > >>> >>>
> >> > > >>> >>> On Thursday, June 2, 2011 at 11:34 AM, maku@makuchaku.in wrote:
> >> > > >>> >>>
> >> > > >>> >>>> Hi everyone,
> >> > > >>> >>>>
> >> > > >>> >>>> I came across couchdb a couple of weeks back & got really excited by
> >> > > >>> >>>> the fundamental change it brings by simply taking the app-server out
> >> > > >>> >>>> of the picture.
> >> > > >>> >>>> Must say, kudos to the dev team!
> >> > > >>> >>>>
> >> > > >>> >>>> I am planning to write a quick analytics solution for my website -
> >> > > >>> >>>> something on the lines of Google analytics - which will measure
> >> > > >>> >>>> certain properties of the visitors hitting our site.
> >> > > >>> >>>>
> >> > > >>> >>>> Since this is my first attempt at a JSON style document store, I
> >> > > >>> >>>> thought I'd share the architecture & see if I can make it better (or
> >> > > >>> >>>> correct my mistakes before I make them) :-)
> >> > > >>> >>>>
> >> > > >>> >>>> - For each unique visitor, create a document with his session_id as
> >> > > >>> >>>> the doc.id
> >> > > >>> >>>> - For each property I need to track about this visitor, I create a
> >> > > >>> >>>> key-value pair in the doc created for this visitor
> >> > > >>> >>>> - If visitor is a returning user, use the session_id to re-open his
> >> > > >>> >>>> doc & keep on modifying the properties
> >> > > >>> >>>> - At end of each calculation time period (say 1 hour or 24 hours), I
> >> > > >>> >>>> run a cron job which fires the map-reduce jobs by requesting the
> >> > > >>> >>>> views over curl/http.
> >> > > >>> >>>>
> >> > > >>> >>>> A couple of questions based on above architecture...
> >> > > >>> >>>> We see concurrent traffic ranging from 2k users to 5k users.
> >> > > >>> >>>> - Would a couchdb instance running on a good machine (say High CPU
> >> > > >>> >>>> EC2, medium instance) work well with simultaneous writes happening...
> >> > > >>> >>>> (visitors browsing, properties changing or getting created)
> >> > > >>> >>>> - With a couple of million documents, would I be able to process my
> >> > > >>> >>>> views without causing any significant impact to write performance?
> >> > > >>> >>>>
> >> > > >>> >>>> I think my questions might be biased by the fact that I come from a
> >> > > >>> >>>> MySQL/Rails background... :-)
> >> > > >>> >>>>
> >> > > >>> >>>> Let me know what you guys think about this.
> >> > > >>> >>>>
> >> > > >>> >>>> Thanks in advance,
> >> > > >>> >>>> --
> >> > > >>> >>>> Mayank
> >> > > >>> >>>> http://adomado.com
> >> > > >>> >>>
> >> > > >>> >>>
> >> > > >>> >>
> >> > > >>>
> >> > > >>>
> >> > > >>
> >> > > >
> >> > >
> >> >
> >>
> >
> >
>
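
As a quick sanity check on the numbers quoted in this thread, 700 req/min
and 1M hits a day describe roughly the same load. A small Python sketch of
the arithmetic (the helper names are just for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60
MINUTES_PER_DAY = 24 * 60


def per_second(hits_per_day):
    """Average write rate implied by a daily hit count."""
    return hits_per_day / SECONDS_PER_DAY


def per_day_from_per_minute(requests_per_minute):
    """Daily total implied by a steady per-minute rate."""
    return requests_per_minute * MINUTES_PER_DAY


daily_from_700_rpm = per_day_from_per_minute(700)  # 700 * 1440 requests a day
writes_per_second = per_second(1_000_000)          # about 11.6 writes a second
```

So a single counter document would see on the order of a dozen updates per
second on average, with peaks above that.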
