incubator-couchdb-user mailing list archives

From "maku@makuchaku.in" <m...@makuchaku.in>
Subject Re: Using couchdb for analytics
Date Fri, 03 Jun 2011 06:56:14 GMT
Hi everyone,

I think I had a fundamental flaw in my assumption - realized this
yesterday...
If the couchdb analytics server is hosted on couch.foo.com (foo.com being
the main site), I would never be able to make write requests via client-side
javascript, as the browser's same-origin policy would block the cross-domain
requests.

I thought about this - and came across a potential solution...
What if I host an html page as an attachment in couchdb & whenever I have
to make a write call, include that html page in an iframe & pass the
parameters in the query string of the iframe URL?
The iframe will have javascript which parses the incoming query-string
params & takes action (creates a POST/PUT to couchdb).
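
Roughly what I have in mind for that page - just a sketch, the db name
"analytics" & the file name "log.html" are placeholders, and error handling
/ older-IE fallbacks are left out:

  <!-- log.html, stored as an attachment in the analytics db -->
  <html>
  <head>
  <script type="text/javascript">
    // pull ?key=value&key2=value2 pairs out of the iframe URL
    function parseQuery() {
      var params = {};
      var pairs = window.location.search.substring(1).split("&");
      for (var i = 0; i < pairs.length; i++) {
        var kv = pairs[i].split("=");
        if (kv[0]) {
          params[decodeURIComponent(kv[0])] = decodeURIComponent(kv[1] || "");
        }
      }
      return params;
    }

    // write the params into couchdb - this page is served from the same
    // origin as couch, so a plain XHR is allowed; POST lets couch pick the
    // doc id, a PUT to an _update handler could instead merge the params
    // into an existing visitor doc
    function logVisit() {
      var xhr = new XMLHttpRequest();
      xhr.open("POST", "/analytics/", true);
      xhr.setRequestHeader("Content-Type", "application/json");
      xhr.send(JSON.stringify(parseQuery()));
    }

    window.onload = logVisit;
  </script>
  </head>
  <body></body>
  </html>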

There would be no cross-domain barrier, as the html page is served right
out of couchdb itself - wherever it's hosted (couch.foo.com).

This shouldn't be much of a performance hit either - ETags will allow
client-side caching of the html page.
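
And on foo.com itself the tracking call would just be something like this
(again only a sketch - sessionId & the "tracker" doc that holds the
attachment are made-up names):

  // build the query string & point a hidden iframe at the page couch serves
  var params = "session_id=" + encodeURIComponent(sessionId) +
               "&referrer=" + encodeURIComponent(document.referrer);
  var iframe = document.createElement("iframe");
  iframe.style.display = "none";
  iframe.src = "http://couch.foo.com/analytics/tracker/log.html?" + params;
  document.body.appendChild(iframe);
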
--
Mayank
http://adomado.com



On Thu, Jun 2, 2011 at 8:34 PM, maku@makuchaku.in <maku@makuchaku.in> wrote:

> It's 700 req/min :)
> --
> Mayank
> http://adomado.com
>
>
>
> On Thu, Jun 2, 2011 at 7:10 PM, Jan Lehnardt <jan@apache.org> wrote:
>
>>
>> On 2 Jun 2011, at 13:28, maku@makuchaku.in wrote:
>>
>> > Forgot to mention...
>> > All of these 700 req/sec are write requests (data logging) & no data
>> > crunching.
>> >
>> >> Our current inhouse analytics solution (built on Rails, Mysql) gets
>> >> about 700 req/min on an average day...
>>
>> min or sec? :)
>>
>> Cheers
>> Jan
>> --
>>
>>
>> >>
>> >> --
>> >> Mayank
>> >> http://adomado.com
>> >>
>> >>
>> >>
>> >>
>> >> On Thu, Jun 2, 2011 at 3:16 PM, Gabor Ratky <rgabo@rgabostyle.com>
>> >> wrote:
>> >>> Take a look at update handlers [1]. It is a more lightweight way to
>> >>> create / update your visitor documents, without having to GET the document,
>> >>> modify and PUT back the whole thing. It also simplifies dealing with
>> >>> document revisions as my understanding is that you should not be running
>> >>> into conflicts.
>> >>>
>> >>> I wouldn't expect any problem handling the concurrent traffic and
>> >>> tracking the users, but the view indexer will take some time with the
>> >>> processing itself. You can always replicate the database (or parts of it
>> >>> using a replication filter) to another CouchDB instance and perform the
>> >>> crunching there.
>> >>>
>> >>> It's fairly vague how many updates / writes your 2k-5k traffic would
>> >>> cause. How many requests/sec does your site get? How many property
>> >>> updates does that cause?
>> >>>
>> >>> Btw, CouchDB users, is there any way to perform bulk updates using
>> >>> update handlers, similar to _bulk_docs?
>> >>>
>> >>> Gabor
>> >>>
>> >>> [1] http://wiki.apache.org/couchdb/Document_Update_Handlers
>> >>>
>> >>> On Thursday, June 2, 2011 at 11:34 AM, maku@makuchaku.in wrote:
>> >>>
>> >>>> Hi everyone,
>> >>>>
>> >>>> I came across couchdb a couple of weeks back & got really excited by
>> >>>> the fundamental change it brings by simply taking the app-server out
>> >>>> of the picture.
>> >>>> Must say, kudos to the dev team!
>> >>>>
>> >>>> I am planning to write a quick analytics solution for my website -
>> >>>> something on the lines of Google analytics - which will measure
>> >>>> certain properties of the visitors hitting our site.
>> >>>>
>> >>>> Since this is my first attempt at a JSON style document store, I
>> >>>> thought I'll share the architecture & see if I can make it better (or
>> >>>> correct my mistakes before I do them) :-)
>> >>>>
>> >>>> - For each unique visitor, create a document with his session_id as
>> >>>> the doc.id
>> >>>> - For each property I need to track about this visitor, I create a
>> >>>> key-value pair in the doc created for this visitor
>> >>>> - If visitor is a returning user, use the session_id to re-open his
>> >>>> doc & keep on modifying the properties
>> >>>> - At end of each calculation time period (say 1 hour or 24 hours), I
>> >>>> run a cron job which fires the map-reduce jobs by requesting the views
>> >>>> over curl/http.
>> >>>>
>> >>>> A couple of questions based on above architecture...
>> >>>> We see concurrent traffic ranging from 2k users to 5k users.
>> >>>> - Would a couchdb instance running on a good machine (say High CPU
>> >>>> EC2, medium instance) work well with simultaneous writes happening...
>> >>>> (visitors browsing, properties changing or getting created)
>> >>>> - With a couple of million documents, would I be able to process my
>> >>>> views without causing any significant impact to write performance?
>> >>>>
>> >>>> I think my questions might be biased by the fact that I come from a
>> >>>> MySQL/Rails background... :-)
>> >>>>
>> >>>> Let me know how you guys think about this.
>> >>>>
>> >>>> Thanks in advance,
>> >>>> --
>> >>>> Mayank
>> >>>> http://adomado.com
>> >>>
>> >>>
>> >>
>>
>>
>
