incubator-couchdb-user mailing list archives

From Stefan Matheis <matheis.ste...@googlemail.com>
Subject Re: Using couchdb for analytics
Date Fri, 03 Jun 2011 07:15:23 GMT
What about proxying couch.foo.com through foo.com/couch? Maybe not the
complete service, but at least one "special" URL which triggers the write
on couch?
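A sketch of what that proxy could look like in nginx (hostnames, port, and the /couch/ path are illustrative assumptions, not from the thread; 5984 is CouchDB's default port):

```nginx
# Expose CouchDB under foo.com/couch/ so the page's JavaScript stays
# same-origin. Hostnames and the path prefix are illustrative.
server {
    listen 80;
    server_name foo.com;

    location /couch/ {
        # Trailing slash on proxy_pass strips the /couch/ prefix
        # before forwarding the request to the CouchDB instance.
        proxy_pass http://couch.foo.com:5984/;
        proxy_set_header Host couch.foo.com;
    }
}
```

With something like this in place, a POST to foo.com/couch/... from client-side JavaScript on foo.com reaches CouchDB without crossing domains.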

Regards
Stefan

On Fri, Jun 3, 2011 at 8:56 AM, maku@makuchaku.in <maku@makuchaku.in> wrote:
> Hi everyone,
>
> I think I had a fundamental flaw in my assumption - realized this
> yesterday...
> If the couchdb analytics server is hosted on couch.foo.com (foo.com being
> the main site) - I would never be able to make write requests via
> client-side javascript, as the browser's same-origin policy would be a barrier.
>
> I thought about this - and came across a potential solution...
> What if I host an html page as an attachment in couchdb & whenever I have
> to make a write call, include this html in an iframe & pass on the
> parameters in the query string of the iframe URL?
> The iframe will have javascript which understands the incoming query string
> params & takes action (creates a POST/PUT to couchdb).
>
> There would be no cross-domain barriers as the html page is being served
> right out of couchdb itself - wherever it's hosted (couch.foo.com).
>
> This shouldn't be a performance hit - as ETags will help with client-side
> caching of the html page.
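The script inside that iframe attachment could be as small as the following (a sketch; the target path in the comments is a made-up name, and the browser-only part is left as comments since it needs a DOM):

```javascript
// Parse the tracking parameters out of the iframe's own query string,
// e.g. "?session_id=abc&browser=ff" -> { session_id: "abc", browser: "ff" }.
function parseQueryString(search) {
  var params = {};
  var pairs = search.replace(/^\?/, "").split("&");
  for (var i = 0; i < pairs.length; i++) {
    if (!pairs[i]) continue;
    var kv = pairs[i].split("=");
    params[decodeURIComponent(kv[0])] = decodeURIComponent(kv[1] || "");
  }
  return params;
}

// In the iframe this would run on load (sketched; path is illustrative):
//   var props = parseQueryString(window.location.search);
//   var xhr = new XMLHttpRequest();
//   xhr.open("POST", "/analytics/", true);  // same origin as the attachment
//   xhr.setRequestHeader("Content-Type", "application/json");
//   xhr.send(JSON.stringify(props));
```

Because the page and the database share an origin, the XMLHttpRequest needs no cross-domain workaround at all.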
> --
> Mayank
> http://adomado.com
>
>
>
> On Thu, Jun 2, 2011 at 8:34 PM, maku@makuchaku.in <maku@makuchaku.in> wrote:
>
>> It's 700 req/min :)
>> --
>> Mayank
>> http://adomado.com
>>
>>
>>
>> On Thu, Jun 2, 2011 at 7:10 PM, Jan Lehnardt <jan@apache.org> wrote:
>>
>>>
>>> On 2 Jun 2011, at 13:28, maku@makuchaku.in wrote:
>>>
>>> > Forgot to mention...
>>> > All of these 700 req/sec are write requests (data logging) & no data
>>> > crunching.
>>> >
>>> >> Our current inhouse analytics solution (built on Rails, Mysql) gets
>>> >> about 700 req/min on an average day...
>>>
>>> min or sec? :)
>>>
>>> Cheers
>>> Jan
>>> --
>>>
>>>
>>> >>
>>> >> --
>>> >> Mayank
>>> >> http://adomado.com
>>> >>
>>> >>
>>> >>
>>> >>
>>> >> On Thu, Jun 2, 2011 at 3:16 PM, Gabor Ratky <rgabo@rgabostyle.com>
>>> wrote:
>>> >>> Take a look at update handlers [1]. It is a more lightweight way to
>>> create / update your visitor documents, without having to GET the document,
>>> modify and PUT back the whole thing. It also simplifies dealing with
>>> document revisions as my understanding is that you should not be running
>>> into conflicts.
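A minimal update handler along the lines Gabor describes might look like this (a sketch; the handler and design-doc names are illustrative, and merging every query-string parameter into the doc is an assumption, not something from the thread):

```javascript
// Sketch of a CouchDB document update handler for the visitor-tracking case.
// Stored in a design doc (e.g. _design/tracking, name made up), it would be
// invoked as PUT /db/_design/tracking/_update/visit/<session_id>.
// It merges query-string properties into the doc without a prior GET.
function visit(doc, req) {
  if (!doc) {
    // First visit: create the doc, keyed by the id from the request URL
    doc = { _id: req.id };
  }
  // New or returning visitor: merge each tracked property from the query
  for (var key in req.query) {
    doc[key] = req.query[key];
  }
  // Return the doc to be saved plus the HTTP response body
  return [doc, "ok"];
}
```

Since CouchDB resolves the current revision server-side before calling the handler, the client never has to juggle `_rev` values itself.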
>>> >>>
>>> >>> I wouldn't expect any problem handling the concurrent traffic and
>>> tracking the users, but the view indexer will take some time with the
>>> processing itself. You can always replicate the database (or parts of it
>>> using a replication filter) to another CouchDB instance and perform the
>>> crunching there.
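The filtered replication Gabor mentions would use a filter function in a design doc; a sketch (the "sess-" id prefix and the session_id field are illustrative assumptions about how the visitor docs are keyed):

```javascript
// Sketch of a CouchDB replication filter that would ship only visitor
// documents to the separate crunching instance.
function visitorsOnly(doc, req) {
  if (doc._id.indexOf("_design/") === 0) {
    return false; // keep design docs (and their view indexes) local
  }
  // Assumption: visitor docs use a "sess-"-prefixed _id or carry session_id
  return !!doc.session_id || doc._id.indexOf("sess-") === 0;
}
```

The filter's name would then be passed as the `filter` parameter when triggering the replication.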
>>> >>>
>>> >>> It's fairly vague how many updates / writes your 2k-5k traffic would
>>> cause. How many requests/sec does your site see? How many property updates
>>> does that cause?
>>> >>>
>>> >>> Btw, CouchDB users, is there any way to perform bulk updates using
>>> update handlers, similar to _bulk_docs?
>>> >>>
>>> >>> Gabor
>>> >>>
>>> >>> [1] http://wiki.apache.org/couchdb/Document_Update_Handlers
>>> >>>
>>> >>> On Thursday, June 2, 2011 at 11:34 AM, maku@makuchaku.in wrote:
>>> >>>
>>> >>>> Hi everyone,
>>> >>>>
>>> >>>> I came across couchdb a couple of weeks back & got really excited by
>>> >>>> the fundamental change it brings by simply taking the app-server out
>>> >>>> of the picture.
>>> >>>> Must say, kudos to the dev team!
>>> >>>>
>>> >>>> I am planning to write a quick analytics solution for my website -
>>> >>>> something on the lines of Google analytics - which will measure
>>> >>>> certain properties of the visitors hitting our site.
>>> >>>>
>>> >>>> Since this is my first attempt at a JSON style document store, I
>>> >>>> thought I'll share the architecture & see if I can make it better (or
>>> >>>> correct my mistakes before I do them) :-)
>>> >>>>
>>> >>>> - For each unique visitor, create a document with his session_id as
>>> the doc.id
>>> >>>> - For each property I need to track about this visitor, I create a
>>> >>>> key-value pair in the doc created for this visitor
>>> >>>> - If visitor is a returning user, use the session_id to re-open his
>>> >>>> doc & keep on modifying the properties
>>> >>>> - At end of each calculation time period (say 1 hour or 24 hours), I
>>> >>>> run a cron job which fires the map-reduce jobs by requesting the views
>>> >>>> over curl/http.
>>> >>>>
>>> >>>> A couple of questions based on the above architecture...
>>> >>>> We see concurrent traffic ranging from 2k users to 5k users.
>>> >>>> - Would a couchdb instance running on a good machine (say High CPU
>>> >>>> EC2, medium instance) work well with simultaneous writes happening...
>>> >>>> (visitors browsing, properties changing or getting created)
>>> >>>> - With a couple of million documents, would I be able to process my
>>> >>>> views without causing any significant impact to write performance?
>>> >>>>
>>> >>>> I think my questions might be biased by the fact that I come from a
>>> >>>> MySQL/Rails background... :-)
>>> >>>>
>>> >>>> Let me know what you guys think about this.
>>> >>>>
>>> >>>> Thanks in advance,
>>> >>>> --
>>> >>>> Mayank
>>> >>>> http://adomado.com
>>> >>>
>>> >>>
>>> >>
>>>
>>>
>>
>
