couchdb-user mailing list archives

From Fabio Batalha Cunha dos Santos <fabio.bata...@scielo.org>
Subject Re: Best storage model for a access log database
Date Fri, 22 Oct 2010 13:15:19 GMT
Hi Simon,

Thanks for your comments. I think the idea of having a separate database
that aggregates longer periods (e.g. monthly) is great.
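
A minimal sketch of that roll-up, done in plain Python outside CouchDB. It
assumes the daily view results have already been fetched as counters keyed by
the "date" field of the documents ("YYYYMMDD"); the function name, the counts
and the "sci_abstract" key are made up for illustration.

```python
from collections import Counter, defaultdict


def roll_up_monthly(daily_counts):
    """Merge per-day counters into per-month counters keyed by 'YYYYMM'."""
    monthly = defaultdict(Counter)
    for day, counts in daily_counts.items():
        # '20101021' -> '201010': the first six characters are year and month.
        monthly[day[:6]].update(counts)
    return dict(monthly)


# Made-up daily results, standing in for the output of a per-day view.
daily_counts = {
    "20101021": Counter({"sci_arttext": 3}),
    "20101022": Counter({"sci_arttext": 2, "sci_abstract": 1}),
    "20101101": Counter({"sci_arttext": 4}),
}

monthly = roll_up_monthly(daily_counts)
print(monthly["201010"])
```

The monthly counters are what would be written to the aggregation database,
so the daily database can be archived or dropped later.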

I need to think more about the kind of information I want to extract from
those logs.

For example, if the referrer is Google, I would like to know which keywords
users are searching for to reach our website and, beyond that, what we are
serving them: is it relevant or not?
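
As a rough sketch of the keyword idea, the search phrase can be pulled out of
the "q" parameter of the referrer URL. The function name and the rule for
recognising a Google host are my own assumptions, and the referrer below is a
shortened version of the one in the sample document.

```python
from urllib.parse import parse_qs, urlsplit


def search_keywords(referrer):
    """Return the decoded search phrase if the referrer is a Google URL, else None."""
    parts = urlsplit(referrer)
    host = parts.hostname or ""
    # Assumption: any host with a "google" label counts (google.com, google.com.br, ...).
    if "google" not in host.split("."):
        return None
    # parse_qs percent-decodes the values (UTF-8 by default).
    values = parse_qs(parts.query).get("q")
    return values[0] if values else None


referrer = ("http://www.google.com.br/url?sa=t&source=web&cd=1"
            "&q=qual%20religi%C3%A3o%20utiliza%20capim%20santo%3F")
print(search_keywords(referrer))  # qual religião utiliza capim santo?
```

The same decoding could be applied to the stored "referrerParams" values,
which are still percent-encoded in the sample document.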

Another goal is to count URL patterns to evaluate which parts of our website
get the most access. For this I would create a view that counts occurrences
of, for example:

 "urlParams": {
     "pid": "S0080-62342003000300002",
     "script": "sci_arttext"
 },
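
The counting logic could look roughly like this. It is a plain-Python
emulation of a map step that emits a key per document and a reduce step that
sums the values, not an actual CouchDB design document; the function names and
the "sci_home" value are hypothetical.

```python
from collections import Counter


def map_doc(doc):
    """Yield one ((script, pid), 1) pair per log document that has a script."""
    params = doc.get("urlParams", {})
    script = params.get("script")
    if script is not None:
        yield (script, params.get("pid")), 1


def count_by_key(docs):
    """Sum the emitted values per key, the job a count-style reduce would do."""
    counts = Counter()
    for doc in docs:
        for key, value in map_doc(doc):
            counts[key] += value
    return counts


docs = [
    {"urlParams": {"pid": "S0080-62342003000300002", "script": "sci_arttext"}},
    {"urlParams": {"pid": "S0080-62342003000300002", "script": "sci_arttext"}},
    {"urlParams": {"script": "sci_home"}},  # hypothetical second URL pattern
    {"host": "www.scielo.br"},              # no urlParams: nothing emitted
]

counts = count_by_key(docs)
print(counts[("sci_arttext", "S0080-62342003000300002")])  # 2
```

Putting the script first in the key means the totals can be grouped per
script alone or per script and pid.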

There are a lot of things to figure out.

Thanks,
Fabio Batalha

On Fri, Oct 22, 2010 at 10:56 AM, Simon Metson
<simonmetson@googlemail.com> wrote:

> Hi,
>        You need to decide what you want to do with the data - how do you
> mine it to turn it into useful information? My guess is you don't need to
> see live up to the minute stats, so could write the logs to a database per
> day/week, run views against that (say counting the different browsers etc)
> and dump the results of those views into another database that you aggregate
> over a longer period (e.g. to give you monthly/yearly stats). Depending on
> your requirements you could drop the daily database after some time (or
> better yet, archive it off somewhere). If you keep it around you can add new
> views against the daily data at a later stage and just run the queries
> again, adding the result into the aggregation database. CouchDB's
> replication would make for a nice back up e.g. have a live instance on some
> HA servers that gets the logs, which at the end of the day you replicate to
> a slower but backed up instance that just holds the archive, or something.
>
> If you're parsing server logs to generate your documents you might be
> better off skipping the per-access documents and just recording daily
> high-level stats. It depends on how well you know what information you want to access
> from the logs, and how likely that is to change with time.
>
>        If you're concerned about data volume make sure you do regular house
> cleaning - compact the db, clean up views etc.
> Cheers
> Simon
>
>
>
> On 22 Oct 2010, at 13:25, Fabio Batalha Cunha dos Santos wrote:
>
>> Hello All!
>>
>> I'm new to CouchDB, and I'm experimenting with a tool that stores
>> access logs in CouchDB. URL:
>> http://github.com/fabiobatalha/Analytics---CouchDB
>>
>> I have some questions: what is the best way to store this kind of
>> information in CouchDB, and is it viable at all? I have created a database
>> and started recording every access to our website. In almost 14 hours the
>> database reached 0.5 GB with 70,000 records, so I estimate that in 24
>> hours the database will reach about 1.0 GB of stored data.
>>
>> Given the sample document below and CouchDB's performance, will I be able
>> to generate log statistics from this information using CouchDB map/reduce,
>> considering that in 1 year the database will probably reach around 400 GB
>> of stored data?
>>
>> This is a sample record:
>>
>> {
>>  "_id": "0007f561f96d61cf7744b987895e1ef0",
>>  "_rev": "1-e6b1dcdcbe0b2c6fde0dec5b4bfc41a8",
>>  "instance": "scielo",
>>  "date": "20101021",
>>  "time": "1832",
>>  "url": "http://www.scielo.br/scielo.php?pid=S0080-62342003000300002&script=sci_arttext",
>>  "host": "www.scielo.br",
>>  "urlParams": {
>>      "pid": "S0080-62342003000300002",
>>      "script": "sci_arttext"
>>  },
>>  "referrer": "http://www.google.com.br/url?sa=t&source=web&cd=1&ved=0CBYQFjAA&url=http%3A%2F%2Fwww.scielo.br%2Fscielo.php%3Fpid%3DS0080-62342003000300002%26script%3Dsci_arttext&rct=j&q=qual%20religi%C3%A3o%20utiliza%20capim%20santo%3F&ei=XqPATLGMFYL-8AawmvXbBg&usg=AFQjCNEO8okxLNbxs2SiIXb1R5Nmr99WhA",
>>  "referrerParams": {
>>      "sa": "t",
>>      "source": "web",
>>      "cd": "1",
>>      "ved": "0CBYQFjAA",
>>      "url": "http%3A%2F%2Fwww.scielo.br%2Fscielo.php%3Fpid%3DS0080-62342003000300002%26script%3Dsci_arttext",
>>      "rct": "j",
>>      "q": "qual%20religi%C3%A3o%20utiliza%20capim%20santo%3F",
>>      "ei": "XqPATLGMFYL-8AawmvXbBg",
>>      "usg": "AFQjCNEO8okxLNbxs2SiIXb1R5Nmr99WhA"
>>  },
>>  "appCodeName": "Mozilla",
>>  "appVersion": "5.0 (Windows; pt-BR)",
>>  "language": "pt-BR",
>>  "platform": "Win32",
>>  "product": "Gecko",
>>  "userAgent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11 ( .NET CLR 3.5.30729)",
>>  "vendor": "",
>>  "vendorSub": ""
>> }
>>
>> Thanks in advance for any guidance, comments and suggestions.
>> Fabio Batalha
>>
>
>
