incubator-couchdb-user mailing list archives

From Simon Metson <simonmet...@googlemail.com>
Subject Re: Best storage model for a access log database
Date Fri, 22 Oct 2010 12:56:07 GMT
Hi,
	You need to decide what you want to do with the data: how will you
mine it to turn it into useful information? My guess is you don't need
to see live, up-to-the-minute stats, so you could write the logs to a
database per day/week, run views against that (say, counting the
different browsers etc.) and dump the results of those views into
another database that you aggregate over a longer period (e.g. to give
you monthly/yearly stats). Depending on your requirements you could
drop the daily database after some time (or better yet, archive it off
somewhere). If you keep it around you can add new views against the
daily data at a later stage and just run the queries again, adding the
results to the aggregation database. CouchDB's replication would make
for a nice backup, e.g. have a live instance on some HA servers that
receives the logs, which at the end of the day you replicate to a
slower but backed-up instance that just holds the archive.
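
To make the daily-to-aggregate idea concrete, here is a minimal sketch in Python that never talks to a server. The database naming scheme (`accesslog_YYYYMMDD`), the helper names, the crude browser classification and the shape of the summary document are all assumptions for illustration; only the `userAgent` field comes from the sample document quoted below. In practice the counting step would more likely be a CouchDB view with a reduce, with its result written into the aggregation database.

```python
from collections import Counter


def daily_db_name(date, prefix="accesslog"):
    """Per-day database name for a YYYYMMDD date string (hypothetical scheme)."""
    return f"{prefix}_{date}"


def browser_family(user_agent):
    """Very rough classification; a stand-in for a real user-agent parser."""
    # Chrome is checked before Safari because Chrome's UA string contains both.
    for name in ("Firefox", "Chrome", "Safari", "MSIE", "Opera"):
        if name in user_agent:
            return name
    return "other"


def daily_summary(date, docs):
    """Collapse one day's access documents into a single summary document."""
    counts = Counter(browser_family(d.get("userAgent", "")) for d in docs)
    return {
        "_id": f"summary-{date}",
        "date": date,
        "hits": len(docs),
        "browsers": dict(counts),
    }


def merge_summaries(summaries):
    """Aggregate daily summaries over a longer period (e.g. a month)."""
    browsers = Counter()
    hits = 0
    for summary in summaries:
        browsers.update(summary["browsers"])
        hits += summary["hits"]
    return {"hits": hits, "browsers": dict(browsers)}
```

Because each daily summary is small, the daily database can be archived or dropped without losing the long-term numbers, and `merge_summaries` can be re-run whenever a new kind of daily summary is added.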

If you're parsing server logs to generate your documents you might be
better off skipping the per-access documents and just recording daily
high-level stats. It depends on how well you know what information you
want to get out of the logs, and how likely that is to change with time.
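
As a rough sketch of what "daily high-level stats" could look like, the Python below reads log lines and produces one summary document per day rather than one document per access. It assumes the server writes the common Apache/nginx "combined" log format, and the chosen statistics (hit count, status codes, paths without query strings) and the `stats-YYYYMMDD` ids are illustrative assumptions, not something taken from the original tool.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed input: "combined" log format, e.g.
# 10.0.0.1 - - [21/Oct/2010:18:32:00 -0300] "GET /x HTTP/1.1" 200 2326 "ref" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def daily_stats(lines):
    """Return one summary document per day instead of one document per access."""
    days = {}
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip lines that do not look like combined-format entries
        date_part = m["ts"].split(":", 1)[0]  # "21/Oct/2010"
        day = datetime.strptime(date_part, "%d/%b/%Y").strftime("%Y%m%d")
        doc = days.setdefault(
            day,
            {"_id": f"stats-{day}", "date": day, "hits": 0,
             "status": Counter(), "paths": Counter()},
        )
        doc["hits"] += 1
        doc["status"][m["status"]] += 1
        doc["paths"][m["path"].split("?", 1)[0]] += 1  # drop the query string
    # Convert Counters to plain dicts so the documents serialise as ordinary JSON.
    return [
        dict(doc, status=dict(doc["status"]), paths=dict(doc["paths"]))
        for _, doc in sorted(days.items())
    ]
```

The trade-off is the one described above: these documents are tiny, but any statistic not computed at parse time cannot be recovered later without the raw logs.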

	If you're concerned about data volume, make sure you do regular
housekeeping: compact the db, clean up old view indexes etc.
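
The end-of-day replication mentioned earlier and the housekeeping above both come down to a handful of HTTP POSTs. The sketch below only builds the requests (it does not send them) against CouchDB's `_replicate`, `_compact` and `_view_cleanup` endpoints. The server URLs, database name and design document name in the usage example are placeholders, and a real setup will usually need admin credentials, which are left out here.

```python
import json
import urllib.request


def couch_post(base_url, path, body=None):
    """Build (but do not send) a JSON POST request to a CouchDB server."""
    data = json.dumps(body or {}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/{path}",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def end_of_day_requests(base_url, db, design_docs, archive_url):
    """Requests to replicate a db to the archive, then compact and clean it up."""
    target = f"{archive_url.rstrip('/')}/{db}"
    requests = [couch_post(base_url, "_replicate", {"source": db, "target": target})]
    # Compact the database file itself.
    requests.append(couch_post(base_url, f"{db}/_compact"))
    # Compact the view index of each design document.
    requests += [couch_post(base_url, f"{db}/_compact/{ddoc}") for ddoc in design_docs]
    # Remove index files that no longer belong to any design document.
    requests.append(couch_post(base_url, f"{db}/_view_cleanup"))
    return requests
```

Each request could then be sent with `urllib.request.urlopen(req)`, for example from a nightly cron job:

```python
for req in end_of_day_requests("http://localhost:5984", "accesslog_20101021",
                               ["stats"], "http://archive.example.org:5984"):
    print(req.get_method(), req.full_url)
```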
Cheers
Simon


On 22 Oct 2010, at 13:25, Fabio Batalha Cunha dos Santos wrote:

> Hello All!
>
> I'm new to CouchDB and am running some experiments to build a tool
> that stores access logs in CouchDB. URL:
> http://github.com/fabiobatalha/Analytics---CouchDB
>
> I have some questions: what is the best way to store this kind of
> information in CouchDB, and is it viable at all? I have created a
> database and started recording every access to our website. In almost
> 14 hours the database reached 0.5 GB with 70,000 records, so I
> estimate that in 24 hours it will reach 1.0 GB of stored data.
>
> Given the sample document I'm recording and CouchDB's performance,
> will I be able to generate log statistics from this information
> using CouchDB map/reduce, considering that in 1 year the database
> will probably reach around 400 GB of stored data?
>
> This is a sample record:
>
> {
>   "_id": "0007f561f96d61cf7744b987895e1ef0",
>   "_rev": "1-e6b1dcdcbe0b2c6fde0dec5b4bfc41a8",
>   "instance": "scielo",
>   "date": "20101021",
>   "time": "1832",
>   "url": "http://www.scielo.br/scielo.php?pid=S0080-62342003000300002&script=sci_arttext",
>   "host": "www.scielo.br",
>   "urlParams": {
>       "pid": "S0080-62342003000300002",
>       "script": "sci_arttext"
>   },
>   "referrer": "http://www.google.com.br/url?sa=t&source=web&cd=1&ved=0CBYQFjAA&url=http%3A%2F%2Fwww.scielo.br%2Fscielo.php%3Fpid%3DS0080-62342003000300002%26script%3Dsci_arttext&rct=j&q=qual%20religi%C3%A3o%20utiliza%20capim%20santo%3F&ei=XqPATLGMFYL-8AawmvXbBg&usg=AFQjCNEO8okxLNbxs2SiIXb1R5Nmr99WhA",
>   "referrerParams": {
>       "sa": "t",
>       "source": "web",
>       "cd": "1",
>       "ved": "0CBYQFjAA",
>       "url": "http%3A%2F%2Fwww.scielo.br%2Fscielo.php%3Fpid%3DS0080-62342003000300002%26script%3Dsci_arttext",
>       "rct": "j",
>       "q": "qual%20religi%C3%A3o%20utiliza%20capim%20santo%3F",
>       "ei": "XqPATLGMFYL-8AawmvXbBg",
>       "usg": "AFQjCNEO8okxLNbxs2SiIXb1R5Nmr99WhA"
>   },
>   "appCodeName": "Mozilla",
>   "appVersion": "5.0 (Windows; pt-BR)",
>   "language": "pt-BR",
>   "platform": "Win32",
>   "product": "Gecko",
>   "userAgent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11 ( .NET CLR 3.5.30729)",
>   "vendor": "",
>   "vendorSub": ""
> }
>
> Thanks in advance for any guidance, comments and suggestions.
> Fabio Batalha

