From: Simon Metson
To: user@couchdb.apache.org
Subject: Re: Best storage model for a access log database
Date: Fri, 22 Oct 2010 13:56:07 +0100

Hi,

You need to decide what you want to do with the data: how will you mine it to turn it into useful information?

My guess is that you don't need live, up-to-the-minute stats. If so, you could write the logs to one database per day or week, run views against it (say, counting the different browsers) and dump the results of those views into another database that you aggregate over a longer period (e.g. to give you monthly/yearly stats). Depending on your requirements you could drop the daily database after some time (or, better yet, archive it somewhere). If you keep it around, you can add new views against the daily data at a later stage and just run the queries again, adding the results to the aggregation database.

CouchDB's replication would make for a nice backup. For example, have a live instance on some HA servers that receives the logs, and at the end of each day replicate it to a slower but backed-up instance that just holds the archive.
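A minimal sketch of the per-day-then-aggregate idea, assuming documents shaped like the sample in the quoted message below (a `date` string in YYYYMMDD form and a `userAgent` field). It runs in plain Python rather than inside CouchDB: `map_access` stands in for a view's map function and the tally stands in for a grouped reduce such as the built-in `_count`. In CouchDB itself the map would be a JavaScript function in a design document. All function names and the crude user-agent matching are hypothetical.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Hypothetical, deliberately crude browser classification. Order matters:
# Chrome user agents also contain the string "Safari".
BROWSERS = ("Firefox", "Chrome", "Safari", "MSIE", "Opera")


def browser_family(user_agent: str) -> str:
    for name in BROWSERS:
        if name in user_agent:
            return name
    return "other"


def map_access(doc: dict) -> Iterator[Tuple[Tuple[str, str], int]]:
    """Stand-in for a view map function: emit ((date, browser), 1)."""
    yield (doc["date"], browser_family(doc.get("userAgent", ""))), 1


def daily_counts(docs: Iterable[dict]) -> Counter:
    """Stand-in for a grouped reduce (like _count): tally per key."""
    counts: Counter = Counter()
    for doc in docs:
        for key, value in map_access(doc):
            counts[key] += value
    return counts


def roll_up_monthly(daily: Counter) -> Counter:
    """Fold daily results into (YYYYMM, browser) totals for the long-term database."""
    monthly: Counter = Counter()
    for (date, browser), n in daily.items():
        monthly[(date[:6], browser)] += n
    return monthly


# Example: three access documents spread over two days.
docs = [
    {"date": "20101021", "userAgent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11"},
    {"date": "20101021", "userAgent": "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1)"},
    {"date": "20101022", "userAgent": "Mozilla/5.0 (X11; U; Linux i686) Gecko/20101012 Firefox/3.6.11"},
]
daily = daily_counts(docs)
monthly = roll_up_monthly(daily)
print(monthly[("201010", "Firefox")])  # 2
```

Only the small `monthly` counts would need to live in the long-term aggregation database; the per-day access documents can then be archived or dropped as described above.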
If you're parsing server logs to generate your documents, you might be better off skipping the per-access document and just recording daily high-level stats. It depends on how well you know what information you want to extract from the logs, and how likely that is to change over time.

If you're concerned about data volume, make sure you do regular housekeeping: compact the database, clean up old view indexes, etc.

Cheers
Simon

On 22 Oct 2010, at 13:25, Fabio Batalha Cunha dos Santos wrote:

> Hello All!
>
> I'm new to CouchDB and I'm doing some experiments to create a tool to
> store access logs in CouchDB. URL:
> http://github.com/fabiobatalha/Analytics---CouchDB
>
> I have some doubts, such as what is the best way to store this kind of
> information in CouchDB, and whether it is viable at all. I have created
> a database and started recording all access logs of our website. In
> almost 14 hours the database reached 0.5 GB with 70,000 records; I
> estimate that in 24 hours it will reach 1.0 GB of stored data.
>
> Given the sample document that I'm recording and CouchDB's
> performance, will I be able to generate statistics from this
> information using CouchDB map/reduce, considering that in 1 year the
> database will probably reach around 400 GB of stored data?
>
> This is a sample of one record:
>
> {
>   "_id": "0007f561f96d61cf7744b987895e1ef0",
>   "_rev": "1-e6b1dcdcbe0b2c6fde0dec5b4bfc41a8",
>   "instance": "scielo",
>   "date": "20101021",
>   "time": "1832",
>   "url": "http://www.scielo.br/scielo.php?pid=S0080-62342003000300002&script=sci_arttext",
>   "host": "www.scielo.br",
>   "urlParams": {
>     "pid": "S0080-62342003000300002",
>     "script": "sci_arttext"
>   },
>   "referrer": "http://www.google.com.br/url?sa=t&source=web&cd=1&ved=0CBYQFjAA&url=http%3A%2F%2Fwww.scielo.br%2Fscielo.php%3Fpid%3DS0080-62342003000300002%26script%3Dsci_arttext&rct=j&q=qual%20religi%C3%A3o%20utiliza%20capim%20santo%3F&ei=XqPATLGMFYL-8AawmvXbBg&usg=AFQjCNEO8okxLNbxs2SiIXb1R5Nmr99WhA",
>   "referrerParams": {
>     "sa": "t",
>     "source": "web",
>     "cd": "1",
>     "ved": "0CBYQFjAA",
>     "url": "http%3A%2F%2Fwww.scielo.br%2Fscielo.php%3Fpid%3DS0080-62342003000300002%26script%3Dsci_arttext",
>     "rct": "j",
>     "q": "qual%20religi%C3%A3o%20utiliza%20capim%20santo%3F",
>     "ei": "XqPATLGMFYL-8AawmvXbBg",
>     "usg": "AFQjCNEO8okxLNbxs2SiIXb1R5Nmr99WhA"
>   },
>   "appCodeName": "Mozilla",
>   "appVersion": "5.0 (Windows; pt-BR)",
>   "language": "pt-BR",
>   "platform": "Win32",
>   "product": "Gecko",
>   "userAgent": "Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11 ( .NET CLR 3.5.30729)",
>   "vendor": "",
>   "vendorSub": ""
> }
>
> Thanks in advance for any guidance, comments and suggestions.
> Fabio Batalha
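The growth figures in the question can be sanity-checked with a quick linear extrapolation. This sketch uses only the numbers Fabio reported (0.5 GB and 70,000 records after roughly 14 hours) and assumes decimal gigabytes and a constant insert rate around the clock, which real web traffic probably won't have.

```python
GB = 10**9  # assumption: decimal gigabytes

observed_bytes = 0.5 * GB  # database size after ~14 hours (from the message)
observed_docs = 70_000     # records written in that time (from the message)
observed_hours = 14

# Assumes a constant rate of inserts, i.e. pure linear extrapolation.
bytes_per_doc = observed_bytes / observed_docs
docs_per_day = observed_docs / observed_hours * 24
gb_per_day = bytes_per_doc * docs_per_day / GB
gb_per_year = gb_per_day * 365

print(f"{bytes_per_doc:.0f} bytes/doc, {docs_per_day:.0f} docs/day, "
      f"{gb_per_day:.2f} GB/day, {gb_per_year:.0f} GB/year")
# 7143 bytes/doc, 120000 docs/day, 0.86 GB/day, 313 GB/year
```

This lands a little under the 1 GB/day estimate but in the same order of magnitude as the ~400 GB/year figure. Roughly 7 KB on disk per record is several times the size of the sample document's JSON, so it may be worth measuring how much the regular compaction Simon suggests actually recovers.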