couchdb-user mailing list archives

From Seth Falcon <>
Subject using couchdb for access log analysis
Date Wed, 17 Jun 2009 17:34:40 GMT
Hi all,

I've been exploring using couchdb to store aggregated access log
data.  I would really appreciate a bit of feedback on the approach
I've taken.

First, the problem I'm trying to solve.  The input data consists of web
server access logs.  I want to generate a report of how many times
each page was viewed over different time windows.  For example, I'd
like to be able to request the report for the day 2009-06-16 as well
as a particular hour in a given day.

After reading the wiki page about stats aggregation, I started by
pre-reducing into one-minute chunks.  Each minute's worth of log
data results in a list of pages and view counts over that minute.  For
each page/count pair I insert a document like the following:

    {
        "_id" : "2009-06-16T13h30_abcdefg",
        "url" : "/foo/bar/baz",
        "view_count" : 13
    }

The ID is the timestamp representing the minute chunk prepended to the
md5 digest of the url.
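
A minimal sketch of how such a document could be built, in Python.  The
helper name make_doc and the strftime pattern are assumptions chosen to
match the example ID above; a real md5 digest is 32 hex characters,
where the example shows "abcdefg" as a stand-in.

```python
import hashlib
from datetime import datetime

def make_doc(minute, url, view_count):
    """Build one per-minute, per-URL document (hypothetical helper)."""
    # Minute chunk formatted like the example ID, e.g. 2009-06-16T13h30
    stamp = minute.strftime("%Y-%m-%dT%Hh%M")
    # md5 digest of the URL, as a 32-character hex string
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return {"_id": stamp + "_" + digest, "url": url, "view_count": view_count}

doc = make_doc(datetime(2009, 6, 16, 13, 30), "/foo/bar/baz", 13)
```

Because the timestamp comes first, IDs for the same minute sort together,
and the digest keeps two URLs in the same minute from colliding.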

One idea for a view that will allow querying different time units is
to emit keys like the following for the above example doc:

    ["hour",  "2009-06-16T13", "abcdefg"]
    ["day",   "2009-06-16",    "abcdefg"]
    ["month", "2009-06",       "abcdefg"]
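
To make the key scheme concrete, here is a rough Python stand-in for what
a map function could emit for one document.  It is only a simulation (a
real CouchDB map function would be written in JavaScript), and the name
rows_for is hypothetical.  It assumes the ID has the
"<timestamp>_<digest>" shape shown above.

```python
def rows_for(doc):
    """Return the (key, value) rows a map function might emit for doc."""
    # Split "2009-06-16T13h30_abcdefg" into timestamp and digest parts
    stamp, digest = doc["_id"].split("_", 1)
    count = doc["view_count"]
    return [
        (["hour", stamp[:13], digest], count),   # "2009-06-16T13"
        (["day", stamp[:10], digest], count),    # "2009-06-16"
        (["month", stamp[:7], digest], count),   # "2009-06"
    ]

example = {"_id": "2009-06-16T13h30_abcdefg",
           "url": "/foo/bar/baz",
           "view_count": 13}
rows = rows_for(example)
```

Each document produces three rows, so the view index holds roughly three
times as many rows as there are documents.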

Then one can query with startkey=["hour", "2009-06-16T13", true] and
endkey=["hour", "2009-06-16T13", {}] to get a particular hour using
group=true and a reduce function that sums the view_count.
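
The intended effect of that query can be sketched in Python as below.
This is an approximation under stated assumptions, not CouchDB itself: it
selects rows by exact match on the first two key elements instead of
applying CouchDB's collation (where true sorts before every string and {}
sorts after), and it sums per full key the way group=true would.  The
function name query_period and the sample rows are made up for
illustration.

```python
from collections import defaultdict

def query_period(rows, unit, period):
    """Sum view counts per full key for one time unit and period."""
    totals = defaultdict(int)
    for key, value in rows:
        # Stand-in for the startkey/endkey range: unit and period must match
        if key[0] == unit and key[1] == period:
            # group=true reduces separately for each distinct key
            totals[tuple(key)] += value
    return dict(totals)

# Made-up sample rows, as (key, view_count) pairs
sample_rows = [
    (["hour", "2009-06-16T13", "abcdefg"], 13),
    (["hour", "2009-06-16T13", "abcdefg"], 7),
    (["hour", "2009-06-16T13", "hijklmn"], 2),
    (["hour", "2009-06-16T14", "abcdefg"], 5),
    (["day", "2009-06-16", "abcdefg"], 25),
]
result = query_period(sample_rows, "hour", "2009-06-16T13")
```

Here result has one total per URL digest for the 13:00 hour, and the rows
for the 14:00 hour and for the "day" unit are left out.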

Does this seem like a reasonable approach?  Would it be better to
create separate views for hour, day, and month and avoid the
array-valued keys?  In a previous post, I goofed up in understanding
how startkey/endkey queries work.  Am I making a similar error in
thinking with the above array-valued key approach?  I'm thinking this
is a different case because the time unit is always an exact match in
startkey and endkey.

Anyhow, I'd really appreciate any suggestions for improvement or
confirmation that this should kinda-sorta work.


+ seth
