incubator-couchdb-user mailing list archives

From Matthieu Rakotojaona <matthieu.rakotoja...@gmail.com>
Subject Re: FW: Am I doing something fundamentally wrong?
Date Thu, 24 May 2012 18:04:08 GMT
On Thu, May 24, 2012 at 8:32 AM, Mike Kimber <mkimber@kana.com> wrote:
> The "Build Profile Detail" Map referenced above takes up to 15 hours to build.
> Now once I know what I want that's not necessarily a major issue, but it is
> when I need to discover/explore the data that I need to analyse.

OK, so what you want is a tool to retrieve information dynamically
from a store. I don't think CouchDB is your best bet for this. From
what I have seen, CouchDB is much more oriented towards storing and
accessing static data, which may itself be derived from other static
data. It's a bit like a static site generator: you put in some content
(your blog post, your logo, ...) and it generates (with map/reduce)
static HTML pages that you serve directly. You can do some
post-processing on them, but that runs live, on every request. If you
are going to analyse the initial content dynamically, you will have to
regenerate the pages every time, which is not the best way to go.
My comparison might seem far-fetched, but I hope you see my point.

But there is something you can do with CouchDB. Basically, what you
want is to analyze the 'maven-build-profile' docs. In your map
function, just `emit("maven-build-profile", null)` for each doc of
that kind. This gives you a view you can use to filter the docs.
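
As a rough sketch, such a map function could look like the following.
It assumes each doc carries a `type` field marking it as a
'maven-build-profile' doc; that field name is a guess, so use whatever
actually identifies your docs. The `emit` stub and the sample docs are
made up and only there so the snippet runs outside CouchDB (in a design
doc you would store just the function, as an anonymous `function(doc)`):

```javascript
// Stand-in for CouchDB's emit(), only so this sketch runs outside CouchDB.
var emitted = [];
function emit(key, value) {
  emitted.push({ key: key, value: value });
}

// The map function itself. ASSUMPTION: docs have a `type` field;
// replace the test with whatever identifies your build profile docs.
function map(doc) {
  if (doc.type === "maven-build-profile") {
    emit("maven-build-profile", null);
  }
}

// Made-up sample docs, for illustration only.
[
  { _id: "a", type: "maven-build-profile" },
  { _id: "b", type: "something-else" }
].forEach(map);
```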
The next step is to fiddle with a list function. You just have to
put your processing in it:

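```
function(head, req) {
  var row;
  // getRow() returns the next view row, or a falsy value when none are left
  while ((row = getRow())) {
    // only present when the view is queried with include_docs=true
    var doc = row.doc;

    // send() expects a string, so serialize the object first
    send(JSON.stringify({"property1": doc.property1, "property2": doc.property2}));

  }
}
```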

This simple list function gives you properties 1 and 2 for each of
the docs that are processed. Two words of caution: you need to
add `include_docs=true` to your query string, so that getRow()
returns each row _with_ its doc. You also need to add newlines
and commas yourself, because send() doesn't add them: it just writes
its argument to the output without further processing.
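
To make the second caution concrete, here is a minimal sketch of a list
function that adds the commas and newlines itself, so that the whole
response is one valid JSON array. The `getRow`/`send` stubs, the sample
rows and the name `list` are made up so the snippet runs outside
CouchDB; inside a design doc only the function body matters:

```javascript
// Stand-ins for CouchDB's getRow() and send(), only so this runs outside
// CouchDB. The rows are made up; with include_docs=true each row has a doc.
var rows = [
  { id: "a", key: "maven-build-profile", value: null,
    doc: { property1: 1, property2: "x" } },
  { id: "b", key: "maven-build-profile", value: null,
    doc: { property1: 2, property2: "y" } }
];
var output = "";
function getRow() { return rows.shift(); }
function send(chunk) { output += chunk; }

// The list function: one JSON object per line, wrapped in an array.
function list(head, req) {
  var row;
  var first = true;
  send("[\n");
  while ((row = getRow())) {
    // send() adds no separators, so put the comma and newline in by hand.
    send((first ? "" : ",\n") + JSON.stringify({
      property1: row.doc.property1,
      property2: row.doc.property2
    }));
    first = false;
  }
  send("\n]\n");
}

list({}, {});
```

Assuming a design doc called `profiles`, a list function called `props`
and a view called `by_type` (all three names are hypothetical), the
query would be something like
`/yourdb/_design/profiles/_list/props/by_type?include_docs=true`.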

This kind of workflow should be flexible enough to give some
interesting results, even though it could use a lot of CPU for each
request, since the list function runs over the rows every time.

Note: you never need to emit doc._id; it is always included with
every pair you emit (it is part of each row you get at query
time). If you want to sort by id, emit `null` as the key: rows with
the same key are sorted by id by default (yes, CouchDB is awesome =]).
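
A toy illustration of that note, runnable outside CouchDB. The `emit`
stub below imitates how CouchDB attaches the doc id to every row, and
the final sort only models the "same key, so order by id" tie-break for
plain ASCII ids; real view collation has more rules than this. The docs
and all names other than `emit` are made up:

```javascript
// Toy stand-in for a CouchDB view, only so this runs outside CouchDB.
// Like CouchDB, it attaches the doc's _id to every emitted row.
var viewRows = [];
var currentId = null;
function emit(key, value) {
  viewRows.push({ id: currentId, key: key, value: value });
}

// The map function: no need to emit doc._id anywhere.
function map(doc) {
  emit(null, null);
}

// Made-up docs, deliberately out of order.
[{ _id: "doc-c" }, { _id: "doc-a" }, { _id: "doc-b" }].forEach(function (doc) {
  currentId = doc._id;
  map(doc);
});

// All keys are equal (null), so the id decides the order.
viewRows.sort(function (a, b) {
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
});
var sortedIds = viewRows.map(function (row) { return row.id; });
```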

-- 
Matthieu RAKOTOJAONA
