couchdb-user mailing list archives

From Jan Lehnardt <...@apache.org>
Subject Re: The Blog
Date Tue, 10 Feb 2009 09:38:34 GMT
Hi,

Mister D, you haven't made a convincing argument that you have done
your homework, and several people have told you so by now (without
backing it up, because they assumed you'd understand). Please don't
get cranky.

Everybody, this thread is turning personal and leaving the technical
behind. Let's not do that, ok? My patience is growing thinner too, but
let's show some code.

--

Mister D, you are leading this under the assumption* that a lot of
people are not getting CouchDB and that we should work on our
promotion and documentation. You have a point, but you're only seeing
one side. There's a large group of people who do get CouchDB for what
it is. Just because some people blog about their experiences with an
application (the blog) that uses only a limited feature set of CouchDB
doesn't mean they don't get it. (Maybe they really don't, but that's
okay: their blog doesn't need to be highly scalable; they are just
tired of writing SQL, fighting an ORM or being lost in
driver-dependency hell.)

*You never said so explicitly, but your argument leads me to believe
that. If that's not the case, can you clarify?


>> Yes, you can. Just not in the way you're used to thinking. A
>> Map/Reduce view is a fixed *mapping* from documents to a sorted
>> key/value space.
>
> Yes, you just said it. *Fixed*. If you have 200 documents, 100 from
> Jan to Nov, and 100 from Nov to Dec, there is no way you can fill them
> into two buckets ("Jan-Nov" and "Nov-Dec"). It would require variable
> conditions.

docs:

{"_id":"foo","date":[2008, 1, 1],"data":"abc"}
{"_id":"bar","date":[2008, 2, 1],"data":"def"}
{"_id":"baz","date":[2008, 3, 1],"data":"abc"}
{"_id":"qux","date":[2008, 4, 1],"data":"ghi"}
{"_id":"quux","date":[2008, 5, 1],"data":"jkl"}
{"_id":"corge","date":[2008, 6, 1],"data":"mno"}
{"_id":"grault","date":[2008, 7, 1],"data":"pqr"}
{"_id":"garply","date":[2008, 8, 1],"data":"stu"}
{"_id":"waldo","date":[2008, 9, 1],"data":"vwx"}
{"_id":"fred","date":[2008, 10, 1],"data":"yza"}
{"_id":"plugh","date":[2008, 11, 1],"data":"bcd"}
{"_id":"xyzzy","date":[2008, 12, 1],"data":"efg"}

map function:

function(doc) {
   emit(doc.date, doc.data);
}

result:

{"key":[2008, 1, 1],"value":"abc","id":"foo"}
{"key":[2008, 2, 1],"value":"def","id":"bar"}
{"key":[2008, 3, 1],"value":"abc","id":"baz"}
{"key":[2008, 4, 1],"value":"ghi","id":"qux"}
{"key":[2008, 5, 1],"value":"jkl","id":"quux"}
{"key":[2008, 6, 1],"value":"mno","id":"corge"}
{"key":[2008, 7, 1],"value":"pqr","id":"grault"}
{"key":[2008, 8, 1],"value":"stu","id":"garply"}
{"key":[2008, 9, 1],"value":"vwx","id":"waldo"}
{"key":[2008, 10, 1],"value":"yza","id":"fred"}
{"key":[2008, 11, 1],"value":"bcd","id":"plugh"}
{"key":[2008, 12, 1],"value":"efg","id":"xyzzy"}
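To make the "fixed mapping" concrete, here is a small plain-JavaScript sketch that replays the map step: it runs the map function over the twelve docs (deliberately fed in reverse order) and sorts the emitted rows by key, which is the ordering the view index keeps. The `buildView` and `collate` helpers are hypothetical, `emit` is passed in as a parameter instead of being the view server's global, and the collation only handles numbers and arrays, a small subset of CouchDB's real rules.

```javascript
// Simplified collation for numbers and arrays only (not CouchDB's full rules):
// arrays compare element by element, and a shorter prefix sorts first.
function collate(a, b) {
  if (Array.isArray(a) && Array.isArray(b)) {
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return a - b;
}

// Hypothetical helper: apply mapFn to every doc, collect the rows, sort them by key.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    // Stand-in for the view server's emit(); remembers which doc emitted the row.
    mapFn(doc, (key, value) => rows.push({ key, value, id: doc._id }));
  }
  return rows.sort((r1, r2) => collate(r1.key, r2.key));
}

const ids = ["foo", "bar", "baz", "qux", "quux", "corge",
             "grault", "garply", "waldo", "fred", "plugh", "xyzzy"];
const data = ["abc", "def", "abc", "ghi", "jkl", "mno",
              "pqr", "stu", "vwx", "yza", "bcd", "efg"];
const docs = ids.map((id, i) => ({ _id: id, date: [2008, i + 1, 1], data: data[i] }));

// Same logic as the map function above, with emit passed in explicitly.
const rows = buildView(docs.slice().reverse(), (doc, emit) => emit(doc.date, doc.data));
console.log(rows.map(r => r.id).join(" "));
// foo bar baz qux quux corge grault garply waldo fred plugh xyzzy
```

However the docs arrive, the rows come out in key order; that order is what the range queries and the client-side cut below rely on.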

Not sure what exactly you propose, but streaming the view result to
the client and cutting it where the month marker bumps from 10 to 11
seems a pretty reasonable way to get this into two buckets.
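A minimal sketch of that one-request variant, assuming the rows arrive in key order as a view response delivers them: find the first row past the cut-off month and slice there. `splitAfter` is a hypothetical helper, and the rows are rebuilt from the result listing above so the snippet runs on its own.

```javascript
const ids = ["foo", "bar", "baz", "qux", "quux", "corge",
             "grault", "garply", "waldo", "fred", "plugh", "xyzzy"];
const data = ["abc", "def", "abc", "ghi", "jkl", "mno",
              "pqr", "stu", "vwx", "yza", "bcd", "efg"];
// Rows as the view returns them: sorted by key [year, month, day].
const rows = ids.map((id, i) => ({ key: [2008, i + 1, 1], value: data[i], id }));

// Hypothetical helper: rows up to and including (year, month) go into the first
// bucket; because rows are key-ordered, everything after the cut goes into the second.
function splitAfter(rows, year, month) {
  let cut = rows.findIndex(r => r.key[0] > year || (r.key[0] === year && r.key[1] > month));
  if (cut === -1) cut = rows.length; // nothing past the cut-off
  return [rows.slice(0, cut), rows.slice(cut)];
}

const [upToOctober, sinceNovember] = splitAfter(rows, 2008, 10);
console.log(upToOctober.length, sinceNovember.length); // 10 2
```

A real streaming client would apply the same test row by row while parsing the response, instead of holding the whole array in memory.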

If you permit two requests:

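?endkey=[2008,10,{}] // everything up to and including October
?startkey=[2008,11] // everything since
?startkey=[2008,11]&endkey=[2008,12,{}] // only Nov and Dec, once 2009 data exists

(Keys collate element by element and a shorter array sorts before a
longer one with the same prefix, so a bare [2008,10] end key would
stop before [2008,10,1]; the trailing {} sorts after any number.)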

Index operations are your "variable conditions".
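To see what such key ranges select, here is a hedged in-memory model of `startkey`/`endkey` with an inclusive end (CouchDB's default). It assumes CouchDB's documented collation for the types involved: numbers sort before arrays, arrays before objects, and arrays compare element by element with a shorter prefix first. That last rule is why a bare `[2008,10]` end key stops before `[2008,10,1]`, and why a trailing `{}` is the usual way to cover a whole month. `collate` and `queryRange` are hypothetical names; only numbers, arrays and empty objects are handled, with one row per month as in the result listing above.

```javascript
// Rank of the JSON types used here, in CouchDB collation order (subset only).
function typeRank(v) {
  if (typeof v === "number") return 0;
  if (Array.isArray(v)) return 1;
  return 2; // plain object
}

function collate(a, b) {
  const ra = typeRank(a), rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 0) return a - b;
  if (ra === 1) {
    // Element by element; if one array is a prefix of the other, the shorter sorts first.
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0; // only empty objects ({}) appear in this sketch, so treat them as equal
}

// Hypothetical helper mimicking startkey/endkey with an inclusive end key.
function queryRange(rows, { startkey, endkey } = {}) {
  return rows.filter(r =>
    (startkey === undefined || collate(r.key, startkey) >= 0) &&
    (endkey === undefined || collate(r.key, endkey) <= 0));
}

const rows = Array.from({ length: 12 }, (_, i) => ({ key: [2008, i + 1, 1] }));
const count = opts => queryRange(rows, opts).length;

console.log(count({ endkey: [2008, 10] }));      // 9  (October is excluded)
console.log(count({ endkey: [2008, 10, {}] }));  // 10 (January to October)
console.log(count({ startkey: [2008, 11] }));    // 2  (November and December)
console.log(count({ startkey: [2008, 11], endkey: [2008, 12, {}] })); // 2
```

The two buckets never overlap and together cover every row, which is the "variable condition" expressed as two index ranges.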


>> Also, you may count things.
>
> I never said you couldn't. I said you cannot count like += and you
> cannot aggregate counts to get rid of all the documents. Let's say you
> want to count pageviews. Easy, insert a document for every pageview,
> create a "sum-view". But, this will lead to way too many documents?
> Doesn't seem feasible. Of course, CouchDB isn't the tool for that job,
> but I would still like to see some really hands on examples of what
> CouchDB can do. I think we covered the concepts now.

Run a cron job that periodically does the roll-up for you, and use
single docs in the meantime that can later be deleted, much like a
log-file analyzer plus log rotation. Not saying it is the best idea;
I'd use something else for that (or would I? Watch dev@ for news).
Plus, this is about your data, so you wouldn't just count arbitrary
things: you'd have a bunch of documents representing your records,
and you'd be able to sum them up just fine.
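A rough in-memory sketch of that roll-up idea, with no actual CouchDB calls: a map step in which a single pageview doc counts 1 and a summary doc counts its stored total, a reduce function in CouchDB's `(keys, values, rereduce)` form that sums, and a hypothetical `rollUp` step (what the cron job would run) that folds pageview docs into one summary doc per page and lists the ids that could then be deleted. The doc shapes (`type`, `page`, `count`), the helper names and the sample data are all assumptions for illustration.

```javascript
// Map step: a pageview doc counts 1, a rolled-up summary doc counts its total.
function map(doc, emit) {
  if (doc.type === "pageview") emit(doc.page, 1);
  if (doc.type === "summary") emit(doc.page, doc.count);
}

// Reduce step in (keys, values, rereduce) form. Summing works on emitted values
// and on partial sums alike, so rereduce needs no special case.
function sumReduce(keys, values, rereduce) {
  return values.reduce((acc, v) => acc + v, 0);
}

// Total for one page, computed in memory the way a key-filtered reduce query would.
function totalFor(docs, page) {
  const values = [];
  for (const doc of docs) map(doc, (key, value) => { if (key === page) values.push(value); });
  return sumReduce(null, values, false);
}

// Hypothetical cron-job step: fold pageview docs into one summary doc per page
// and report which pageview ids could then be deleted.
function rollUp(docs) {
  const summaries = new Map(); // page -> summary doc
  for (const doc of docs) {
    if (doc.type === "summary") summaries.set(doc.page, { ...doc });
  }
  const toDelete = [];
  for (const doc of docs) {
    if (doc.type !== "pageview") continue;
    const s = summaries.get(doc.page) ||
      { _id: "summary-" + doc.page, type: "summary", page: doc.page, count: 0 };
    s.count += 1;
    summaries.set(doc.page, s);
    toDelete.push(doc._id);
  }
  return { summaries: [...summaries.values()], toDelete };
}

// Hypothetical data: an earlier summary plus three new pageviews.
const docs = [
  { _id: "summary-blog", type: "summary", page: "blog", count: 40 },
  { _id: "pv1", type: "pageview", page: "blog" },
  { _id: "pv2", type: "pageview", page: "blog" },
  { _id: "pv3", type: "pageview", page: "about" },
];

const { summaries, toDelete } = rollUp(docs);
console.log(totalFor(docs, "blog"), totalFor(summaries, "blog")); // 42 42
console.log(toDelete.join(" ")); // pv1 pv2 pv3
```

Because the view sums pageview and summary docs alike, the totals are the same before and after a roll-up, so it can be queried at any time while the cron job cleans up.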


>> Patrick was trying to help and was correct.
>
> No, he is not.

We need to get into a concrete example before we can solve this
one. I suspect we're just working from mismatched assumptions. We
could also let it rest.


Cheers
Jan
--

