couchdb-user mailing list archives

From Nils Breunese <>
Subject RE: Simple Docs in Numerous Views
Date Fri, 18 Mar 2011 07:56:16 GMT
How big your indexes become depends largely on what you're storing as the value in your emit.
What are you storing? Usually it is fine to store 'null' as the value and just use ?include_docs=true
when querying the view to retrieve the mapped document. At least this will keep disk space
usage to a minimum.
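
As a rough illustration of the suggestion above, here is a minimal sketch of a map function that emits null as the value. The field names (type, published), the sample docs and the buildView helper are assumptions made up for this example; in a real design document the map function takes only doc and calls CouchDB's built-in emit.

```javascript
// Minimal sketch, not a definitive implementation. In a real design document
// the map function receives only `doc` and calls CouchDB's global `emit`;
// here `emit` is passed in so the snippet runs on its own.
// Assumed (hypothetical) field names: doc.type, doc.published.
function mapByPublished(doc, emit) {
  if (doc.type === "headline" && doc.published) {
    // Key: publication date. Value: null, so each index row stays small.
    emit(doc.published, null);
  }
}

// Hypothetical stand-in for the view server: runs the map function over the
// docs and returns the emitted rows sorted by key.
function buildView(mapFn, docs) {
  const rows = [];
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  rows.sort(function (a, b) {
    return a.key < b.key ? -1 : a.key > b.key ? 1 : 0;
  });
  return rows;
}

// Made-up sample documents.
const sampleDocs = [
  { _id: "a1", type: "headline", title: "Second", published: "2011-03-17T17:20:00Z" },
  { _id: "b2", type: "headline", title: "First", published: "2011-03-16T09:00:00Z" },
  { _id: "c3", type: "audit", user: "zg", action: "read" }
];

const publishedRows = buildView(mapByPublished, sampleDocs);
console.log(JSON.stringify(publishedRows));
```

Querying such a view with ?include_docs=true makes CouchDB attach the full document to each row at query time, so the document body does not have to be copied into the index.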

From: [] on behalf of Zdravko Gligic []
Sent: Thursday, 17 March 2011 17:20
Subject: Simple Docs in Numerous Views


I have a bit of a unique use case in which I could end up with a large
number of very small docs (half a dozen to a dozen fields or attributes)
with anywhere from a couple hundred to a couple thousand (human
readable) bytes in size.

However, each one of these docs will have an id (some have natural
short ids, like YouTube's 11-char video ids, but many will not), a
title, maybe a thumbnail, a short description and a url.  So for
visualization purposes, let's pretend that they are typical rss news
headlines.  These also have an author, publisher, original date
published and date published on the site.

In addition to those attributes, end users could end up classifying
each document in one or multiple ways and there could be half dozen to
a dozen different classification schemes - geographic (world, country,
etc), subject (custom schemes resembling Dewey Decimal and/or Library
of Congress, etc) as well as other sorts of classification schemes.
However, as in these two examples, all of these schemes are at least a
bit hierarchical in nature - but all would work in quite the same
way.
From the design point of view, I need to be able to present all of the
material (: well :) in all possible ways, sorted by either date
published (original or on web), within any of the classified
categories.  In addition, I need to be able to keep track of who did
what to any of the documents, including simply reading it, in addition
to posting, classifying, etc.  For this a doc with user, docid,
date/time and action would just about do the trick.

However, all of the different ways of categorizing each doc end up
creating a situation where the disk usage by docs themselves will end
up being dwarfed by resources that will end up being taken by views
indexing.  So, all of this is starting to play tricks on my mind and
causing me to try to come up with shorter doc _id values as well as
trying to figure out how to create views so that a document that is
placed in one child category does not need to be put into its parent
categories - for cases where child docs need to be shown as if they
are in their parent categories.
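
One possible way to avoid emitting a row per ancestor category is to emit the full category path as an array key and query a parent with a key range. The sketch below assumes a hypothetical doc shape (doc.categories as a list of root-to-leaf paths, doc.published as an ISO date string), and the helpers collectRows and rowsUnder are made up for this example; rowsUnder only imitates what a query with startkey=["world","europe"] and endkey=["world","europe",{}] would return.

```javascript
// Minimal sketch under stated assumptions, not a definitive implementation.
// Assumed (hypothetical) doc shape: doc.categories is a list of paths from
// root to leaf, e.g. [["world", "europe", "nl"]]; doc.published is a date string.
// In a real design document the map function takes only `doc` and uses
// CouchDB's global `emit`; here `emit` is passed in so the snippet runs alone.
function mapByCategoryPath(doc, emit) {
  (doc.categories || []).forEach(function (path) {
    // One row per classification, keyed by the path followed by the date.
    emit(path.concat([doc.published]), null);
  });
}

// Hypothetical stand-in for the view server: collects the emitted rows.
function collectRows(mapFn, docs) {
  const rows = [];
  docs.forEach(function (doc) {
    mapFn(doc, function (key, value) {
      rows.push({ id: doc._id, key: key, value: value });
    });
  });
  return rows;
}

// Imitates a range query with startkey=prefix and endkey=prefix.concat([{}]):
// it keeps every row whose key begins with the given category prefix.
function rowsUnder(rows, prefix) {
  return rows.filter(function (row) {
    return prefix.every(function (part, i) {
      return row.key[i] === part;
    });
  });
}

// Made-up sample documents.
const categorizedDocs = [
  { _id: "n1", published: "2011-03-17", categories: [["world", "europe", "nl"]] },
  { _id: "n2", published: "2011-03-16", categories: [["world", "europe", "de"]] },
  { _id: "n3", published: "2011-03-15", categories: [["world", "asia", "jp"]] }
];

const categoryRows = collectRows(mapByCategoryPath, categorizedDocs);
const europeRows = rowsUnder(categoryRows, ["world", "europe"]);
console.log(europeRows.map(function (r) { return r.id; }));
```

The trade-off is ordering: rows in such a range come back sorted by path first and date second, so a parent listing is not globally sorted by date. Getting date order across a whole subtree would need either a row per ancestor (more index space) or sorting on the client.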

So, the whole thing has me scratching my head and questioning if
CouchDB is the right tool for what on the surface appears to be quite a
simplistic requirement.

P.S. Like most of what we do, if what I am doing were to get traction,
I could end up with 10,000's of categories and 10,000,000's of docs
that are of half a dozen to a dozen different types (likely each in its
own db).  Based on a small proof of concept, with just 3 views on each
doc, my db ends up being roughly 10 times the size of the docs without
the views.  If so, then my views:docs ratio could go as high as 10:1,
20:1 or even 50:1, and this scares me.

I would really appreciate your comments and/or suggestions.

