couchdb-user mailing list archives

From Cliff Williams <cliffywi...@aol.com>
Subject Re: Simple Docs in Numerous Views
Date Thu, 17 Mar 2011 16:48:57 GMT
Zdravko

Have you investigated Elasticsearch (http://www.elasticsearch.org/) or
couchdb-lucene (https://github.com/rnewson/couchdb-lucene)?

I have personally used Elasticsearch with the CouchDB river (connector).

I think either of these may help with your use case.

best regards

cliff

On 17/03/11 16:20, Zdravko Gligic wrote:
> Folks,
>
> I have a bit of a unique use case in which I could end up with a large
> number of very small docs (a half dozen to a dozen fields or attributes),
> anywhere from a couple hundred to a couple thousand (human-readable)
> bytes in size.
>
> However, each one of these docs will have an id (some have natural
> short ids, like YouTube's 11-character video ids, but many will not), a
> title, maybe a thumbnail, a short description and a URL.  So, to help
> visualize them, let's pretend that they are typical RSS news
> headlines.  These also have an author, a publisher, an original
> publication date and a date published on the site.
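A minimal sketch of one such headline document, written as a Python dict so its serialized size can be checked against the "couple hundred to a couple thousand bytes" estimate. The field names, the `yt:` id prefix and all values are hypothetical. CouchDB accepts any string `_id` that does not start with an underscore, so a short natural key can be used directly instead of a generated 32-character UUID; ISO 8601 timestamps are used so that string order matches time order.

```python
import json

# Hypothetical headline document; field names and values are illustrative only.
# The "_id" reuses an 11-character video id with a made-up "yt:" source prefix.
headline = {
    "_id": "yt:a1B2c3D4e5F",
    "type": "headline",
    "title": "City council approves new riverside park",
    "thumbnail": "http://example.com/thumbs/park.jpg",
    "description": "The council voted 7-2 to fund the park; construction is expected to start next spring.",
    "url": "http://example.com/news/riverside-park",
    "author": "J. Smith",
    "publisher": "Example Gazette",
    "published": "2011-03-15T09:30:00Z",  # original publication date (ISO 8601, UTC)
    "posted": "2011-03-17T16:20:00Z",     # date published on the site
}


def json_size(doc):
    """Size in bytes of the document body serialized as compact UTF-8 JSON."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


print(json_size(headline))
```

The measured body size is only the document itself; CouchDB adds its own per-revision overhead on disk, so this is a lower bound rather than the stored size.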
>
> In addition to those attributes, end users could end up classifying
> each document in one or more ways, and there could be a half dozen to
> a dozen different classification schemes: geographic (world, country,
> etc.), subject (custom schemes resembling Dewey Decimal and/or Library
> of Congress, etc.) and other sorts of classification schemes.
> However, as in these two examples, all of these schemes are at least
> somewhat hierarchical in nature, and all would work in much the same
> manner.
>
> From the design point of view, I need to be able to present all of the
> material (: well :) in all possible ways, sorted by either date
> published (original or on web), within any of the classified
> categories.  In addition, I need to be able to keep track of who did
> what to any of the documents, including simply reading it, in addition
> to posting, classifying, etc.  For this a doc with user, docid,
> date/time and action would just about do the trick.
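The activity log described above maps naturally onto one small document per event plus a view keyed on `[docid, timestamp]`. The sketch below simulates such a view in memory rather than calling CouchDB: the `Activity` type, its field names and the `type == "activity"` convention are assumptions, and sorting Python tuples stands in for CouchDB's ordering of view rows by key. In CouchDB itself the map function would be JavaScript, roughly `function(doc) { if (doc.type == "activity") emit([doc.docid, doc.at], [doc.user, doc.action]); }`, queried with `startkey=["<docid>"]&endkey=["<docid>",{}]`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """One hypothetical activity-log document: who did what to which doc, and when."""
    user: str
    docid: str
    at: str      # ISO 8601 UTC timestamp, so string order equals time order
    action: str  # e.g. "read", "post", "classify"


def view_by_doc(events):
    """Simulate a view emitting key (docid, at) and value (user, action).
    CouchDB returns view rows sorted by key; sorting tuples mimics that here."""
    return sorted(((e.docid, e.at), (e.user, e.action)) for e in events)


def history(rows, docid):
    """Rows for one doc, oldest first; corresponds to a CouchDB query with
    startkey=[docid] and endkey=[docid, {}] (objects sort after strings)."""
    return [(key[1], value) for key, value in rows if key[0] == docid]


events = [
    Activity("ann", "yt:a1B2c3D4e5F", "2011-03-17T16:25:00Z", "classify"),
    Activity("bob", "yt:a1B2c3D4e5F", "2011-03-17T16:21:00Z", "read"),
    Activity("ann", "other-doc", "2011-03-17T16:22:00Z", "post"),
]

for at, (user, action) in history(view_by_doc(events), "yt:a1B2c3D4e5F"):
    print(at, user, action)
```

A second view keyed on `[user, at]` over the same documents would answer "what did this user do" without duplicating any data.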
>
> However, all of the different ways of categorizing each doc create a
> situation where the disk space used by the docs themselves will be
> dwarfed by the space taken up by view indexes.  So, all of this is
> starting to play tricks on my mind and has me trying to come up with
> shorter doc _id values, as well as trying to figure out how to create
> views so that a document placed in one child category does not also
> need to be put into its parent categories, for cases where child docs
> need to be shown as if they were in their parent categories.
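One common CouchDB idiom for the parent/child problem is to emit category paths as array keys and select a subtree with a key range. The sketch below compares two such designs in plain Python. Strategy A emits one row per classification with the full path in the key, so a parent category can be listed with a prefix range, but results come back grouped by subcategory rather than by date. Strategy B emits one row for every ancestor, which gives a single date-ordered list per category at the cost of multiplying index rows by tree depth. The document layout (`categories` as a list of paths, `published` as an ISO date string) and all helper names are assumptions, and the views are simulated rather than run in CouchDB.

```python
# In-memory simulation of two hypothetical CouchDB category views.
# Sorting Python tuples stands in for CouchDB key collation, which gives
# the same order for the simple lowercase ASCII keys used here.

def full_path_rows(docs):
    """Strategy A: one row per classification, key = (*path, published)."""
    rows = []
    for doc in docs:
        for path in doc["categories"]:
            rows.append((tuple(path) + (doc["published"],), doc["_id"]))
    return sorted(rows)


def per_ancestor_rows(docs):
    """Strategy B: one row per ancestor, key = (path prefix, published)."""
    rows = []
    for doc in docs:
        for path in doc["categories"]:
            for depth in range(1, len(path) + 1):
                rows.append(((tuple(path[:depth]), doc["published"]), doc["_id"]))
    return sorted(rows)


def query_full_path(rows, category):
    """Everything under `category`; for ["world"] this is like
    startkey=["world"], endkey=["world", {}]. Grouped by subcategory, then date."""
    prefix = tuple(category)
    return [doc_id for key, doc_id in rows if key[:len(prefix)] == prefix]


def query_per_ancestor(rows, category):
    """Everything under `category` in date order; for ["world"] this is like
    startkey=[["world"]], endkey=[["world"], {}]."""
    prefix = tuple(category)
    return [doc_id for (path, _published), doc_id in rows if path == prefix]


docs = [
    {"_id": "d1", "published": "2011-03-10", "categories": [["world", "europe", "france"]]},
    {"_id": "d2", "published": "2011-03-12", "categories": [["world", "asia"]]},
    {"_id": "d3", "published": "2011-03-01", "categories": [["world", "europe"]]},
]

a, b = full_path_rows(docs), per_ancestor_rows(docs)
print(len(a), query_full_path(a, ["world"]))     # 3 ['d2', 'd3', 'd1']
print(len(b), query_per_ancestor(b, ["world"]))  # 7 ['d3', 'd1', 'd2']
```

Neither strategy removes the trade-off: A keeps the index small, B keeps parent listings in date order. In a real CouchDB query, `descending=true` returns newest first, with `startkey` and `endkey` swapped.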
>
> So, the whole thing has me scratching my head and questioning whether
> CouchDB is the right tool for what, on the surface, appears to be quite
> a simple requirement.
>
> P.S. Like most of what we do, if this were to get traction,
> I could end up with 10,000s of categories and 10,000,000s of docs
> of a half dozen to a dozen different types (likely each in its own
> db).  Based on a small proof of concept with just 3 views on each
> doc, my db ends up being roughly 10 times the size of the docs
> without the views.  If that holds, my views-to-docs size ratio could
> climb to 10:1, 20:1 or even 50:1, and this scares me.
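Row counts for a category view grow multiplicatively: documents × classification schemes × categories per scheme, and × tree depth if every ancestor of a category gets its own row. The back-of-the-envelope sketch below uses purely hypothetical inputs (10 million docs, 6 schemes, one category per scheme, 4-level trees) to show the spread; it counts rows, not bytes. Separately, and as a hedge on the measured 10x: CouchDB view index files are append-only, so part of that ratio may be reclaimable with view compaction (`POST /{db}/_compact/{ddoc}`) before drawing conclusions.

```python
def view_rows(n_docs, schemes, per_scheme=1, depth=1):
    """Rows emitted by one hypothetical category view: one row per
    classification, multiplied by `depth` if every ancestor gets its own row."""
    return n_docs * schemes * per_scheme * depth


# Hypothetical inputs, not measurements.
full_path = view_rows(10_000_000, 6)              # one row per classification
per_ancestor = view_rows(10_000_000, 6, depth=4)  # one row per ancestor level
print(full_path, per_ancestor)  # 60000000 240000000
```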
>
> I would really appreciate your comments and/or suggestions.
>
> Regards,
> Zdravko
>
