couchdb-dev mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: Feature Requests/Discussion
Date Mon, 13 Feb 2012 08:39:33 GMT
Skimmed not grokked. 



On Feb 12, 2012, at 5:45 PM, James Hayton <jamesbhayton@gmail.com> wrote:

> Hi Everyone-
> 
> I have been using CouchDB for several years and I absolutely love working
> with it most of the time.  Thanks to everyone who has made it such a joy to
> work with.  There are however a few consistent situations where I run into
> trouble and that I would like to fix if possible.  I have a few ideas
> regarding features that would help me design my data model the way I want
> and would require me to make far fewer trade-offs at the application level.
> 
> While I think I grok the public facing api fairly well, I don't know the
> internals at all so I don't know if these features are possible or not.
> What I would like is some feedback regarding the possibility of each,
> as well as some sense of the difficulty, why that feature
> hasn't been implemented yet, etc...  I know some have been discussed
> before, but haven't been implemented yet so I just want to figure out why
> and what I can do about it.  My CouchDB use is mostly personal thus far,
> but in the spirit of contributing back (since I have no Erlang skills), I
> would be willing to sponsor somebody to get these features coded up and
> committed to core if everyone agreed that they were valuable features that
> should be included in CouchDB.
> 
> Anyway, on to my features:
> 
> *1) Multiple Start & End Keys For CouchDB Views With Group Level Option For
> Reduce Views*
> 

Historically the issue with this feature is the confusion with an "intersect view results"
behavior. Beyond that the only real issue is getting some details on query strings vs JSON
handling for parameters. 

> This one has been discussed multiple times before.  The JIRA issue is 523 I
> think.
> 
> I don't think there is really much debate that this is a must have for
> CouchDB.   There has even been a patch or two.  What is stopping this from
> happening?  There hasn't been much discussion on the topic lately.  A
> status update would be great from anyone who has the power to make this
> happen/get the ball rolling again.  This single feature alone would make so
> many more things possible.
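
Until a multi-range option exists, the workaround is one request per key range, with the results concatenated on the client. A minimal sketch, assuming a hypothetical view path and a hypothetical [brand, product] key layout; `startkey`, `endkey`, and `group_level` are the standard view query parameters, and their values have to be JSON-encoded in the query string:

```javascript
// Build one view URL per key range. Each URL still has to be requested
// separately, which is the cost the feature request wants to remove.
// `viewPath` and the ranges below are hypothetical, for illustration only.
function rangeQueryUrls(viewPath, ranges, groupLevel) {
  return ranges.map(({ startkey, endkey }) => {
    const params = new URLSearchParams({
      // View keys are JSON values, so they are serialized before URL-encoding.
      startkey: JSON.stringify(startkey),
      endkey: JSON.stringify(endkey),
      group_level: String(groupLevel),
    });
    return `${viewPath}?${params}`;
  });
}

const urls = rangeQueryUrls(
  "database/_design/Product/_view/by_stock_levels",
  [
    // {} sorts after strings in view collation, so it closes a prefix range.
    { startkey: ["brand_a"], endkey: ["brand_a", {}] },
    { startkey: ["brand_c"], endkey: ["brand_c", {}] },
  ],
  1
);
urls.forEach((url) => console.log(url));
```

Two ranges mean two round trips here; the requested feature would fold them into a single request.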
> 
> *2) Return "No Key" And Empty Row When POSTing Keys To Views Instead Of
> Nothing If No Key Matches View*
> 

Assuming this is as simple as the one line description, it should be cake. Might need to be
optional for backwards compatibility. There's a lot of text below here so I might be missing
something. 

> The common scenario I have is where I can't get everything in one query
> from CouchDB, where I am viewing my data in a list format.  Say for example,
> the category page of a store.  I want to display a list of products and
> each product has the following related documents that need to be called per
> view row:  The product's brand doc, the product's pricing doc, the product's
> current availability (reduce view row), any customer specific product
> documents such as the customer part number, customer specific pricing,
> etc...
> 
> So my product doc, looks like this:
> 
> type: Product
> brand: id_for_brand_doc_here
> prices: id_for_prices_doc_here
> attributes: { hash of attributes here }
> categories: [array_of_category_ids_here]
> 
> So, at most I can get the product and one other doc per view row using the
> linked document feature.  This means that if I want to display all the
> information I want in my application, I have to do multiple lookups per
> product in the list view.  This could easily generate 100's of queries to
> couch for 1 page view.  Multiply this by several requests coming in at the
> same time and it starts to become a problem.
> 
> Alternatively, I could issue 4 requests to couch for the entire list by
> issuing POST requests to couch and then zipping the arrays together. (I use
> Ruby at the application level...) Then, I just have to iterate over the new
> array one time and make no more requests to couch.  The reason this fails
> is that if you issue a POST request to couch with a key that is not in the
> view you are posting to, CouchDB doesn't respond with anything for that
> view row, so it would make the array sizes different and therefore make it
> hard to handle in the client without iterating over the array multiple times:
> once to join the data to its proper row and one more time when displaying
> the information.  If CouchDB gave me back the same number of rows as keys I
> requested I could easily join the arrays together in my application and
> significantly limit the number of queries I am sending to couch.
> 
> For example:
> 
> Request 1:
>  URL: database/_design/Product/_view/product_with_price?include_docs=true
>  Keys: ["product_id_1", "product_id_2", "product_id_3" ]
> 
>  Now, if my view had the following:
> 
>  function (doc) {
>    if (doc.type == "Product" && doc.status == "enabled")
>      emit(doc._id, { name: doc.name, _id: doc.prices });
>  }
> 
>  I would get back all 3 products as long as all three were enabled.  But
> if I set a product to disabled it won't show in the view row and therefore
> couch would return an array of only 2 results, which will make it hard when
> joining arrays in my application.
> 
> Request 2:
>  URL: database/_design/Product/_view/by_stock_levels?reduce=true&group=true
>  Keys: ["product_id_1", "product_id_2", "product_id_3" ]
> 
>  Side Note:  I can't combine 1&2 into one reduce view even though the key is
> the same because I get a reduce overflow error.
> 
> Request 3:
>  URL: database/_design/Product/_view/by_customer_part_number
>  Keys: [["product_id_1", "customer_id"], ["product_id_2",
> "customer_id"], ["product_id_3", "customer_id"] ]
> 
>  If the customer doesn't have a doc that matches this view, couch won't
> return an empty row, it just won't return the row.  Therefore if customer
> had a matching row for products 1 and 3, and I just zipped the arrays of
> returned results together, I would get product 3's doc with product 2 in my
> application.  However if couch returns, "no key found" with an empty row,
> the joining of arrays in my application would still work.
> 
> Request 4:
>  URL: database/_design/Brand/_view/all
>  Keys:
> ["brand_id_for_product_id_1", "brand_id_for_product_id_2",
> "brand_id_for_product_id_3" ]
> 
> Now, if for some reason, a brand gets deleted and the ID is still on the
> product, Couch will return an array of rows that does not match the size in
> my application and it's conceivable that I could get the wrong brand on the
> wrong product.
> 
> I could of course check that the array sizes match and only merge if they
> match and if not, don't merge and make requests on a per-product basis
> when displaying results, but it just seems to me that it would be better if
> couch gave me feedback that no results match for that key I requested.
> This would save me a ton and certainly make working with couch more
> relaxing.
> 
> I could really really use this feature and I don't think it would be very
> much trouble at all to send a row if no key matches with just something
> like "key_not_found": null
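
For comparison, the padding asked for here can be approximated on the client today. A minimal sketch, assuming each requested key matches at most one row (as with a `group=true` reduce view, or a map view that emits once per key); the function name and the sample rows are hypothetical, modeled on Request 3 above:

```javascript
// Align view rows with the keys that were POSTed, padding gaps with null,
// so the result always has exactly one slot per requested key.
// If several rows share a key, the last one wins (an assumption of this sketch).
function alignRows(keys, rows) {
  const byKey = new Map();
  for (const row of rows) {
    // JSON.stringify lets array keys such as ["product_id_1", "customer_id"]
    // be compared by value rather than by object identity.
    byKey.set(JSON.stringify(row.key), row);
  }
  return keys.map((key) => byKey.get(JSON.stringify(key)) ?? null);
}

// Hypothetical response for Request 3 where product_id_2 has no matching row.
const keys = [
  ["product_id_1", "customer_id"],
  ["product_id_2", "customer_id"],
  ["product_id_3", "customer_id"],
];
const rows = [
  { id: "part_a", key: ["product_id_1", "customer_id"], value: "CPN-1" },
  { id: "part_c", key: ["product_id_3", "customer_id"], value: "CPN-3" },
];

const aligned = alignRows(keys, rows);
console.log(aligned.map((row) => (row ? row.value : null)));
```

The aligned array has three entries with null in the second slot, so zipping it against the product list keeps product 3's part number on product 3. This costs one extra pass over the rows per request, which is the overhead a server-side "key not found" row would remove.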
> 
> *3) Return Multiple Linked Documents Per View Row*
> 

Interesting thought. Implementation wise this is also pretty cake. Hard part is defining a
syntax and error conditions. First thought was an _id field as a list but then we have to
consider revision specifications as well as all the weird error situations. 

> I use the linked documents feature all the time.  Really helps me cut down
> on the number of requests I make.  But, I could even further cut down if I
> was able to get multiple docs back per row if I passed couch an array of
> ids I wanted with the row instead of just a single id.
> 
> So, using the example above in #2, let's say I had this view:
> 
> function (doc) {
>   if (doc.type == "Product" && doc.status == "enabled")
>     emit(doc._id, { name: doc.name, _id: doc.prices });
> }
> 
> But, I also had a brand id stored, that I wanted to get in the same row...
> 
> let's say I just went like this:
> 
> function (doc) {
>   if (doc.type == "Product" && doc.status == "enabled")
>     emit(doc._id, { name: doc.name,
>                     docs: [{ _id: doc.prices }, { _id: doc.brand }] });
> }
> 
> couch would respond with
> 
> docs: [prices_doc, brands_doc] plus my name field from the product doc.  I
> could get most everything I want in one query.
> 
> I know I can call emit multiple times, but again this just
> makes everything so much harder in my application because I don't know if
> there is a brand for every product or not.  It essentially forces me to
> loop through the array multiple times.  I could also do collation with
> reduce, but then I consistently run into reduce overflow errors as this is
> not what reduce is really designed to handle.
> 
> There may be some reason why this isn't possible, but I don't know it and I
> KNOW it would be useful from a user's perspective.
> 
> Combine this with the feature in #2 and I could get everything I want from
> couch in 2 requests per page view that currently takes me 100 requests for
> 25 products.
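
A client-side approximation of the multi-link idea is possible in two requests: query the view, then fetch every linked id in one POST to `_all_docs?include_docs=true` with a `keys` body. That endpoint reports a missing key as a row carrying `error: "not_found"` instead of dropping it. The sketch below covers only the joining step and assumes the proposed `docs: [{_id: ...}]` value shape from the example above; the function names and sample data are hypothetical:

```javascript
// Collect the distinct linked ids from rows whose value carries a
// `docs: [{_id: ...}, ...]` array (the shape proposed in the message).
function linkedIds(viewRows) {
  const ids = new Set();
  for (const row of viewRows) {
    for (const link of row.value.docs || []) ids.add(link._id);
  }
  return [...ids];
}

// Replace each {_id} link with the fetched document, or null if it is missing.
// `allDocsRows` is the `rows` array of an _all_docs?include_docs=true response.
function attachLinked(viewRows, allDocsRows) {
  const docsById = new Map();
  for (const r of allDocsRows) {
    if (r.doc) docsById.set(r.key, r.doc); // not_found rows have no doc
  }
  return viewRows.map((row) => ({
    ...row,
    value: {
      ...row.value,
      docs: (row.value.docs || []).map((link) => docsById.get(link._id) ?? null),
    },
  }));
}

// Hypothetical data: the second product points at a brand that was deleted.
const viewRows = [
  { id: "p1", key: "p1", value: { name: "Widget", docs: [{ _id: "prices_1" }, { _id: "brand_1" }] } },
  { id: "p2", key: "p2", value: { name: "Gadget", docs: [{ _id: "prices_2" }, { _id: "brand_gone" }] } },
];
const ids = linkedIds(viewRows); // would be sent as {"keys": ids} to _all_docs
const allDocsRows = [
  { id: "prices_1", key: "prices_1", value: { rev: "1-a" }, doc: { _id: "prices_1", list: 10 } },
  { id: "brand_1", key: "brand_1", value: { rev: "1-b" }, doc: { _id: "brand_1", name: "Acme" } },
  { id: "prices_2", key: "prices_2", value: { rev: "1-c" }, doc: { _id: "prices_2", list: 20 } },
  { key: "brand_gone", error: "not_found" },
];
const joined = attachLinked(viewRows, allDocsRows);
console.log(JSON.stringify(joined, null, 2));
```

Each row keeps its `docs` array at full length, with null standing in for the deleted brand, so the application can iterate the page once. A server-side version would still save the second round trip.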
> 
> *Final Thoughts*
> 
> Like I said in the beginning, I don't know if some of these are possible or
> not, but I know that they would make my life as a user of CouchDB much more
> relaxing.  I would sincerely appreciate it if anyone could give feedback on
> the possibility of each and what we have to do to get moving on these.  I
> am willing to put my cash up to anyone who can get these features included
> in couch.
> 
> I appreciate everyone who took the time to read this long ass email.
> I wanted to be clear.  If anyone has any other suggestions, please feel
> free to contact me.
> 
> Thanks,
> 
> James Hayton

Good thoughts. Implementation for these should be easy enough. Mostly these are just heavy
on figuring out the error conditions and a sane syntax to minimize possible errors. 