couchdb-user mailing list archives

From "Ralf Nieuwenhuijsen" <ralf.nieuwenhuij...@gmail.com>
Subject Re: flexible filtering needed, with speed.
Date Tue, 19 Aug 2008 08:16:55 GMT
Don't take Futon as a speed measure; it might also be slowing
down in the rendering part if your documents are big (there is a lot
of stuff going on client-side as well).

The truth is that, for most data being searched, people only care about
3-5 different types of search.

You can, of course, go nuts with the indexing and just generate all
the indexes you could possibly need.

Here is one of my favorites; this creates an index for every unique
top-level field.

function(doc) {
  // Emit one row per top-level field: key is [fieldName, 1, fieldValue].
  for (var k in doc) {
    emit([k, 1, doc[k]], doc);
  }
}

You can query it like this:

Use startkey=["someField",1,null] and endkey=["someField",2,null]
to get the index for 'someField'.
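
To see why that key range selects exactly one field's rows, here is a rough sketch that simulates the view outside CouchDB. The helpers typeRank, collate, buildView and queryRange are made up for illustration and are not CouchDB APIs. The comparison only approximates CouchDB's collation (strings are compared by code point here, not with ICU), and the emitted value is doc._id rather than the whole doc, just to keep the output short.

```javascript
// Approximate CouchDB view collation: null < booleans < numbers < strings < arrays < objects.
function typeRank(v) {
  if (v === null) return 0;
  if (typeof v === "boolean") return 1;
  if (typeof v === "number") return 2;
  if (typeof v === "string") return 3;
  if (Array.isArray(v)) return 4;
  return 5;
}

function collate(a, b) {
  const ra = typeRank(a), rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 0) return 0;
  if (ra === 1) return Number(a) - Number(b);       // false sorts before true
  if (ra === 2) return a - b;
  if (ra === 3) return a < b ? -1 : a > b ? 1 : 0;  // code-point order (assumption, not ICU)
  if (ra === 4) {
    // Arrays compare element by element; a shorter prefix sorts first.
    for (let i = 0; i < Math.min(a.length, b.length); i++) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length;
  }
  return 0; // objects are treated as equal in this sketch
}

// Run the "index every field" map function over some docs and sort the rows by key.
function buildView(docs) {
  const rows = [];
  const emit = (key, value) => rows.push({ key, value });
  for (const doc of docs) {
    for (const k in doc) {
      emit([k, 1, doc[k]], doc._id);
    }
  }
  rows.sort((x, y) => collate(x.key, y.key));
  return rows;
}

// Keep the rows whose key lies between startkey and endkey (inclusive).
function queryRange(rows, startkey, endkey) {
  return rows.filter(
    (r) => collate(r.key, startkey) >= 0 && collate(r.key, endkey) <= 0
  );
}

const sampleDocs = [
  { _id: "a", month: 17, state: "GA" },
  { _id: "b", month: 3, state: "FL" },
  { _id: "c", month: 9, state: "GA" },
];
const viewRows = buildView(sampleDocs);
const byMonth = queryRange(viewRows, ["month", 1, null], ["month", 2, null]);
console.log(byMonth.map((r) => r.value)); // ids sorted by month: b, c, a
```

Every key for 'month' has 1 in the second slot, so it sorts after ["month",1,null] (null collates lowest) and before ["month",2,null]. Rows for other fields fall outside the range because their first element differs.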

Of course, this baby is going to create a huge index if used with too
many or too big documents, but I would at least try something like
that.
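
For a rough idea of "huge", here is a back-of-the-envelope estimate using the numbers from the quoted message below. It assumes "5K" means about 5 KB per doc and that every doc has the five top-level fields shown in the sample (_id, _rev, month, store, raw). It ignores the keys and any storage overhead, so treat it as an order of magnitude only.

```javascript
// Figures taken from the quoted message; "5K" is assumed to mean 5 KB.
const docCount = 12000;
const avgDocBytes = 5 * 1024;
// The sample document has five top-level fields: _id, _rev, month, store, raw.
const topLevelFields = 5;

// One emitted row per top-level field per doc.
const rowCount = docCount * topLevelFields;
// Each row carries the whole doc as its value.
const valueBytes = rowCount * avgDocBytes;
const valueMiB = Math.round(valueBytes / (1024 * 1024));

console.log(rowCount + " rows, roughly " + valueMiB + " MiB of emitted values");
```

Emitting a smaller value (for example just the fields you need) shrinks this proportionally.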

I use the above view function to make sure I can get the data sorted
however I want.
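
Note that the loop above only looks at top-level fields, so for the documents quoted below it would emit one row for 'store' as a whole, not one per store sub-field. A possible variant, only a sketch, descends one level into nested objects and uses a dotted name such as "store.state" as the field name. The function name mapNested and the dotted naming are my own assumptions. In CouchDB emit is a global; it is passed in here only so the snippet runs standalone.

```javascript
// Hypothetical variant: index top-level fields plus one level of nested fields.
function mapNested(doc, emit) {
  for (const k in doc) {
    const v = doc[k];
    if (v !== null && typeof v === "object" && !Array.isArray(v)) {
      // Nested object: emit one row per sub-field, e.g. "store.state".
      for (const sub in v) {
        emit([k + "." + sub, 1, v[sub]], doc._id);
      }
    } else {
      emit([k, 1, v], doc._id);
    }
  }
}

// Standalone usage with a cut-down version of the quoted document.
const emitted = [];
mapNested(
  { _id: "x1", month: 17, store: { state: "GA", breakfast: true } },
  (key, value) => emitted.push({ key, value })
);
console.log(emitted.map((r) => JSON.stringify(r.key)));
```

With that, startkey=["store.state",1,"GA"] and endkey=["store.state",1,"GA"] would select the Georgia stores. Combining several filters in one query would still need a compound key or some intersection on the client side.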

2008/8/19 Brad Anderson <brad@sankatygroup.com>:
> Howdy,
>
> I have 12K docs that look like this:
>
> {
>  "_id": "000111bf7a8515da822b05ebbb8cd257",
>  "_rev": "94750440",
>  "month": 17,
>  "store": {
>  "store_num": 123,
>  "city": "Atlanta",
>  "state": "GA",
>  "zip": "30301",
>  "exterior": true,
>  "interior": true,
>  "restroom": true,
>  "breakfast": true,
>  "sunday": true,
>  "adi_name": "Atlanta, GA",
>  "adi_num": 123,
>  "ownership": "Company",
>  "playground": "Indoor",
>  "seats": 123,
>  "parking_spaces": 123
>  },
>  "raw": {
>  "Other Hourly Pay": 0.28,
>  "Workers Comp - State Funds Exp": 401.65,
>  "Rent Expense - Company": -8,
>  "Archives Expense": 82.81,
>  "Revised Hours allowed per": 860.22,
>  "Merch Standard": 174.78,
>  "Total Property Tax": 1190.91
>
>  ...
>
>  }
> }
>
> I truncated 'raw' but it's usually much longer, and avg. doc size is 5K.
>
>  I'm trying to see how I will query them with views.  I want to be able to
> filter down by various store sub fields, i.e. all the Breakfast = true stores
> in Georgia that are owned by Franchisees.  However, this will differ for
> just about every query.
>
> The 'reduce' function would then be averaging each line in the 'raw' field.
>
> I have played around with views that take the store filters, but just
> returning the 'raw' field as the value from the map function is brutally
> slow in Futon.  This is because the view is accessed right away, so it
> builds, takes about 3-4 mins (on a MBP with 4GB RAM, 2.2GHz dual core,
> 7200RPM disk).  I understand the next time this specific store group is
> requested, it's fast...  but they will all be so dynamic that this seems
> prohibitively slow.
>
> So, I thought, should I be doing this in two steps?  Set up the key to be
> store and whatever else I might want to query on (Month or whatever
> timeframe), and return the doc id's as the values on the original query?  I
> would then send in a complex key to do the filtering.  This would require
> waiting for the _bulk_get functionality, and I'd send that list of ID's into
> a 2nd query to get the raw data to send it to 'map'.
>
> This is slow now on 12K docs... It needs to be stupid-fast at that low
> number of docs, because the plan is for *way* more data.
>
> The filtering part is tailor-made for a RDBMS, but the doc handling (all the
> 'raw' fields will be different store-by-store, industry by industry, change
> over time, and in general be free-form) is perfect for CouchDB.  Thoughts?
>  I want to use the right tool for the job, and that's looking like a RDBMS,
> sadly.  That is, unless I'm completely misusing Couch.  In which case, swift
> blows to the head are welcome.
>
> Cheers,
> BA
>
>
>
