couchdb-user mailing list archives

From Brad Anderson <b...@sankatygroup.com>
Subject flexible filtering needed, with speed.
Date Tue, 19 Aug 2008 03:31:00 GMT
Howdy,

I have 12K docs that look like this:

{
  "_id": "000111bf7a8515da822b05ebbb8cd257",
  "_rev": "94750440",
  "month": 17,
  "store": {
   "store_num": 123,
   "city": "Atlanta",
   "state": "GA",
   "zip": "30301",
   "exterior": true,
   "interior": true,
   "restroom": true,
   "breakfast": true,
   "sunday": true,
   "adi_name": "Atlanta, GA",
   "adi_num": 123,
   "ownership": "Company",
   "playground": "Indoor",
   "seats": 123,
   "parking_spaces": 123
  },
  "raw": {
   "Other Hourly Pay": 0.28,
   "Workers Comp - State Funds Exp": 401.65,
   "Rent Expense - Company": -8,
   "Archives Expense": 82.81,
   "Revised Hours allowed per": 860.22,
   "Merch Standard": 174.78,
   "Total Property Tax": 1190.91

   ...

  }
}

I truncated 'raw', but it's usually much longer; average doc size is 5K.

I'm trying to figure out how I will query them with views.  I want to
be able to filter down by various store subfields, e.g. all the
breakfast = true stores in Georgia that are owned by franchisees.
However, the combination of filters will differ for just about every
query.
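
For any one fixed combination, I mean something like this (just a
sketch; the key layout is arbitrary, and "Franchise" as an ownership
value is a guess, since only "Company" appears in the sample doc
above):

function(doc) {
  // Put the store attributes I expect to filter on into the key,
  // and hand back the raw numbers as the value.
  emit([doc.store.state, doc.store.ownership, doc.store.breakfast],
       doc.raw);
}

which I'd then query with key=["GA","Franchise",true].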

The 'reduce' function would then average each field in 'raw' across
whatever set of stores the filter selects.
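
Something along these lines is what I have in mind for the reduce (a
sketch; it returns per-field sum/count pairs that I'd divide into
averages on the client):

function(keys, values, rereduce) {
  // Keep a running sum and count per 'raw' field so the result
  // can be recombined correctly on rereduce.
  var acc = {};
  for (var i = 0; i < values.length; i++) {
    var v = values[i];
    for (var field in v) {
      if (!acc[field]) acc[field] = {sum: 0, count: 0};
      if (rereduce) {
        acc[field].sum += v[field].sum;
        acc[field].count += v[field].count;
      } else {
        acc[field].sum += v[field];
        acc[field].count += 1;
      }
    }
  }
  return acc;
}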

I have played around with views that take the store filters as the
key, but just returning the 'raw' field as the value from the map
function is brutally slow in Futon.  Because the view is built the
first time it's accessed, the initial request takes about 3-4 mins
(on a MBP with 4GB RAM, 2.2GHz dual core, 7200RPM disk).  I
understand that the next time this specific store group is requested
it's fast... but the filter combinations will all be so dynamic that
this seems prohibitively slow.

So, I thought, should I be doing this in two steps?  Set up the key
to be the store attributes plus whatever else I might want to query
on (month or some other timeframe), return the doc IDs as the values
from that first query, and send in a complex key to do the filtering.
This would require waiting for the _bulk_get functionality, since I'd
send that list of IDs into a 2nd query to get the raw data and send
it on to 'map'.
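
Roughly like this (again just a sketch; the database name "stores"
and the bulk request shape are made up, since _bulk_get doesn't exist
yet):

// Step 1: index only the dimensions I'd filter or range on.  No
// value is needed, because every view row already carries the
// doc's "id".
function(doc) {
  emit([doc.store.state, doc.store.ownership, doc.month], null);
}

// Step 2 (hypothetical, pending _bulk_get or something like it):
//   POST /stores/_bulk_get
//   {"keys": ["000111bf7a8515da822b05ebbb8cd257", ...]}
// and then do the per-field averaging over the returned docs on the
// client.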

This is already slow on 12K docs... It needs to be stupid-fast at
that low a number of docs, because the plan is for *way* more data.

The filtering part is tailor-made for an RDBMS, but the doc handling
(the 'raw' fields will differ store by store and industry by
industry, change over time, and in general be free-form) is perfect
for CouchDB.  Thoughts?  I want to use the right tool for the job,
and that's looking like an RDBMS, sadly.  That is, unless I'm
completely misusing Couch, in which case swift blows to the head are
welcome.

Cheers,
BA


