incubator-couchdb-user mailing list archives

From Scott Shumaker <sshuma...@gmail.com>
Subject Re: New site powered by CouchDB - magnifeast.com
Date Fri, 07 Aug 2009 20:46:47 GMT
Since restaurant delivery regions can be multiple sets of arbitrary
polygons, our geo-search happens in our search engine (that pulls docs
from Couch).  We basically have 'gridded' up the city with fairly
large grid cells.  In each grid cell, we keep track of which
restaurants deliver to all or part of this region.  When a client does
a search, we see which grid cell they fall into, and then do further
refinement by testing against the restaurants in that cell.  We're all
former game developers, so a lot of these techniques are akin to video
game collision detection techniques - it's important to trivially
reject and only do expensive tests against a small set of restaurants.
 We actually construct a BSP tree for each restaurant's polygonal
delivery regions, which is tested after we've done some additional
trivial accept/rejects.
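
A minimal sketch of the grid lookup described above, in Python. It is
not the Magnifeast implementation: the cell size, the function names,
and the data layout are all assumptions made for illustration, and a
plain ray-casting point-in-polygon test stands in for the BSP tree.
Restaurants are registered in every cell touched by the bounding box
of a delivery polygon, so a cell lookup gives a cheap candidate set
and the exact test only runs against those candidates.

```python
from collections import defaultdict
from math import floor

CELL_SIZE = 1.0  # hypothetical cell size, in the same units as the coordinates


def cell_of(x, y):
    """Map a point to the integer coordinates of its grid cell."""
    return (floor(x / CELL_SIZE), floor(y / CELL_SIZE))


def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can cross the ray.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def build_grid(regions):
    """regions: {restaurant_id: [polygon, ...]} (assumed layout).

    Each restaurant is added to every cell overlapped by the bounding box
    of one of its polygons. This over-approximates, which is fine for a
    trivial-reject structure.
    """
    grid = defaultdict(set)
    for rid, polys in regions.items():
        for poly in polys:
            xs = [p[0] for p in poly]
            ys = [p[1] for p in poly]
            cx0, cy0 = cell_of(min(xs), min(ys))
            cx1, cy1 = cell_of(max(xs), max(ys))
            for cx in range(cx0, cx1 + 1):
                for cy in range(cy0, cy1 + 1):
                    grid[(cx, cy)].add(rid)
    return grid


def delivers_to(x, y, grid, regions):
    """Cheap cell lookup first, exact polygon test only on the candidates."""
    candidates = grid.get(cell_of(x, y), set())
    return {
        rid
        for rid in candidates
        if any(point_in_polygon(x, y, poly) for poly in regions[rid])
    }
```

Calling build_grid once at index time and delivers_to per query keeps
the expensive polygon test confined to the restaurants in one cell.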

Hours work similarly - if you want to see restaurants that are 'open
now' or 'delivering to you now' - we have special hour buckets - for
each 30 minute timeslot in the week, we store a list of restaurants
that are open.  (We actually have separate 'delivery hours' vs.
regular hours).  So we can quickly use the current time, generate a
candidate list of restaurants that are open, and further refine if
necessary.
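
The hour buckets could be sketched as follows. This is an illustration
under assumptions, not the actual code: the input layout (one
same-day, half-open interval per entry), the Monday-first slot
numbering, and the function names are all hypothetical. A restaurant
is placed in every 30-minute slot it is open for any part of, so the
bucket is a candidate list that may still need an exact check.

```python
from collections import defaultdict
from datetime import datetime

SLOT_MINUTES = 30
SLOTS_PER_DAY = 24 * 60 // SLOT_MINUTES  # 48 slots per day, 336 per week


def slot_of(weekday, hour, minute):
    """Index of the 30-minute slot in the week; weekday 0 is Monday."""
    return weekday * SLOTS_PER_DAY + (hour * 60 + minute) // SLOT_MINUTES


def build_hour_buckets(hours):
    """hours: {restaurant_id: [(weekday, (open_h, open_m), (close_h, close_m))]}.

    Assumed layout: each interval is half-open [open, close) and stays
    within a single day.
    """
    buckets = defaultdict(set)
    for rid, intervals in hours.items():
        for weekday, (oh, om), (ch, cm) in intervals:
            first = slot_of(weekday, oh, om)
            # The last minute the restaurant is open is one minute before close.
            last_open_minute = ch * 60 + cm - 1
            last = weekday * SLOTS_PER_DAY + last_open_minute // SLOT_MINUTES
            for slot in range(first, last + 1):
                buckets[slot].add(rid)
    return buckets


def open_candidates(buckets, when):
    """Restaurants open during at least part of the slot containing `when`."""
    slot = slot_of(when.weekday(), when.hour, when.minute)
    return buckets.get(slot, set())
```

Separate bucket tables for delivery hours and regular hours would be
built the same way from two different inputs.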

During a search, we typically generate several lists of matches (based
on multiple search criteria) and then intersect them - and then
finally do a sort to return results to the client.  As an
optimization, the more expensive tests (e.g. delivery region test) are
done after the intersection.
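
The intersect-then-refine flow above might look roughly like this. The
function and parameter names are hypothetical, and the expensive test
is passed in as a callback rather than tied to any particular
delivery-region check.

```python
def search(candidate_lists, expensive_test, sort_key):
    """Intersect cheap candidate lists, then run the costly test on survivors.

    candidate_lists: iterables of restaurant ids, one per search criterion.
    expensive_test:  callable(restaurant_id) -> bool, e.g. an exact
                     delivery-region check (an assumption for this sketch).
    sort_key:        callable(restaurant_id) used for the final ordering.
    """
    if not candidate_lists:
        return []
    # Start from the smallest list so the working set shrinks quickly.
    ordered = sorted((set(ids) for ids in candidate_lists), key=len)
    survivors = ordered[0]
    for ids in ordered[1:]:
        survivors = survivors & ids
        if not survivors:
            return []
    # Only restaurants that passed every cheap criterion reach the costly test.
    matches = [rid for rid in survivors if expensive_test(rid)]
    return sorted(matches, key=sort_key)
```

Because the costly test runs last, it is invoked at most once per
restaurant that already satisfies every other criterion.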

Scott


On Fri, Aug 7, 2009 at 9:00 AM, Tommy Chheng<tommy.chheng@gmail.com> wrote:
> Great post and site, Scott!
>
> How are you handling the geo-search with CouchDB?
>
> Tommy
>
> On Aug 7, 2009, at 1:21 AM, Scott Shumaker wrote:
>
>> We've just launched our site - Magnifeast (http://www.magnifeast.com)
>> - which uses CouchDB as its primary persistence mechanism.
>>
>> Magnifeast is a site that lets you order online from over 500
>> restaurants.  We're initially launching in Los Angeles, but plan to
>> expand to other cities soon.
>>
>> CouchDB was a good fit for us because restaurant and menu data can be
>> very complicated - and since we're modeling real world data, it
>> doesn't fit any kind of rigid schema.  Even something as simple as
>> delivery area can be quite variable - some restaurants don't deliver
>> at all, others deliver to a radius, some have multiple polygonal
>> delivery regions (each with different fees), some deliver to a list of
>> cities, etc.  And menus are even more complicated.  Having a
>> traditional schema-oriented database (SQL) would be incredibly
>> frustrating and limiting.  We've been able to use CouchDB's
>> flexibility to express a much richer understanding of restaurants and
>> menus than pretty much anyone else out there.
>>
>> Some technical notes:
>> Our site runs on Amazon EC2.  Clients don't talk to CouchDB directly -
>> they talk through our application servers (currently running Merb).
>> The app servers do authorization for reads - for example, we have
>> custom validation rules for various views that are addressable by the
>> client.  We have validation rules that run in CouchDB for writes.
>> Right now, since there isn't a 1-to-1 mapping between our site users
>> and CouchDB users, we actually append some additional fields to each
>> document at write-time containing information about the user role and
>> permissions (ideally, this could be passed as an additional parameter
>> to CouchDB, which could be checked in the validation function).  The
>> app servers also run pre-save hooks that match various criteria, and might
>> modify the documents before saving.  It turns out that for this to
>> work correctly in our case, we actually have to return a list of
>> deltas (fields that have been changed / removed) to the client browser
>> so they can be applied.  Returning the full object to the client
>> doesn't work because the client may have made changes to the object
>> after the write started but before it finished.  This is a difficult
>> problem to solve in general, caveat emptor.
>>
>> Unlike many traditional websites, and more akin to a couchApp, the
>> actual HTML pages for our site are constructed in-browser on
>> the client.  The clients download templates from the server, along
>> with JSON data that comes from CouchDB, and use the data with the
>> template to assemble the page.  The templates are written in a
>> proprietary language called Jolt, which is javascript-based and 'Live'
>> - in the sense that changes in the JSON data update the page in
>> realtime.  Some of the pages on the site have a very interactive feel
>> because of this.
>>
>> When the client loads data from CouchDB, the objects are loaded using
>> a system that can transform the JSON into instances of Javascript
>> classes, resolving references to other CouchDB documents (and
>> references to members of other documents).  For example, when loading a
>> menu, we typically load a menu document, menu items, and menu
>> sections, all of which are stored as top-level CouchDB documents.  These
>> documents all reference each other, and come into the run-time as
>> instantiated javascript classes, with all of the references fixed up.
>> A menu item can consist of dozens of internal objects as well.  If
>> changes were made to the menu item (for example, in our menu editing
>> admin tools), the menu item would be serialized back out to JSON, all
>> of the references to external objects replaced with pointer stubs, and
>> written to CouchDB.  This system effectively lets us serialize out
>> fairly complicated object graphs into CouchDB.
>>
>> Our search page (http://www.magnifeast.com/restaurant) doesn't hit
>> CouchDB.  Instead, we have a custom search server, which pulls
>> documents from CouchDB using all_docs_by_seq, and does custom
>> indexing.  The searching effectively buckets couchdb documents by
>> various criteria, and finds the intersection of the various buckets in
>> response to queries (for example, you might search for restaurants
>> open 'now' that deliver to your home address with Thai cuisine that
>> contain the search term 'Bangkok').  For textual searches, the search
>> server hits Ferret (similar to Lucene), and intersects the returned
>> result list with the other criteria.
>>
>> There's a lot more going on behind the scenes.  We actually have a lot
>> more admin pages than regular pages - building tools so we could
>> efficiently get hundreds of menus and restaurants fully represented in
>> our system was a lot of work.  And this is just the beginning - we've
>> been focusing intensely on online ordering for launch, but want to
>> build out the site to be much more than that.
>>
>> I can go into more technical details if people are interested.  Thanks
>> for all your hard work so far - I'm definitely glad we had an
>> alternative to SQL!
>>
>> Scott
>
>
