incubator-couchdb-user mailing list archives

From "Nick Johnson" <arach...@notdot.net>
Subject Re: Thinking outside the RDBMS box - how do I... ?
Date Tue, 28 Oct 2008 21:01:35 GMT
On Tue, Oct 28, 2008 at 4:17 PM, Chris Anderson <jchris@apache.org> wrote:

> The simplest approach would be:
>
> When a message comes from a phone, create a document like so:
>
> {
> "_id":"0b3a963dc37079652fa3092223be35b5",
> "_rev":"1805961941",
> "message" : "hey all this is the message",
> "phone": "971 555-1212",
> "time": "2008/06/30 06:30:35 +0000"
> }
>
> This document would never need to be updated, except in the case of
> deletion.
>
> Also maintain docs for groups, eg:
>
> {
> "_id":"0b3a963dc37079652fa3092223be35b5",
> "_rev":"1805961941",
> "group" : "a group can have a title",
> "phones": ["971 555-1212","818 555-1212", "503 555-1212", "512 555-1212"]
> }
>
> Then to load all the messages from phones in a group, GET the group
> document, then run a multi-key request against a map view of the
> messages, which is keyed by phone number. You have the whole thing in
> 2 requests.
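
To make the two-request flow concrete, here is a minimal local sketch in JavaScript, the language CouchDB views are written in. It is a simulation, not a CouchDB client: buildView and multiKey are hypothetical helpers standing in for the view server and for POSTing {"keys": [...]} to the view, emit is passed as a parameter here (in CouchDB it is a global), and every document except the first message is made-up sample data.

```javascript
// Hypothetical stand-in for the view server: runs mapFn over every doc
// and collects the emitted rows, sorted by key as a view would be.
function buildView(docs, mapFn) {
  const rows = [];
  for (const doc of docs) {
    mapFn(doc, (key, value) => rows.push({ id: doc._id, key, value }));
  }
  rows.sort((a, b) => (a.key < b.key ? -1 : a.key > b.key ? 1 : 0));
  return rows;
}

// Map function: one row per message document, keyed by phone number.
// Group documents have no "phone" field, so they emit nothing.
function messagesByPhone(doc, emit) {
  if (doc.phone && doc.message) {
    emit(doc.phone, { message: doc.message, time: doc.time });
  }
}

// Exact-match multi-key lookup; rows come back in the order of the keys.
function multiKey(rows, keys) {
  return keys.flatMap((key) => rows.filter((row) => row.key === key));
}

// Sample data (ids, the second and third messages are assumptions).
const docs = [
  { _id: "m1", phone: "971 555-1212", message: "hey all this is the message", time: "2008/06/30 06:30:35 +0000" },
  { _id: "m2", phone: "818 555-1212", message: "second message", time: "2008/06/30 07:00:00 +0000" },
  { _id: "m3", phone: "999 555-0000", message: "not in the group", time: "2008/06/30 08:00:00 +0000" },
  { _id: "g1", group: "a group can have a title", phones: ["971 555-1212", "818 555-1212"] },
];

const view = buildView(docs, messagesByPhone);
const group = docs.find((d) => d._id === "g1");     // request 1: GET the group doc
const groupMessages = multiKey(view, group.phones); // request 2: multi-key view query
console.log(groupMessages.map((row) => row.id));    // [ 'm1', 'm2' ]
```

The message from the phone outside the group never comes back, because the lookup only matches the exact keys listed in the group document.
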
>
> The bottleneck here is that when you start to have phone numbers with
> hundreds or thousands of messages, there will be an awful lot of data
> coming back from a big group. Currently multi-key only allows exact
> key matches, so it won't support selecting, say, the 5 most recent
> messages from each phone number. For now you can filter that in your
> application. There are some proposals to allow combinations of view
> queries to be specified in a single request, which would make those
> complex queries doable in a single request.


You could also define a view that maps each message to (phone, message),
with a reduce function that keeps only the 5 most recent messages for each
phone number. Then you can run the multi-key request against that view
instead.
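
A rough sketch of such a reduce function, assuming the map step emits values shaped like { message, time } and that the time strings all use the same format and offset, so comparing them as strings orders them chronologically. The name latestMessages and the sample values are hypothetical, and the two-chunk run at the bottom only imitates how a view server might call reduce and then rereduce.

```javascript
const LIMIT = 5;

// Reduce function using CouchDB's (keys, values, rereduce) signature.
// First pass: values are the emitted messages.
// Rereduce: values are arrays returned by earlier calls, so flatten them.
// Either way: sort newest first and keep at most LIMIT entries.
function latestMessages(keys, values, rereduce) {
  const all = rereduce ? [].concat.apply([], values) : values.slice();
  all.sort((a, b) => (a.time < b.time ? 1 : a.time > b.time ? -1 : 0));
  return all.slice(0, LIMIT);
}

// Eight made-up messages from one phone, one second apart.
const values = [];
for (let i = 1; i <= 8; i++) {
  values.push({ message: "msg " + i, time: "2008/06/30 06:30:0" + i + " +0000" });
}

// Imitate the view server reducing two chunks, then combining them.
const partA = latestMessages(null, values.slice(0, 4), false);
const partB = latestMessages(null, values.slice(4), false);
const latest = latestMessages(null, [partA, partB], true);
console.log(latest.map((m) => m.message));
// [ 'msg 8', 'msg 7', 'msg 6', 'msg 5', 'msg 4' ]
```

The result stays bounded at 5 entries per phone no matter how many messages exist, which matters because reduce output is stored in the view index. Querying with group=true gives one reduced row per phone number.
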


>
>
> This is just the simplest way to do it. Of course, a more efficient
> way would involve getting messages into a view, sorted by group id.
> Then pulling all messages for a group would be a cinch. There isn't
> really an obviously best way to do that, as it requires you to have
> documents with both group-ids and message data in them.
>
> The trouble is that the simple way of doing that is to store all
> messages from a phone, as well as all the groups which the phone is a
> member of, on the same document. Those documents would be a source of
> constant update contention, which doesn't sound like fun.
>
> A better solution: You could write all the group-ids for a given
> phone, into the message documents themselves. Then you'd have to go
> back in and update all of a phone's messages anytime it joined a new
> group. Not the worst thing (_bulk_docs will help) but still a bit of a
> pita.
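
For illustration, a sketch of what this third approach might look like, assuming each message document carries a hypothetical "groups" array of group ids. messagesByGroup is a map function keyed by [groupId, time] (emit is passed in here; CouchDB supplies it as a global), and addPhoneToGroup is a hypothetical helper that builds the {"docs": [...]} body for one _bulk_docs POST. The sample documents and ids are made up.

```javascript
// Map function: one row per (group, message), keyed by [groupId, time],
// so a group's messages sit together in the view in time order.
function messagesByGroup(doc, emit) {
  if (doc.message && Array.isArray(doc.groups)) {
    for (const groupId of doc.groups) {
      emit([groupId, doc.time], doc.message);
    }
  }
}

// When a phone joins a group, collect updated copies of that phone's
// message docs that do not list the group yet, as a _bulk_docs body.
function addPhoneToGroup(messageDocs, phone, groupId) {
  const updated = messageDocs
    .filter((d) => d.phone === phone && !(d.groups || []).includes(groupId))
    .map((d) => ({ ...d, groups: [...(d.groups || []), groupId] }));
  return { docs: updated };
}

// Made-up message documents; "g1" and "g2" are hypothetical group ids.
const messages = [
  { _id: "m1", _rev: "1", phone: "971 555-1212", message: "first", time: "2008/06/30 06:30:35 +0000", groups: ["g1"] },
  { _id: "m2", _rev: "1", phone: "971 555-1212", message: "second", time: "2008/06/30 07:00:00 +0000", groups: [] },
  { _id: "m3", _rev: "1", phone: "818 555-1212", message: "third", time: "2008/06/30 08:00:00 +0000", groups: ["g1"] },
];

// The phone 971 555-1212 joins group g2: two docs need rewriting.
const payload = addPhoneToGroup(messages, "971 555-1212", "g2");

// Run the map function over the updated docs and read back group g2.
const rows = [];
for (const doc of payload.docs) {
  messagesByGroup(doc, (key, value) => rows.push({ id: doc._id, key, value }));
}
const g2Messages = rows.filter((row) => row.key[0] === "g2").map((row) => row.value);
console.log(g2Messages); // [ 'first', 'second' ]
```

Reads become a single range query on the group id, and the cost moves to write time: every join rewrites all of that phone's messages.
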
>
> Basically it's a tradeoff between read-time and write-time complexity.
> If I were you, I'd try the first approach, and if that doesn't work
> out, then the third (group ids kept up to date on message docs).
>
> Chris
>
> On Tue, Oct 28, 2008 at 8:34 AM, Brit Gardner <brit@britg.com> wrote:
> > Howdy,
> >
> > I've been banging my head around some previous threads and how they may
> > apply to the following scenario, i.e. using some of the techniques
> > presented in 'Associating Users and Comments' from earlier, but my
> > scenario differs a bit:
> >
> > - we have data coming in off mobile phones
> > - the key in this data is the phone number of the originating phone
> > - we have user-defined groups
> > - groups can consist of multiple phone numbers
> > - groups can be created at any time and will include all legacy data from
> > before the group was created as well as all new incoming data
> > - phone numbers can exist in multiple groups
> > - ultimately, data is accessed through the groups
> >
> > This is a fairly straight-forward RDBMS system, but I'm wondering is
> > there a good way to approach this in a map-reduce context?  i.e. do I
> > need to update every data document on the fly when groups are created?
> > Or, maybe I should keep the data keys stored in the group documents.
> > If group document holds the existing data keys as well as being updated
> > with all incoming data keys - won't that create a bottleneck since
> > documents must be fully written when updated?
> >
> > Thanks in advance for any input!
> >
>
>
>
> --
> Chris Anderson
> http://jchris.mfdz.com
>
