incubator-couchdb-user mailing list archives

From "Chris Anderson" <jch...@apache.org>
Subject Re: Thinking outside the RDBMS box - how do I... ?
Date Tue, 28 Oct 2008 16:17:08 GMT
The simplest approach would be:

When a message comes from a phone, create a document like so:

{
"_id":"0b3a963dc37079652fa3092223be35b5",
"_rev":"1805961941",
"message" : "hey all this is the message",
"phone": "971 555-1212",
"time": "2008/06/30 06:30:35 +0000"
}

This document would never need to be updated, except in the case of deletion.

Also maintain docs for groups, eg:

{
"_id":"0b3a963dc37079652fa3092223be35b5",
"_rev":"1805961941",
"group" : "a group can have a title",
"phones": ["971 555-1212","818 555-1212", "503 555-1212", "512 555-1212"]
}

Then to load all the messages from phones in a group, GET the group
document, then run a multi-key request against a map view of the
messages, which is keyed by phone number. You have the whole thing in
2 requests.
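
To make the two-step read concrete, here is a rough sketch that simulates it
in plain Python instead of talking to a live CouchDB. The real map function
would be JavaScript and would emit the phone number as the key; the sample
docs, ids and helper names (map_by_phone, build_view, multi_key_query) are
all made up for illustration, and row ordering is not meant to match what
CouchDB returns.

```python
# Hypothetical in-memory stand-in for a map view keyed by phone number.
# Nothing here calls CouchDB; it only mirrors the lookup logic.

messages = [
    {"_id": "m1", "message": "hey all this is the message",
     "phone": "971 555-1212", "time": "2008/06/30 06:30:35 +0000"},
    {"_id": "m2", "message": "second message",
     "phone": "818 555-1212", "time": "2008/06/30 07:00:00 +0000"},
    {"_id": "m3", "message": "not in the group",
     "phone": "415 555-1212", "time": "2008/06/30 08:00:00 +0000"},
]

group = {
    "_id": "g1",
    "group": "a group can have a title",
    "phones": ["971 555-1212", "818 555-1212", "503 555-1212", "512 555-1212"],
}


def map_by_phone(doc):
    """Yield (key, value) rows the way a map function would emit them."""
    if "phone" in doc and "message" in doc:
        yield doc["phone"], doc


def build_view(docs):
    """Run the map over every doc and sort the rows by key."""
    rows = [row for doc in docs for row in map_by_phone(doc)]
    return sorted(rows, key=lambda row: row[0])


def multi_key_query(view_rows, keys):
    """Return only the rows whose key exactly matches a requested key."""
    wanted = set(keys)
    return [(key, value) for key, value in view_rows if key in wanted]


# Request 1: fetch the group document. Request 2: multi-key query on the view.
view = build_view(messages)
group_rows = multi_key_query(view, group["phones"])
group_message_ids = sorted(value["_id"] for _, value in group_rows)
```

Phones in the group with no messages simply contribute no rows, and the
message from the phone outside the group is never returned.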

The bottleneck here is that when you start to have phone numbers with
hundreds or thousands of messages, there will be an awful lot of data
coming back from a big group. Currently multi-key only allows exact
key matches, so it won't support selecting, say, the 5 most recent
messages from each phone number. For now you can filter that in your
application. There are some proposals to allow combinations of view
queries to be specified in a single request, which would make those
complex queries doable in a single request.
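
The application-side filtering could look something like the sketch below.
It assumes the view rows arrive as (phone, doc) pairs and that every doc
carries a "time" string in the format shown in the example message; the
function name and the sample rows are invented for the example.

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%Y/%m/%d %H:%M:%S %z"  # matches "2008/06/30 06:30:35 +0000"


def most_recent_per_phone(rows, limit=5):
    """Group (phone, doc) rows by phone and keep the newest `limit` docs of each."""
    by_phone = defaultdict(list)
    for phone, doc in rows:
        by_phone[phone].append(doc)
    return {
        phone: sorted(
            docs,
            key=lambda d: datetime.strptime(d["time"], TIME_FORMAT),
            reverse=True,  # newest first
        )[:limit]
        for phone, docs in by_phone.items()
    }


# Eight made-up messages from one phone, one second apart.
rows = [
    ("971 555-1212",
     {"message": f"msg {i}", "time": f"2008/06/30 06:30:{i:02d} +0000"})
    for i in range(8)
]
recent = most_recent_per_phone(rows)
```

Note that all the rows still cross the wire before being trimmed, so this
only hides the bottleneck from the user; it doesn't remove it.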

This is just the simplest way to do it. Of course, a more efficient
way would involve getting messages into a view, sorted by group id.
Then pulling all messages for a group would be a cinch. There isn't
really an obviously best way to do that, as it requires you to have
documents with both group ids and message data in them.

The trouble is that the simple way of doing that is to store all messages
from a phone, as well as all the groups which the phone is a member
of, on the same document. Those documents would be a source of
constant update contention, which doesn't sound like fun.

A better solution: you could write all the group ids for a given
phone into the message documents themselves. Then you'd have to go
back in and update all of a phone's messages any time it joined a new
group. Not the worst thing (_bulk_docs will help) but still a bit of a
pita.
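
Roughly, the write-time bookkeeping might be sketched like this. The
"groups" field on message docs is a hypothetical name, as are the helper
functions; the only CouchDB-specific piece is the _bulk_docs body shape,
{"docs": [...]}. The map_by_group function mirrors what a JavaScript map
function would do (emit one row per group id), and nothing is actually
sent over HTTP.

```python
import json


def add_group_to_messages(message_docs, phone, group_id):
    """Return updated copies of the messages from `phone` that lack `group_id`."""
    updated = []
    for doc in message_docs:
        if doc.get("phone") != phone:
            continue
        groups = list(doc.get("groups", []))  # "groups" is an assumed field name
        if group_id not in groups:
            groups.append(group_id)
            updated.append({**doc, "groups": groups})  # keeps _id and _rev
    return updated


def bulk_docs_body(docs):
    """Serialize docs as a _bulk_docs request body."""
    return json.dumps({"docs": docs})


def map_by_group(doc):
    """Mirror of a map function that emits one row per group id on a message."""
    for group_id in doc.get("groups", []):
        yield group_id, doc["_id"]


messages = [
    {"_id": "m1", "_rev": "1", "phone": "971 555-1212", "message": "first"},
    {"_id": "m2", "_rev": "1", "phone": "971 555-1212", "message": "second",
     "groups": ["g1"]},
    {"_id": "m3", "_rev": "1", "phone": "818 555-1212", "message": "other phone"},
]

# The phone joins group "g2": rewrite its messages, then post them in one batch.
changed = add_group_to_messages(messages, "971 555-1212", "g2")
body = bulk_docs_body(changed)
rows = sorted(row for doc in changed for row in map_by_group(doc))
```

With a view like that, pulling everything for a group becomes a single
query on the group id, at the cost of the rewrite every time membership
changes.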

Basically it's a tradeoff between read-time and write-time complexity.
If I were you, I'd try the first approach, and if that doesn't work
out, then the third (group ids kept up to date on message docs).

Chris

On Tue, Oct 28, 2008 at 8:34 AM, Brit Gardner <brit@britg.com> wrote:
> Howdy,
>
> I've been banging my head around some previous threads and how they may
> apply to the following scenario, i.e. using some of the techniques presented
> in 'Associating Users and Comments' from earlier, but my scenario differs a
> bit:
>
> - we have data coming in off mobile phones
> - the key in this data is the phone number of the originating phone
> - we have user-defined groups
> - groups can consist of multiple phone numbers
> - groups can be created at any time and will include all legacy data from
> before the group was created as well as all new incoming data
> - phone numbers can exist in multiple groups
> - ultimately, data is accessed through the groups
>
> This is a fairly straight-forward RDBMS system, but I'm wondering is there a
> good way to approach this in a map-reduce context?  i.e. do I need to update
> every data document on the fly when groups are created? Or, maybe I should
> keep the data keys stored in the group documents.  If group document holds
> the existing data keys as well as being updated with all incoming data keys
> - won't that create a bottleneck since documents must be fully written when
> updated?
>
> Thanks in advance for any input!
>



-- 
Chris Anderson
http://jchris.mfdz.com
