couchdb-user mailing list archives

From Paul Davis <paul.joseph.da...@gmail.com>
Subject Re: M/R/M, again
Date Fri, 29 Jan 2010 14:37:13 GMT
Markus,

Sorry it's taken me so long to sit down and tap out a reply.

So, as to M/R/M, it turns out to be quite a bit harder to keep the
same semantics of incremental view updates as well as the same reduce
semantics when moving to a 'pure' implementation inside the view
engine. Specifically, subsequent maps would need to be updated at the
same time as the first map or we would need to add an update sequence
like the main database has. Neither of these is a very good solution
IMO. Also, the reduce semantics make it hard to hook subsequent M/R
steps up to a view because of how reduces are implemented. The fix
would require making reductions be persisted to a b~tree and then we'd
need to pre-declare group_levels somehow. Quite a bit of work. Also,
because we aren't a 'Google M/R' implementation that guarantees 1
unique key after each M/R stage the merge step becomes less trivial
than the original M/R/M paper.
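
To make the group_level point a bit more concrete, here is a toy model in plain Python. It is not how the view engine stores or computes reductions; the rows and the `reduce_at_level` helper are made up purely for illustration. The same map output reduces to a different set of keys at each group level, so a following map stage has no single well-defined input unless the level is fixed up front.

```python
from collections import defaultdict

# Hypothetical map output: (key, value) pairs with array keys.
ROWS = [
    (["markus", "app1"], 1),
    (["markus", "app2"], 1),
    (["paul", "app1"], 1),
]


def reduce_at_level(rows, group_level):
    """Sum values grouped by the first `group_level` elements of each key."""
    totals = defaultdict(int)
    for key, value in rows:
        totals[tuple(key[:group_level])] += value
    return dict(totals)


# Two different group levels give two different inputs for a next stage.
level_1 = reduce_at_level(ROWS, 1)
level_2 = reduce_at_level(ROWS, 2)
print(level_1)
print(level_2)
```

At level 1 there are two keys, at level 2 there are three, and a second map would have to be fed (and incrementally updated) from one of them specifically.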

These hurdles aren't insurmountable, but the longer I looked at the
issues the more I thought that I would probably just end up writing a
new indexer that has a slightly different M/R model to allow for such
things. And then promptly never got around to it.

However I have been trying to figure out how to create a CouchDB
version of Riak's Jaywalker feature. It could do similar things to
what you're wanting, but there are a couple problems that would put
the hurt on cluster setups with the initial method I have in mind. And
it's a fairly decent-sized addition so unless the implementation
suddenly crystallizes into a simple solution I don't think it'll be in
0.11 and hence 1.0.

HTH,
Paul Davis

On Thu, Jan 28, 2010 at 9:57 AM, Markus Jelsma <markus@buyways.nl> wrote:
> Hi Jan,
>
>
> Thanks for your reply, but i'm afraid that i have provided a lousy explanation
> of the case i run into. Let me explain with actual examples for i believe
> Damien's examples do not fit my use case.
>
> I have a tiny database with two types of documents, profile and
> profileApplication. The profile type has an ID which is the user's e-mail
> address and a simple username field, nothing more (see below for anatomy of
> both document types).
>
> {
>   "_id": "markus@buyways.nl",
>   "_rev": "1-5f7718ae8a627f4cf5b93b63420b7e1f",
>   "type": "profile",
>   "username": "markus17"
> }
> {
>   "_id": "1d2d9db700029557666e5d260b2ea038",
>   "_rev": "2-279daa538abc5cbb4b1524d29ce4ab53",
>   "type": "profileApplication",
>   "applicationId": "app2",
>   "profileId": "markus@buyways.nl",
>   "primaryId": 18
> }
>
> The documents with profileApplication type are related to both an application
> (which i have omitted for now) and a profile. In RDBMS terms its purpose would
> be a common link table.
>
> The purpose for this relation is that a single profile can have a different
> primaryId for different applications. My profile (markus@buyways.nl) would
> have primaryId=18 for app2 and primaryId=17 for app1 etc.
>
> The goal would be to retrieve both my profile document _and_ the primaryId
> that goes with my profile for app1 or app2, ideally the query would be
> key=["markus@buyways.nl", "app1"], but this is currently not possible.
>
> There are two things i can do now:
> 1) retrieve the profile first and then fetch the primaryId for the application
> i need, but this takes two requests and manual merging of the profile data
> and primaryId;
>
>
> http request 1:
> http://localhost:5984/test/markus@buyways.nl
>
> output:
> {"_id":"markus@buyways.nl","_rev":"1-5f7718ae8a627f4cf5b93b63420b7e1f","type":"profile","username":"markus17"}
>
> http request 2:
> http://localhost:5984/test/_design/profiles/_view/getPrimaryByEmailAndApplication?key=[%22markus@buyways.nl%22,%20%22app1%22]
>
> output:
> {"total_rows":4,"offset":2,"rows":[
> {"id":"f51b92f4a59de0e28641375637a73050","key":["markus@buyways.nl","app1"],"value":17}
> ]}
>
> It's clear that i need to merge the value of the second request with the
> document received by the first.
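
The client-side merge needed for option 1 could look roughly like the sketch below. It is plain Python working on the two response bodies quoted above, with no HTTP involved. The helper name `merge_profile_with_primary_id` and the choice to store the view value under a `primaryId` field are assumptions for illustration, not something CouchDB provides.

```python
import json

# Response bodies copied from the two requests above.
PROFILE_RESPONSE = (
    '{"_id":"markus@buyways.nl","_rev":"1-5f7718ae8a627f4cf5b93b63420b7e1f",'
    '"type":"profile","username":"markus17"}'
)
VIEW_RESPONSE = """{"total_rows":4,"offset":2,"rows":[
{"id":"f51b92f4a59de0e28641375637a73050","key":["markus@buyways.nl","app1"],"value":17}
]}"""


def merge_profile_with_primary_id(profile_body, view_body):
    """Return a copy of the profile with the view's value added as primaryId.

    Hypothetical helper: primaryId is None when the view returned no row.
    """
    profile = json.loads(profile_body)
    rows = json.loads(view_body)["rows"]
    merged = dict(profile)
    merged["primaryId"] = rows[0]["value"] if rows else None
    return merged


merged = merge_profile_with_primary_id(PROFILE_RESPONSE, VIEW_RESPONSE)
print(merged["username"], merged["primaryId"])
```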
>
>
> 2) fetch the profile and all related primaryIds in one go, this is one single
> request but i also get primaryIds for apps that i don't need so this fetches
> more data and also needs clientside merging after i filtered out the app i
> need.
>
>
> http request 1:
> http://localhost:5984/test/_design/profiles/_view/getProfileApplications?startkey=[%22markus@buyways.nl%22]&endkey=[%22markus@buyways.nl%22,%20%22zzz%22]
>
> output:
> {"total_rows":6,"offset":3,"rows":[
> {"id":"markus@buyways.nl","key":["markus@buyways.nl",1],"value":null},
> {"id":"f51b92f4a59de0e28641375637a73050","key":["markus@buyways.nl","app1",2],"value":17},
> {"id":"1d2d9db700029557666e5d260b2ea038","key":["markus@buyways.nl","app2",2],"value":18}
> ]}
>
> It's clear that i need to filter my profile document and the
> profileApplication document for the app i want (app1). The bad thing here is
> that i do not get my profile document in the value (although i can emit it but
> that's), if i include_docs i'll also get a lot of extra data on the documents
> i don't need, here it's just one document but it can be many.
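
The filtering step for option 2 could be sketched as below, again in plain Python on the response body quoted above. The helper name `select_application` and the shape of the returned dict are assumptions for illustration. The sketch relies only on what the output shows: the profile row has a two-element key ending in 1, and each profileApplication row has a three-element key whose second element is the application id.

```python
import json

# Response body copied from the range query above.
RANGE_RESPONSE = """{"total_rows":6,"offset":3,"rows":[
{"id":"markus@buyways.nl","key":["markus@buyways.nl",1],"value":null},
{"id":"f51b92f4a59de0e28641375637a73050","key":["markus@buyways.nl","app1",2],"value":17},
{"id":"1d2d9db700029557666e5d260b2ea038","key":["markus@buyways.nl","app2",2],"value":18}
]}"""


def select_application(view_body, application_id):
    """Keep the profile row and the row for one application, drop the rest.

    Hypothetical helper: fields stay None when no matching row is found.
    """
    result = {"profileId": None, "applicationId": application_id, "primaryId": None}
    for row in json.loads(view_body)["rows"]:
        key = row["key"]
        if len(key) == 2 and key[1] == 1:
            # The profile row: its id is the profile document's _id.
            result["profileId"] = row["id"]
        elif len(key) == 3 and key[1] == application_id:
            # The profileApplication row for the requested application.
            result["primaryId"] = row["value"]
    return result


selected = select_application(RANGE_RESPONSE, "app1")
print(selected)
```

Rows for other applications are still transferred and only discarded on the client, which is the overhead described above.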
>
>
>
> Both techniques work and have their pros and cons. But do you agree that it
> would be much more convenient if we could simply construct views that carry
> merged or combined documents using key=["markus@buyways.nl","app1"]?
>
> Am i correct to assume i cannot achieve the goal stated above without either
> Chris' technique or merging of documents in one single view?
>
> Please forgive me if i somehow didn't understand Damien's example but i
> believe that deals with arithmetic instead of merging complex data structures.
> I also didn't (yet?) feel that the new 0.11 linked documents feature will help
> me out here. Also, i wish to keep this data in separate documents, keeping an
> array within the profile document isn't really the best approach i think.
>
>
>
> Cheers,
>
>
>>See http://damienkatz.net/2008/02/incremental_map.html and
>>http://damienkatz.net/2008/02/incremental_map_1.html and the
>>comments on both.
>
> Markus Jelsma - Technisch Architect - Buyways BV
> http://www.linkedin.com/in/markus17
> 050-8536620 / 06-50258350
>
>
