couchdb-dev mailing list archives

From Jason Smith <jason.h.sm...@gmail.com>
Subject Re: GSoc 2015 | COUCHDB-1743 Make the view server & protocol faster
Date Tue, 17 Mar 2015 12:48:04 GMT
Hi, Jan. Do you mean the Node.js view server? That one is called "couchjs"
and my fork is here: https://github.com/jhs/couchjs

If you mean something else, hm, can you remind me? Thanks! :)

On Tue, Mar 17, 2015 at 4:28 AM, Jan Lehnardt <jan@apache.org> wrote:

> Dear Buddhika,
>
> thank you for your interest in CouchDB and the CouchDB View Server!
>
> This is an area where you can make significant contributions to CouchDB.
>
> It is also a little bit involved, but you seem to have all the skills
> required to pull this off :)
>
> I’m happy to mentor you.
> > On 16 Mar 2015, at 10:03, Buddhika Jayawardhana <buddhika.anushka@gmail.com> wrote:
> >
> > Hi,
> > I am an undergraduate in the Department of Computer Science and
> > Engineering, University of Moratuwa. I have been subscribed to the CouchDB
> > mailing list for months and have been trying to learn some Erlang to work
> > with CouchDB. I noticed that the project "COUCHDB-1743 Make the view server
> > & protocol faster" is part of GSoC, and I would like to submit a project
> > proposal for it.
> >
> > I have theoretical knowledge of software processes, design patterns, and
> > other engineering concepts. I have been using Java and C++ for high-level
> > programming, C and a little assembly for low-level programming, and PHP
> > and JavaScript for web development. I also have sound knowledge of Erlang.
> > I would be very thankful if you could guide me in getting familiar with
> > the project as soon as possible.
> >
> > Here are the questions on my mind:
> >
> >   - Are there other programming languages that I should get familiar with?
>
> Erlang and JavaScript will do; some knowledge of C will help you understand
> the current system.
>
>
> >   - What are the technologies I should get familiar with?
>
> General knowledge of Unix/POSIX fundamentals (processes, fds, stdio, etc.)
> will be required. Knowing the Windows equivalent APIs helps too, but that is
> not strictly a requirement just yet.
>
>
> >   - I can work 40 hours per week for the project. Would that be enough to
> >   successfully complete the project?
>
> I can’t estimate whether you’d be able to complete this 100%, but I’m sure
> that this is enough time to make a significant contribution that the
> community can then take and finish, should you not get to the end. So
> don’t worry too much about this :)
>
>
> >   - What are the other resources that I should read before submitting the
> >   proposal?
>
> Familiarity with the CouchDB source can’t hurt. Neither can more in-depth
> knowledge of Erlang: http://learnyousomeerlang.com is a great free resource,
> and the main Erlang docs are worth a read as well, as are the various print
> books available from several publishers.
>
> It will definitely also help to read through the CouchDB Guide:
> http://guide.couchdb.org
>
> Some parts of it have already been integrated into http://docs.couchdb.org,
> which you should also read, especially the sections about Design Documents,
> Views, and List, Show, Validation, Filter and Update functions.
>
> In addition, check out the query_server_spec; it codifies the current query
> server protocol:
>
>
> https://github.com/apache/couchdb/blob/master/test/view_server/query_server_spec.rb
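For orientation, the current protocol is line-based request/response: Erlang writes one JSON array per line to the query server's stdin and waits for exactly one JSON reply line on stdout. The TypeScript sketch below handles only `reset`, `add_fun` and `map_doc`. The real couchjs supports more commands, such as reduce, rereduce and ddoc, and the error tuple names used here are illustrative assumptions rather than a copy of couchjs. Compiling user code with `new Function` gives no isolation at all.

```typescript
// Sketch of the current line-based request/response loop: one JSON array in,
// one JSON value out. Only "reset", "add_fun" and "map_doc" are handled here.
// WARNING: `new Function` gives user code no isolation whatsoever.

type Doc = { [key: string]: unknown };
type Row = [unknown, unknown];
type MapFun = (doc: Doc) => void;

let mapFuns: MapFun[] = [];
let rows: Row[] = [];

// emit() is what map functions call to produce [key, value] rows.
function emit(key: unknown, value: unknown): void {
  rows.push([key, value]);
}

// Turn "function(doc) { ... }" source into a callable with emit in scope.
function compile(source: string): MapFun {
  const factory = new Function("emit", "return (" + source + ");");
  return factory(emit) as MapFun;
}

// Handle one request line and return exactly one reply line.
function handleLine(line: string): string {
  try {
    const request = JSON.parse(line) as unknown[];
    switch (request[0]) {
      case "reset":
        mapFuns = [];
        return JSON.stringify(true);
      case "add_fun":
        mapFuns.push(compile(String(request[1])));
        return JSON.stringify(true);
      case "map_doc": {
        const doc = request[1] as Doc;
        // One list of rows per registered map function, in order.
        const results = mapFuns.map(function (fun: MapFun): Row[] {
          rows = [];
          fun(doc);
          return rows;
        });
        return JSON.stringify(results);
      }
      default:
        return JSON.stringify(["error", "unknown_command", String(request[0])]);
    }
  } catch (e) {
    const reason = e instanceof Error ? e.message : String(e);
    return JSON.stringify(["error", "runtime_error", reason]);
  }
}
```

To wire this to a process, read lines from process.stdin and write each handleLine() result, followed by "\n", to process.stdout. Each call blocks the caller until its reply is ready, which is the request/response behaviour discussed below.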
>
>
> > Hope you will guide me through the project.
>
> Again, thanks for taking an interest in this! :)
>
> To get things rolling, here’s my rough idea for how this could play out:
>
> Generally, there are three components: the Erlang part, the JavaScript part,
> and the JavaScript runtime, couchjs.
>
> Together, we call all of these the Query Server or View Server.
>
> The Erlang part lives in https://github.com/apache/couchdb-couch-mrview
>
> The JavaScript part lives in
> https://github.com/apache/couchdb/tree/master/share/server
>
> The current JavaScript runtime is SpiderMonkey. We have our own C wrapper
> around SpiderMonkey that turns it into a CLI tool that talks over stdio:
>
>   https://github.com/apache/couchdb-couch/tree/master/priv/couch_js
>
>
> We’d generally like to move away from the custom C-wrapped SpiderMonkey and
> make V8 the execution engine. We’d also like to stop having to maintain
> C/C++ code. It’d probably be simplest to use Node.js as a wrapper, because
> then many more people can contribute to this. Also, Node.js is good at
> streaming protocols, so it is a natural fit.
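A streaming query server still has to do its own framing, because stdio pipes deliver arbitrary chunks rather than whole messages. The following minimal sketch shows that step and assumes newline-delimited JSON, as in the current protocol. LineDecoder is a hypothetical name.

```typescript
// Reassemble newline-delimited JSON messages from arbitrary stream chunks.
// A chunk may hold half a message, several messages, or both.
class LineDecoder {
  private buffer = "";

  // Feed one chunk; return every message that it completes, in order.
  push(chunk: string): unknown[] {
    this.buffer += chunk;
    const parts = this.buffer.split("\n");
    // The last part is an unfinished line ("" if the chunk ended with "\n").
    this.buffer = parts.pop() as string;
    const messages: unknown[] = [];
    for (let i = 0; i < parts.length; i++) {
      const line = parts[i].trim();
      if (line.length > 0) {
        messages.push(JSON.parse(line));
      }
    }
    return messages;
  }
}
```

In Node.js the decoder would be fed from process.stdin after calling process.stdin.setEncoding("utf8"). That call keeps multi-byte UTF-8 characters from being split across chunks. Node's built-in readline module offers equivalent line splitting if a hand-rolled decoder is not wanted.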
>
>
> Here is how I would start:
>
> 1. Create a new Query Server that *only* handles Show, List, Filter,
>    Validation and Update functions, as that is a lot simpler on both the
>    Erlang and JavaScript side.
>
> 2. As part of 1: Design a new Query Server protocol that works in a
>    streaming fashion. The current one is request/response based, and both
>    sides are waiting for one another while one of them is doing actual work.
>    It’d be nice if both could just keep working on whatever they need to do.
>
> 3. Once 1. and 2. are in place and working correctly, expand the new Query
>    Server to also handle Views. At this point, adding view support should
>    not be too complicated anymore.
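Step 2 could look roughly like the following sketch. It shows one possible design, not an existing CouchDB protocol. Each request carries an id, the sender keeps sending without waiting, and replies are matched to their callbacks by id, so they may come back in any order. The [id, command, ...args] request shape, the [id, result] reply shape and the PipelinedClient name are all hypothetical.

```typescript
// Sketch of a pipelined protocol: requests are tagged with an id, the sender
// never waits for a reply before sending more, and replies are matched to
// their callbacks by id, so they may arrive in any order.
// Message shapes (hypothetical): request [id, command, ...args], reply [id, result].

type ResultCallback = (result: unknown) => void;

class PipelinedClient {
  private nextId = 1;
  private pending: { [id: string]: ResultCallback } = {};
  private write: (line: string) => void;

  // `write` stands in for writing one line to the query server's stdin.
  constructor(write: (line: string) => void) {
    this.write = write;
  }

  send(command: string, args: unknown[], onResult: ResultCallback): number {
    const id = this.nextId++;
    this.pending[String(id)] = onResult;
    const message = ([id, command] as unknown[]).concat(args);
    this.write(JSON.stringify(message));
    return id;
  }

  // Handle one reply line from the query server.
  receive(line: string): void {
    const reply = JSON.parse(line) as [number, unknown];
    const key = String(reply[0]);
    const callback = this.pending[key];
    if (callback === undefined) {
      throw new Error("reply for unknown request id " + key);
    }
    delete this.pending[key];
    callback(reply[1]);
  }

  // Number of requests still waiting for a reply.
  inFlight(): number {
    return Object.keys(this.pending).length;
  }
}
```

Because the sender never blocks on a single reply, a slow List function does not hold up a quick Validation call that was sent after it. The query server side is then free to answer whichever request finishes first.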
>
>
> Things to watch out for:
>
> - map/reduce functions for CouchDB views need to be “pure”, i.e. we need to
>   guarantee they stay the same unless CouchDB can see the changes (and then
>   invalidate the view index). This means we need some extra isolation of the
>   JS execution, and some limitation or observation of the require() system.
>
>   There is a project that demonstrated we can do this. Jason Smith has run
>   this, but I can’t seem to find it on his GitHub. Jason, do you have any
>   pointers?
>
> - A couchjs process can be used for multiple databases, and different access
>   control can be configured per database. Data MUST NOT leak between
>   databases. For example, errors that are thrown when requesting a view
>   result on database A must not show any process state data that comes from
>   database B (and vice versa).
>
> - The current system works much like CGI: a single process handles one
>   request at a time, and if there are two concurrent requests, a new process
>   is spawned. The new Query Server should be able to handle multiple
>   concurrent requests. But there will be a time when a single process is
>   saturated; at that point, we should be able to spawn more Query Servers to
>   help with the load. In the 1./2./3. list above, I’d either solve this
>   upfront or after 3., depending on what you are more comfortable with. It
>   might be easier to get started without this, but it might be harder to add
>   later, and easier overall to have thought this through upfront.
>
> - Windows stdio can be troublesome, beware :)
>
> - Windows process handling can also be troublesome; that’s why we are using
>   https://github.com/apache/couchdb-couch/tree/master/priv/spawnkillable to
>   kill/reap couchjs processes there. Not sure we still need this when we use
>   Node.js, but it’s worth checking out.
>
> - I had a bit of time last year to experiment with streaming Erlang/Node.js
>   communication. It worked fine, but I didn’t get very far (the JavaScript
>   part just echoes commands back to Erlang). These projects could serve as
>   inspiration:
>
>   https://github.com/janl/couch_query_server2
>   https://github.com/janl/node-couch-query-server2 (key code is in
>   src/couch_query_server2_sup.erl)
>
>   It uses the Erlang pid as a stream marker so we can interleave requests.
>
>   Please excuse the lack of a README or other instructions!
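The require() concern in the first item above could be handled with a require() that resolves modules only from code stored in the design document itself. That way a map function cannot pull in anything CouchDB does not see. The sketch below assumes module sources arrive as a plain object keyed by path. CouchDB design documents can carry CommonJS modules, for example under views.lib, but the verbatim path lookup here is a simplification. makeRequire is a hypothetical name, and `new Function` is not a sandbox.

```typescript
// Sketch: a require() that only resolves modules stored in the design
// document, so map functions cannot load code CouchDB does not see.
// Paths are looked up verbatim; real CommonJS path resolution is omitted.
// `new Function` is NOT a sandbox: the module code still sees globals.

type ModuleSources = { [path: string]: string };
type RequireFn = (path: string) => unknown;

function makeRequire(sources: ModuleSources): RequireFn {
  const cache: { [path: string]: unknown } = {};
  const has = function (obj: object, key: string): boolean {
    return Object.prototype.hasOwnProperty.call(obj, key);
  };

  function requireFromDdoc(path: string): unknown {
    if (has(cache, path)) {
      return cache[path];
    }
    if (!has(sources, path)) {
      throw new Error("require('" + path + "') denied: not in the design document");
    }
    // Evaluate the module with CommonJS-style module/exports/require bindings;
    // nested require() calls go through this same restricted resolver.
    const mod: { exports: unknown } = { exports: {} };
    const factory = new Function("module", "exports", "require", sources[path]);
    factory(mod, mod.exports, requireFromDdoc);
    cache[path] = mod.exports;
    return mod.exports;
  }

  return requireFromDdoc;
}
```

Because every loadable module lives inside the design document, any change to a module is also a change to the design document, and CouchDB can notice it and invalidate the view index.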
>
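For the no-leak requirement between databases, one option in Node.js is to give each database its own global scope with the built-in node:vm module, using vm.createContext, and vm.runInContext with its timeout option. The sketch below assumes one context per database name. runSafely and the ["ok" | "error", ...] reply shape are hypothetical. The Node.js documentation states that vm is not a security mechanism, so this keeps accidental state apart but does not sandbox hostile code.

```typescript
import * as vm from "node:vm";

// Sketch: one separate JavaScript global scope per database, so state left
// behind by code run for one database is invisible to code run for another.
// node:vm is NOT a security mechanism; this only prevents accidental leaks.

const contexts: { [dbName: string]: vm.Context } = {};

function contextFor(dbName: string): vm.Context {
  if (!Object.prototype.hasOwnProperty.call(contexts, dbName)) {
    contexts[dbName] = vm.createContext({});
  }
  return contexts[dbName];
}

// Run code for one database. Errors are reduced to their message string;
// no context object is ever serialized into a reply.
function runSafely(dbName: string, code: string): [string, unknown] {
  try {
    return ["ok", vm.runInContext(code, contextFor(dbName), { timeout: 1000 })];
  } catch (e) {
    // Errors thrown inside a context come from another realm, so
    // `instanceof Error` is false; read `message` structurally instead.
    const message =
      typeof e === "object" && e !== null && "message" in e
        ? String((e as { message: unknown }).message)
        : String(e);
    return ["error", message];
  }
}
```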
>
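The scaling rule from the CGI item above can be sketched as a small pool. Each process handles several concurrent requests, and another process starts only once the existing ones are saturated. The limits, the spawn callback and the QueryServerPool name are hypothetical. A real pool would also queue requests when acquire() returns null and replace crashed processes.

```typescript
// Sketch of the scaling rule: each query server process takes up to
// `maxConcurrent` requests at once, and a new process is started only when
// every existing one is saturated, up to `maxProcesses`.

type QueryProcess = { id: number; inFlight: number };

class QueryServerPool {
  private procs: QueryProcess[] = [];
  private maxConcurrent: number;
  private maxProcesses: number;
  private spawn: (id: number) => void;

  constructor(maxConcurrent: number, maxProcesses: number, spawn: (id: number) => void) {
    this.maxConcurrent = maxConcurrent;
    this.maxProcesses = maxProcesses;
    this.spawn = spawn;
  }

  // Return the least-loaded process with spare capacity, spawning one if all
  // are saturated. Returns null at the process limit so the caller can queue.
  acquire(): QueryProcess | null {
    let best: QueryProcess | null = null;
    for (let i = 0; i < this.procs.length; i++) {
      const p = this.procs[i];
      if (p.inFlight < this.maxConcurrent && (best === null || p.inFlight < best.inFlight)) {
        best = p;
      }
    }
    if (best === null && this.procs.length < this.maxProcesses) {
      best = { id: this.procs.length + 1, inFlight: 0 };
      this.procs.push(best);
      this.spawn(best.id);
    }
    if (best !== null) {
      best.inFlight++;
    }
    return best;
  }

  // Call when a request finishes, freeing one slot on that process.
  release(proc: QueryProcess): void {
    proc.inFlight--;
  }

  size(): number {
    return this.procs.length;
  }
}
```

Preferring the least-loaded process keeps the number of processes low under light load. It spreads requests evenly once several processes exist.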
> This is all I have for now. Other folks may want to chime in with their
> opinions :)
>
> If you have any more questions, let me know. If you want to take this into
> JIRA, let’s open a new ticket.
>
> Best
> Jan
> --
>
>
>
> > Thank You.
> >
> > --
> > *Buddhika Jayawardhana*
> > Undergraduate | Department of Computer Science & Engineering
> > University of Moratuwa
> > buddhika.12@cse.mrt.ac.lk | LinkedIn: http://lk.linkedin.com/in/buddhikajay/
>
> --
> Professional Support for Apache CouchDB:
> http://www.neighbourhood.ie/couchdb-support/
>
>
