couchdb-dev mailing list archives

From Miles Fidelman <mfidel...@meetinghouse.net>
Subject Re: overview of code organization?
Date Fri, 23 Apr 2010 14:12:06 GMT
Adam,

Adam Kocoloski wrote:
> On Apr 23, 2010, at 8:52 AM, Miles Fidelman wrote:
>    
>> - notes on the replication process (step-by-step, what happens when replication is
>> invoked - what code modules are involved and so forth), and/or,
>>      
> couch_rep_* modules handle replication.  How familiar are you with Erlang/OTP?  couch_rep_sup
> is a supervisor for all replications, each of which has a couch_rep gen_server and changes_feed,
> missing_revs, reader, and writer processes.  Each of those processes handles one part of the
> "conversation" on the slide I pointed out to you two days ago.  Data flows from changes_feed
> -> missing_revs -> reader -> writer.
>    

Pretty familiar with Erlang at a conceptual/system level; starting to 
take the time to get fluent in programming.  Haven't done functional 
languages in a long time.

>> - an overview of the code for someone new to the project - what lives in what modules,
>> how they string together - anything that might shortcut having to read through every module
>> and make sense of things from scratch
>>
>> Anything - handwritten notes, slides from a code walkthrough, that kind of thing.
>>      
> Hi Miles, not to sound critical, but I don't think such a broad request will get you
> very far.  If you have specific questions I'll be happy to answer them.
>    
With all due respect... lots of projects maintain documentation of 
internals, particularly efforts focused on platform technologies 
intended for long-term and broad-based application.  Certainly in the 
world of commercial software development it's the rare project that 
doesn't have documentation providing a high level view of a large 
software system -- it's pretty hard to either bring new team members on 
board, or to perform long-term maintenance of code.  Granted that it's a 
bit harder to maintain this level of documentation on open-source 
projects without steady funding, but I will point at some examples:
- Linux Kernel Internals: somewhat old (2.4), but 
http://tldp.org/LDP/lki/index.html (I know there are updates)
- Apache HTTPD: http://httpd.apache.org/docs/2.2/developer/
- MongoDB, documentation of replication internals: 
http://www.mongodb.org/display/DOCS/Replication+Internals
- or even http://wiki.github.com/erlang/otp/routemap-source-tree - 
providing a basic overview of Erlang's internals

> Please, take a shot at reading the code for the part you're interested in.  If you come
> across something you don't understand, send an email or join #couchdb on IRC.  Many of the
> devs hang out there regularly and can walk you through the code.  Best,
>    

It doesn't seem that unreasonable to at least ask whether Couch has some 
similar documentation floating around - if only at the level of notes 
put together by an individual developer, or for discussion among developers.

Couch is certainly aiming at long-term viability as a platform for 
broad-based use, and seems to be aiming at being a broad-based 
open-source effort.  To succeed over the long term, it will NEED to have 
a good set of developer-level documentation.  "Read the code" is not a 
long-term solution.

Re. replication specifically, the couch_rep_* modules do not contain 
much in the way of comments.

Personally, I've been involved in a LOT of network protocol-related work 
(BBN, back to the ARPANET days).  I've yet to see any kind of protocol 
work where someone hasn't jotted down at least a sequence diagram and 
some kind of dataflow diagram showing how all the pieces fit together.  
More common is a full-blown ASN.1 description, and eventually an RFC in 
full gory detail.

It does not seem unreasonable to ask if someone has jotted down notes 
about the full set of steps executed, and code modules involved, when 
Couch receives a "POST /_replicate" transaction.
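
As a rough illustration of the kind of note being asked for, the stage order
Adam describes (changes_feed -> missing_revs -> reader -> writer) could be
written down as a chain of generators.  This is only a sketch in Python, not
CouchDB code: the dict-based "databases", the function bodies, and the
replicate() helper are all assumptions made up for the example, and only the
four stage names come from the thread.

```python
# Hypothetical sketch of the stage order only; not how couch_rep_* is written.
# A "database" here is an invented dict: doc_id -> (rev, body).

def changes_feed(source):
    """Yield (doc_id, rev) for every document on the source."""
    for doc_id in sorted(source):
        yield doc_id, source[doc_id][0]

def missing_revs(changes, target):
    """Pass through only the revisions the target does not already hold."""
    for doc_id, rev in changes:
        if doc_id not in target or target[doc_id][0] != rev:
            yield doc_id, rev

def reader(missing, source):
    """Fetch the body of each missing revision from the source."""
    for doc_id, rev in missing:
        yield doc_id, rev, source[doc_id][1]

def writer(docs, target):
    """Store each fetched document on the target; return the number written."""
    written = 0
    for doc_id, rev, body in docs:
        target[doc_id] = (rev, body)
        written += 1
    return written

def replicate(source, target):
    """Wire the four stages together in the order given in the thread."""
    return writer(reader(missing_revs(changes_feed(source), target), source), target)

source = {"a": ("1-x", {"n": 1}), "b": ("2-y", {"n": 2})}
target = {"a": ("1-x", {"n": 1})}
count = replicate(source, target)
```

Even a toy like this makes the hand-offs between stages explicit, which is the
level of detail a short replication-internals note would need to capture.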

At the very least, it sure would be helpful to have something like:
http://httpd.apache.org/docs/2.2/developer/request.html, or
http://www.apachetutor.org/dev/request
to detail the sequence of events and code involved in request processing.

If, in fact, that kind of information has never been put on "paper," and 
lives only in the source code and a few people's heads, that scares me a 
lot vis-a-vis committing to Couch as a platform for any kind of serious 
project.

Miles Fidelman

-- 
In theory, there is no difference between theory and practice.
In<fnord>  practice, there is.   .... Yogi Berra


