couchdb-dev mailing list archives

From Damien Katz <damienk...@gmail.com>
Subject CouchDB 1.0 work
Date Mon, 28 Apr 2008 16:27:40 GMT
Here are my thoughts on what we need before we can get to CouchDB
1.0. Feedback please.

Must have:

Incremental reduce: Maybe the single biggest outstanding work item.
Probably 2 weeks of development to get to a testable state.

Security/Document validation: We need a way to control who can update
which documents and to validate that the updates are correct. This is
absolutely necessary for offline replication, where replicated updates
to the database do not come through the application layer.
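
A rough sketch of the idea, in Python rather than Erlang. Every name here (validate_update, apply_update, Forbidden, the "author" and "title" fields) is made up for illustration and is not an existing CouchDB API. The point is only that the check sits in the database write path, so replicated updates go through it too:

```python
# Hypothetical sketch: none of these names come from CouchDB itself.

class Forbidden(Exception):
    """Raised when an update must be rejected."""


def validate_update(new_doc, old_doc, user):
    """Raise Forbidden to reject the update; return normally to accept it."""
    # Who may update what: only the original author may change a document.
    if old_doc is not None and old_doc.get("author") != user:
        raise Forbidden("only the author may update this document")
    # Is the update correct: every document must carry a non-empty title.
    if not new_doc.get("title"):
        raise Forbidden("title is required")


def apply_update(db, doc_id, new_doc, user):
    """Single write path (db is a plain dict here), used by clients and replication."""
    validate_update(new_doc, db.get(doc_id), user)
    db[doc_id] = new_doc
```

With this shape, an update arriving by replication is rejected for the same reasons as one sent by a client, even though no application code ran.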

View index compaction/management: View indexes currently just grow;
we need a compaction similar to storage compaction. Also, there is no
way to purge old unused indexes, except via the OS.

File sync problem: file:sync(), a call that flushes all uncommitted
writes to disk before returning, doesn't work fully or at all on
some platforms (usually we just lack the flags to tell the OS to write
to disk). Should be fixable by either patching the existing Erlang
driver source, or using a replacement file driver.
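
To illustrate the kind of platform flag involved, here is a minimal Python sketch (not the Erlang driver fix itself). The helper names sync_to_disk and durable_append are made up. It assumes a POSIX-like system: on macOS a plain fsync() may leave data in the drive's cache, and the F_FULLFSYNC fcntl request asks for a full flush, so the sketch uses it where it exists and falls back to fsync() elsewhere:

```python
import os

try:
    import fcntl  # POSIX only; absent on Windows
except ImportError:
    fcntl = None


def sync_to_disk(fd):
    """Ask the OS to push everything written to fd onto stable storage."""
    # F_FULLFSYNC only exists on some platforms (macOS); None elsewhere.
    full_sync = getattr(fcntl, "F_FULLFSYNC", None)
    if full_sync is not None:
        try:
            fcntl.fcntl(fd, full_sync)
            return
        except OSError:
            pass  # not supported for this file; fall back to fsync
    os.fsync(fd)


def durable_append(path, data):
    """Append data to path and return only after the sync call succeeds."""
    fd = os.open(path, os.O_WRONLY | os.O_APPEND | os.O_CREAT, 0o644)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        sync_to_disk(fd)
    finally:
        os.close(fd)
```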

Optimizations: Right now HTTP overhead is huge, with HTTP latency/
overhead at about 80% of our document read time when loaded from a
local client (same machine). Once we can get this down to below 50%,
we'll focus on optimizing the database and other components. Most core
database operations (document reads, updates and view indexing) are
completely unoptimized so far, with update speed being the
biggest complaint.

Testing: We need lots more tests. By the time we ship 1.0, we should
have far more test suite code than production code. And we need to do
load testing. Can the current browser-based test suite scale for
this kind of heavy testing?

Nice to have:

Plug-ins: Erlang module plug-in architecture, to make adding new
server side code easy. Right now the code that maps special urls
(_view, _compact, _search, etc) to the appropriate Erlang call is
messy and convoluted, and getting worse as we go. We need a standard
way to map the special urls to the appropriate Erlang call.
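
One possible shape for such a mapping is a registry that plug-ins add themselves to, sketched here in Python rather than Erlang. The names (HANDLERS, special, dispatch) and the /db/_name/... URL layout are assumptions made for the example, not the current code:

```python
# Hypothetical sketch of a handler registry for special urls.
HANDLERS = {}


def special(name):
    """Decorator: register fn as the handler for the special url `name`."""
    def register(fn):
        HANDLERS[name] = fn
        return fn
    return register


@special("_view")
def handle_view(db, rest):
    return ("view", db, rest)


@special("_compact")
def handle_compact(db, rest):
    return ("compact", db, rest)


def dispatch(path):
    """Route /db/_name/... to its registered handler, else treat as a document."""
    parts = [p for p in path.split("/") if p]
    if len(parts) >= 2 and parts[1] in HANDLERS:
        return HANDLERS[parts[1]](parts[0], parts[2:])
    return ("document", parts[0] if parts else None, parts[1:])
```

Adding a new special url is then one registration, with no change to dispatch itself.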

Tail committed database headers: To optimize database updates
by reducing the number and length of seeks required, the file header
should be written to the end of the file, rather than the beginning.
Depending on platform this can remove a full head seek and in the best
case scenario a document insert/update can require zero head seeks (if
the head is already positioned at the end of the file). But this can
slow file opening speed as it may need to do a search in the file for
the most recent valid header. In the event of a crash, the header
scan/search cost at database open can be linear or logarithmic,
depending on the exact implementation.
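
A minimal sketch of the linear variant, in Python and working on an in-memory byte string for brevity. The on-disk layout (a MAGIC marker, then payload length and CRC32, then the payload) is invented for this example and is not a proposed file format. Each commit appends a header after the data; opening scans backwards for the last header whose checksum matches, so a header torn by a crash is skipped:

```python
import struct
import zlib

MAGIC = b"HDR1"  # hypothetical marker, chosen only for this sketch


def pack_header(payload):
    """Header = MAGIC + big-endian length + CRC32 + payload."""
    return MAGIC + struct.pack(">II", len(payload), zlib.crc32(payload)) + payload


def find_last_header(buf):
    """Scan backwards for the most recent valid header; None if there is none."""
    end = len(buf)
    while True:
        pos = buf.rfind(MAGIC, 0, end)
        if pos < 0:
            return None
        start = pos + len(MAGIC) + 8
        if start <= len(buf):
            length, crc = struct.unpack(">II", buf[pos + len(MAGIC):start])
            payload = buf[start:start + length]
            # A torn or partial write fails the length or checksum test.
            if len(payload) == length and zlib.crc32(payload) == crc:
                return payload
        # Keep looking for a marker that starts strictly before pos.
        end = pos + len(MAGIC) - 1
```

After a clean shutdown the first candidate found is the valid one. After a crash the scan walks back past the damaged tail, which is where the linear cost comes from.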

Clustering: The ability to cluster CouchDB servers, to increase both
reliability (failover-clustering) and client scalability (more servers
to handle more concurrent user load). Clustering does not increase
data scalability (that's partitioning/sharding).

Selective document purging/compaction: Deletion stubs are kept around
for replication purposes. Need a way to purge the records of documents
that are old or deleted.

Revision path pruning: Each document keeps a list of all previous
revisions. We need a way to prune the oldest records of document
revisions and remerge pruned lists during replication.
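
One way this could work, sketched in Python. It assumes each revision list is stored newest-first; the function names prune and remerge and the revision ids are made up. Pruning keeps the most recent entries, and remerging stitches an incoming pruned list onto the local one at the first revision they share:

```python
# Hypothetical sketch; revision lists are assumed to be newest-first.

def prune(revs, keep):
    """Keep only the `keep` most recent revision ids."""
    return revs[:keep]


def remerge(local, incoming):
    """Extend the incoming path with older history known locally.

    Returns None when the two lists share no revision, since then there
    is no way to tell where (or whether) they join.
    """
    for i, rev in enumerate(incoming):
        if rev in local:
            j = local.index(rev)
            return incoming[:i] + local[j:]
    return None
```

For example, an incoming list ["r5", "r4", "r3"] and a local list ["r3", "r2", "r1"] share "r3", so they can be joined back into one path. If pruning is too aggressive on both sides, no shared revision is left and the lists cannot be remerged.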

Don't Need:

Authentication. We can go to 1.0 without authentication, relying  
instead on local proxies to provide authentication.

Partitioning. Partitioning is a big project with lots of considerations.
It's best to move this post 1.0.
