couchdb-dev mailing list archives

From Jan Lehnardt <>
Subject Re: 1.0.0 wishlist/roadmap
Date Thu, 04 Dec 2008 13:10:22 GMT
  - A native Erlang API.

On 2 Dec 2008, at 20:34, Damien Katz wrote:

> Here is some stuff I'd like to see in a 1.0.0 release. Everything is  
> open for discussion.
> - Built-in reduce functions to avoid unnecessary JS overhead -
> Count, Sum, Avg, Min, Max, Std dev. Others?
> - Restrict database read access -
> Right now any user can read any database; we need to be able to
> restrict that, at least at the whole-database level.
> - Replication performance enhancements -
> Adam Kocoloski has some replication patches that greatly improve  
> replication performance.
> - Revision stemming: It should be possible to limit the number of  
> revisions tracked -
> By default each document edit produces a revision id that is tracked
> indefinitely. This guarantees conflicts versus subsequent edits can
> always be distinguished in ad-hoc replication; however, the forever
> growing list of revisions isn't always desirable. This can be
> addressed by limiting the number tracked and purging the oldest
> revisions. The downside is that if the revision tracking limit is
> N, then anyone who hasn't replicated a document since its last N
> edits will see a spurious edit conflict.
> - Lucene/Full-text indexing integration -
> We have this working in side patches; it needs to be integrated
> into trunk and with the view engine.
> - Incremental document replication -
> We need at a minimum the ability to incrementally replicate only
> the attachments that have changed in a document. This will save lots
> of network IO, and CouchDB can be a version control system with
> document diffs added as attachments.
> This can work for document fields too, but the overhead may not be  
> worth it.
> - Built-in authentication module(s) -
> The ability to host a CouchDB database used for HTTP authentication
> schemes. If storing passwords, they would need to be stored
> encrypted and decrypted on demand by the authentication process.
> - View server enhancements (stale/partial index option) -
> Chris Anderson has a side branch for this that we need to finish and
> put into trunk.
> - View index compaction -
> View indexes grow forever and need to be compacted in much the same
> way the storage files are compacted. This work will tie into the
> View Server enhancements.
> - Document integrity/deterministic revid -
> For the sake of end to end document integrity, we need a way to hash  
> a document's contents, and since we already have revision ids, I  
> think the revision ids should be the hashes. The hashed document  
> should be a canonical json representation, and it should have the  
> _id and _rev fields in it. The _rev will be the PREVIOUS revision ID/ 
> hash the edit is based on, or blank if a new edit. Then the _rev is  
> replaced with the new hash value.
> - Fully tail append writes -
> CouchDB uses zero-overwrite storage, but not fully tail append  
> storage. Document json bodies are stored in internal buffers,
> written consecutively, one after another until the buffer is
> completely full, then another buffer is created at the end of the
> file for more documents. File attachments are written to similar
> buffers as well. Btree updates are always tail append: each update
> to a btree, even if it's a deletion, causes new writes to the end of
> the file. Once the document, attachments and indexes are committed
> (fsync), the header is then written and flushed to disk, and that is
> always stored right at the beginning of the file (requiring another
> seek).
> Document updates to CouchDB require 2 fsyncs with ~3 seeks for full
> committal and index consistency. This is true whether you write 1 or
> 1000 documents in a single transaction (bulk update): you still need
> ~3 seeks. Using conventional transaction journalling, it's possible to
> get the committal down to a single seek and fsync, and worry about
> ensuring file and index consistency asynchronously, often in batch
> mode with other committed updates. This can perform very well, but
> has downsides like extra complexity, increased memory usage as
> data is cached waiting to be flushed to disk, and the need for special
> consistency checks and fix-ups on startup if there is a crash.
> If CouchDB used tail-append storage for everything, then all  
> document updates can be completely flushed with full file  
> consistency with a single seek and, depending on the file system, a  
> single fsync. All the disk updates, documents, file attachments,  
> indexes and file header, occur as appends to the end of the file.
> The biggest changes will be in how file attachments and the headers  
> are written and read, and the performance characteristics of view  
> indexing as documents will no longer be packed into contiguous  
> buffers.
> File attachments will be written in chunks with the last chunk being
> an index to the other chunks.
> Headers will be specially signed blocks written to the end of the  
> file. Reading the header on database open will require scanning the  
> file from the end, since the file might have partial updates that  
> didn't complete since the last update.
> The performance of the views will be impacted as the documents are  
> more likely to be fragmented across the storage file. But they will  
> still be in the order they will be accessed for indexing, so the  
> read seeks are always moving forward. Also, the act of compacting  
> the storage file will result in the documents being tightly packed  
> again.
> - Streaming document updates with attachment writes -
> Using MIME multipart encoding, it should be possible to send all
> parts of a document in a single http request, with the json and  
> binary attachments sent as different mime parts. Attachments can be  
> streamed to disk as bytes are received, keeping total memory  
> overhead to a minimum. Attachments can also be written to disk in  
> compressed format and served over http by default in that compressed  
> format, using 0% CPU for compression at read time, but will require  
> decompression if the client doesn't support the compression format.
> - Partitioning/Clustering Support -
> Clustering for failover and load balancing is a priority. Large
> database support via partitioning may not make 1.0.
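
For readers of the archive, the "deterministic revid" item quoted above can be illustrated with a minimal Python sketch. It is only an illustration, not CouchDB code. The hash (SHA-256), the canonical form (sorted keys, no extra whitespace) and the helper names canonical_json, next_rev and save are all assumptions; the message does not specify any of them.

```python
import hashlib
import json


def canonical_json(doc):
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def next_rev(doc, prev_rev=""):
    # Hash the document with _rev set to the PREVIOUS revision id
    # (blank for a brand new document), as the proposal describes.
    body = dict(doc)
    body["_rev"] = prev_rev
    return hashlib.sha256(canonical_json(body)).hexdigest()


def save(doc, prev_rev=""):
    # After hashing, _rev is replaced with the new hash value.
    stored = dict(doc)
    stored["_rev"] = next_rev(doc, prev_rev)
    return stored


first = save({"_id": "doc1", "title": "hello"})
second = save({"_id": "doc1", "title": "hello again"}, first["_rev"])
```

Because the previous revision id is part of the hashed content, two nodes that apply the same edit on top of the same revision arrive at the same new _rev, while the same content on top of a different history hashes differently.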

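The header handling described in the "fully tail append" item (signed header blocks at the end of the file, found by scanning from the end) can be sketched the same way. This is a rough Python illustration under stated assumptions: the marker MAGIC, the block layout (marker, 4-byte length, payload, SHA-256 digest) and the names make_header and find_last_header are hypothetical, and the "file" is just a bytes object held in memory rather than read block by block from disk.

```python
import hashlib
import struct

MAGIC = b"COUCHHDR"  # hypothetical marker, not a real CouchDB constant


def make_header(payload):
    # Assumed layout: marker, big-endian length, payload, digest of payload.
    return (MAGIC + struct.pack(">I", len(payload)) + payload
            + hashlib.sha256(payload).digest())


def find_last_header(data):
    # Scan backwards for the most recent header whose digest checks out.
    pos = len(data)
    while True:
        # Only consider markers that start before the previous candidate.
        pos = data.rfind(MAGIC, 0, pos + len(MAGIC) - 1)
        if pos < 0:
            return None  # no valid header anywhere in the file
        start = pos + len(MAGIC)
        if start + 4 <= len(data):
            (size,) = struct.unpack(">I", data[start:start + 4])
            payload = data[start + 4:start + 4 + size]
            digest = data[start + 4 + size:start + 4 + size + 32]
            if (len(payload) == size and len(digest) == 32
                    and hashlib.sha256(payload).digest() == digest):
                return payload
        # Torn or partial header: keep scanning towards the start.


data = b"doc-a" + make_header(b"root=1") + b"doc-b" + make_header(b"root=2")
latest = find_last_header(data)          # the second header
recovered = find_last_header(data[:-5])  # last header torn, falls back
```

The digest check is what lets a reader skip an update that did not complete: a header that was only partly written fails validation, and the scan continues to the previous good one.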