Hi Damien,

thanks for taking the time to compile these lists.

I'd like to see parallel Map/Reduce execution in 1.0. Right now, view
index creation runs in a single process where it could run in
multiple processes. Would it make sense to spawn N `couchjs` instances
(N = number of CPUs/cores, or configurable) to run the map and the
reduce stages in parallel? This would move (if it isn't already) the
bottleneck towards disk I/O rather than JSON serialization.

Cheers
Jan
--

On 2 Dec 2008, at 20:34, Damien Katz wrote:

> Here is some stuff I'd like to see in a 1.0.0 release. Everything is
> open for discussion.
>
> - Built-in reduce functions to avoid unnecessary JS overhead -
>
> Count, Sum, Avg, Min, Max, Std dev. others?
>
> - Restrict database read access -
>
> Right now any user can read any database; we need to be able to
> restrict that at least on a whole database level.
>
> - Replication performance enhancements -
>
> Adam Kocoloski has some replication patches that greatly improve
> replication performance.
>
> - Revision stemming: It should be possible to limit the number of
> revisions tracked -
>
> By default each document edit produces a revision id that is tracked
> indefinitely. This guarantees conflicts versus subsequent edits can
> always be distinguished in ad-hoc replication, however the forever
> growing list of revisions isn't always desirable. This can be
> addressed by limiting the number tracked and purging the oldest
> revisions. The downside is that if the revision tracking limit is
> N, then anyone who hasn't replicated a document since its last N
> edits will see a spurious edit conflict.
>
> - Lucene/Full-text indexing integration -
>
> We have this working in side patches; this needs to be integrated
> into trunk and with the view engine.
>
> - Incremental document replication -
>
> We need at the minimum the ability to incrementally replicate only
> the attachments that have changed in a document.
> This will save lots of network IO, and CouchDB can be a version
> control system with document diffs added as attachments.
>
> This can work for document fields too, but the overhead may not be
> worth it.
>
> - Built-in authentication module(s) -
>
> The ability to host a CouchDB database used for HTTP authentication
> schemes. If storing passwords, they would need to be stored
> encrypted, decrypted on demand by the authentication process.
>
> - View server enhancements (stale/partial index option) -
>
> Chris Anderson has a side branch for this we need to finish and put
> into trunk.
>
> - View index compaction -
>
> View indexes expand forever, and need to be compacted in a similar
> way to how the storage files are compacted. This work will tie into
> the View Server enhancements.
>
> - Document integrity/deterministic revid -
>
> For the sake of end to end document integrity, we need a way to hash
> a document's contents, and since we already have revision ids, I
> think the revision ids should be the hashes. The hashed document
> should be a canonical json representation, and it should have the
> _id and _rev fields in it. The _rev will be the PREVIOUS revision
> ID/hash the edit is based on, or blank if a new edit. Then the _rev
> is replaced with the new hash value.
>
> - Fully tail append writes -
>
> CouchDB uses zero-overwrite storage, but not fully tail append
> storage. Document json bodies are stored in internal buffers,
> written consecutively, one after another until the buffer is
> completely full, then another buffer is created at the end of the
> file for more documents. File attachments are written to similar
> buffers as well. Btree updates are always tail append: each update
> to a btree, even if it's a deletion, causes new writes to the end of
> the file.
> Once the document, attachments and indexes are committed (fsync),
> the header is then written and flushed to disk, and that is always
> stored right at the beginning of the file (requiring another seek).
>
> Document updates to CouchDB require 2 fsyncs with ~3 seeks for full
> committal and index consistency. This is true whether you write 1 or
> 1000 documents in a single transaction (bulk update): you still need
> ~3 seeks. Using conventional transaction journalling, it's possible
> to get the committal down to a single seek and fsync, and worry about
> ensuring file and index consistency asynchronously, often in batch
> mode with other committed updates. This can perform very well, but
> has downsides like extra complexity and increased memory usage as
> data is cached waiting to be flushed to disk, and it must do special
> consistency checks and fix-ups on startup if there is a crash.
>
> If CouchDB used tail-append storage for everything, then all
> document updates could be completely flushed with full file
> consistency with a single seek and, depending on the file system, a
> single fsync. All the disk updates (documents, file attachments,
> indexes and file header) occur as appends to the end of the file.
>
> The biggest changes will be in how file attachments and the headers
> are written and read, and the performance characteristics of view
> indexing as documents will no longer be packed into contiguous
> buffers.
>
> File attachments will be written in chunks with the last chunk being
> an index to the other chunks.
>
> Headers will be specially signed blocks written to the end of the
> file. Reading the header on database open will require scanning the
> file from the end, since the file might have partial updates that
> didn't complete since the last update.
>
> The performance of the views will be impacted as the documents are
> more likely to be fragmented across the storage file.
> But they will still be in the order they will be accessed for
> indexing, so the read seeks are always moving forward. Also, the act
> of compacting the storage file will result in the documents being
> tightly packed again.
>
> - Streaming document updates with attachment writes -
>
> Using mime multipart encoding, it should be possible to send all
> parts of a document in a single http request, with the json and
> binary attachments sent as different mime parts. Attachments can be
> streamed to disk as bytes are received, keeping total memory
> overhead to a minimum. Attachments can also be written to disk in
> compressed format and served over http by default in that compressed
> format, using 0% CPU for compression at read time, but will require
> decompression if the client doesn't support the compression format.
>
> - Partitioning/Clustering Support -
>
> Clustering for failover and load balancing is a priority. Large
> database support via partitioning may not make 1.0.
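
To make the "Document integrity/deterministic revid" item in the quoted
list concrete, here is a minimal sketch in Python. It is only an
illustration of the idea as described there, not CouchDB code. The
function name next_rev, the choice of SHA-256, and the use of sorted
keys with compact separators as the "canonical json representation" are
all assumptions of mine; the list does not specify a hash function or a
canonicalization scheme.

```python
import hashlib
import json


def next_rev(doc, prev_rev=""):
    """Return a copy of doc whose _rev is the hash of its canonical form.

    prev_rev is the revision the edit is based on, or "" for a new
    document (hypothetical helper, for illustration only).
    """
    body = dict(doc)
    # The hashed document carries the PREVIOUS revision in _rev.
    body["_rev"] = prev_rev
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Then _rev is replaced with the new hash value.
    body["_rev"] = digest
    return body


first = next_rev({"_id": "doc1", "title": "hello"})
same = next_rev({"title": "hello", "_id": "doc1"})
edited = next_rev({"_id": "doc1", "title": "hello again"},
                  prev_rev=first["_rev"])
```

With this scheme, two nodes that make the same edit on top of the same
previous revision arrive at the same _rev independently (first and same
above), while any change to the content or to the parent revision
yields a different one (edited).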