From: Jan Lehnardt
To: couchdb-dev@incubator.apache.org
Subject: Re: 1.0.0 wishlist/roadmap
Date: Thu, 4 Dec 2008 07:35:47 +0100

- Statistics: A module that
collects runtime statistics (how many hits, etc.) and exports them to Futon
and other tools for inspection, and to SNMP for monitoring.

Cheers
Jan
--

On 2 Dec 2008, at 20:34, Damien Katz wrote:

> Here is some stuff I'd like to see in a 1.0.0 release. Everything is
> open for discussion.
>
> - Built-in reduce functions to avoid unnecessary JS overhead -
>
> Count, Sum, Avg, Min, Max, Std dev. Others?
>
> - Restrict database read access -
>
> Right now any user can read any database; we need to be able to
> restrict that at least on a whole-database level.
>
> - Replication performance enhancements -
>
> Adam Kocoloski has some replication patches that greatly improve
> replication performance.
>
> - Revision stemming: It should be possible to limit the number of
> revisions tracked -
>
> By default each document edit produces a revision id that is tracked
> indefinitely. This guarantees conflicts versus subsequent edits can
> always be distinguished in ad-hoc replication; however, the forever
> growing list of revisions isn't always desirable. This can be
> addressed by limiting the number tracked and purging the oldest
> revisions. The downside is that if the revision tracking limit is
> N, then anyone who hasn't replicated a document since its last N
> edits will see a spurious edit conflict.
>
> - Lucene/Full-text indexing integration -
>
> We have this working in side patches; this needs to be integrated
> into trunk and with the view engine.
>
> - Incremental document replication -
>
> We need at the minimum the ability to incrementally replicate only
> the attachments that have changed in a document. This will save lots
> of network IO, and CouchDB can be a version control system with
> document diffs added as attachments.
>
> This can work for document fields too, but the overhead may not be
> worth it.
>
> - Built-in authentication module(s) -
>
> The ability to host a CouchDB database used for HTTP authentication
> schemes.
> If storing passwords, they would need to be stored
> encrypted and decrypted on demand by the authentication process.
>
> - View server enhancements (stale/partial index option) -
>
> Chris Anderson has a side branch for this that we need to finish and
> put into trunk.
>
> - View index compaction -
>
> View indexes grow forever, and need to be compacted in a way similar
> to how the storage files are compacted. This work will tie into the
> View Server enhancements.
>
> - Document integrity/deterministic revid -
>
> For the sake of end-to-end document integrity, we need a way to hash
> a document's contents, and since we already have revision ids, I
> think the revision ids should be the hashes. The hashed document
> should be a canonical json representation, and it should have the
> _id and _rev fields in it. The _rev will be the PREVIOUS revision
> ID/hash the edit is based on, or blank if a new edit. Then the _rev is
> replaced with the new hash value.
>
> - Fully tail append writes -
>
> CouchDB uses zero-overwrite storage, but not fully tail append
> storage. Document json bodies are stored in internal buffers,
> written consecutively, one after another until the buffer is
> completely full, then another buffer is created at the end of the
> file for more documents. File attachments are written to similar
> buffers as well. Btree updates are always tail append: each update
> to a btree, even if it's a deletion, causes new writes to the end of
> the file. Once the document, attachments and indexes are committed
> (fsync), the header is then written and flushed to disk, and that is
> always stored right at the beginning of the file (requiring another
> seek).
>
> Document updates to CouchDB require 2 fsyncs with ~3 seeks for full
> committal and index consistency. This is true whether you write 1 or
> 1000 documents in a single transaction (bulk update); you still need
> ~3 seeks.
> Using conventional transaction journalling, it's possible to
> get the committal down to a single seek and fsync, and worry about
> ensuring file and index consistency asynchronously, often in batch
> mode with other committed updates. This can perform very well, but
> has downsides like extra complexity and increased memory usage as
> data is cached waiting to be flushed to disk, and it must do special
> consistency checks and fix-ups on startup if there is a crash.
>
> If CouchDB used tail-append storage for everything, then all
> document updates could be completely flushed with full file
> consistency with a single seek and, depending on the file system, a
> single fsync. All the disk updates (documents, file attachments,
> indexes and file header) occur as appends to the end of the file.
>
> The biggest changes will be in how file attachments and the headers
> are written and read, and the performance characteristics of view
> indexing as documents will no longer be packed into contiguous
> buffers.
>
> File attachments will be written in chunks, with the last chunk being
> an index to the other chunks.
>
> Headers will be specially signed blocks written to the end of the
> file. Reading the header on database open will require scanning the
> file from the end, since the file might have partial updates that
> didn't complete since the last update.
>
> The performance of the views will be impacted as the documents are
> more likely to be fragmented across the storage file. But they will
> still be in the order they will be accessed for indexing, so the
> read seeks are always moving forward. Also, the act of compacting
> the storage file will result in the documents being tightly packed
> again.
>
> - Streaming document updates with attachment writes -
>
> Using mime multipart encoding, it should be possible to send all
> parts of a document in a single http request, with the json and
> binary attachments sent as different mime parts.
> Attachments can be
> streamed to disk as bytes are received, keeping total memory
> overhead to a minimum. Attachments can also be written to disk in
> compressed format and served over http by default in that compressed
> format, using 0% CPU for compression at read time, but will require
> decompression if the client doesn't support the compression format.
>
> - Partitioning/Clustering Support -
>
> Clustering for failover and load balancing is a priority. Large
> database support via partitioning may not make 1.0.
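
To make the "Document integrity/deterministic revid" item above concrete, here is a minimal sketch of how a revision id could be derived from a document's contents. It is an illustration only: the quoted proposal names neither a hash function nor a canonical JSON form, so SHA-1 and sorted-key compact JSON are assumptions, and `canonical_json`, `next_rev` and `save` are hypothetical helper names, not CouchDB functions.

```python
import hashlib
import json


def canonical_json(doc):
    # One possible canonical form (an assumption): sorted keys, no extra
    # whitespace, ASCII-only output.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=True)


def next_rev(doc, prev_rev=""):
    # Hash the document with _rev set to the PREVIOUS revision id,
    # or blank for a brand new document, as the proposal describes.
    body = dict(doc)
    body["_rev"] = prev_rev
    # SHA-1 is an assumption; the proposal does not pick a hash.
    return hashlib.sha1(canonical_json(body).encode("utf-8")).hexdigest()


def save(doc, prev_rev=""):
    # After hashing, _rev is replaced with the new hash value.
    stored = dict(doc)
    stored["_rev"] = next_rev(doc, prev_rev)
    return stored
```

With a scheme like this, two nodes that apply the same edit on top of the same previous revision compute the same revision id, while the same contents written on top of a different previous revision get a different id.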
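
The "Fully tail append writes" item above says headers would become signed blocks at the end of the file, found on open by scanning backwards past any partial update. A rough sketch of that idea follows, using an in-memory buffer instead of a real file. The block layout (marker, length, payload, SHA-1 checksum) and the names `MAGIC`, `append_header` and `find_last_header` are all hypothetical, chosen only to illustrate the technique; this is not CouchDB's on-disk format.

```python
import hashlib
import struct

MAGIC = b"\x00HDRBLK\x00"  # hypothetical marker that starts a header block


def append_header(data: bytearray, payload: bytes) -> None:
    # Header block = marker + 4-byte big-endian length + payload + SHA-1 of payload.
    data += MAGIC + struct.pack(">I", len(payload)) + payload
    data += hashlib.sha1(payload).digest()


def find_last_header(data):
    # Scan from the end of the file for the most recent header block
    # whose checksum verifies; torn or partial blocks are skipped.
    pos = len(data)
    while True:
        pos = data.rfind(MAGIC, 0, pos)
        if pos == -1:
            return None  # no valid header anywhere in the file
        start = pos + len(MAGIC)
        if start + 4 <= len(data):
            (length,) = struct.unpack_from(">I", data, start)
            body = bytes(data[start + 4:start + 4 + length])
            digest = bytes(data[start + 4 + length:start + 4 + length + 20])
            if len(body) == length and hashlib.sha1(body).digest() == digest:
                return body
        # Otherwise keep looking for an earlier marker.
```

For example, after appending some document bytes, a header, more document bytes and a second header, `find_last_header` returns the second header's payload; if the tail of the file is cut off partway through that second header (a simulated crash), it falls back to the first one.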