couchdb-dev mailing list archives

From Chris Anderson <jch...@apache.org>
Subject Re: The 1.0 Thread
Date Fri, 19 Jun 2009 20:11:42 GMT
On Thu, Jun 18, 2009 at 2:34 PM, Damien Katz<damien@apache.org> wrote:
> Okay, time to ask the question, what features do we need to get to 1.0?
>
> I'm going to list my must haves, and my nice to haves.
>
> Must have:
> - Document integrity checking: Using some sort of hashing scheme for end to
> end integrity checking of documents and attachments. Reusing the revision ID
> as the hash of the document might work, and has the benefit of allowing
> writing the same changes to 2 different servers and not causing a conflict.
> Also multiple clients can write the same change to a document and not get
> unnecessary conflicts.
> - Reader/Writer access databases and servers: Allow/disallow anonymous,
> users, groups.
> - Continuous replication: Keeping a constant connection and being able to
> replicate changes as soon as they happen.

Agree on the above listed ones.
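
A minimal sketch of the "revision ID as hash of the document" idea, for
illustration only. The canonical JSON encoding, the choice of MD5, the
"generation-digest" layout and the name deterministic_rev are all
assumptions made here, not anything settled in this thread:

```python
import hashlib
import json


def deterministic_rev(doc: dict, generation: int = 1) -> str:
    """Derive a revision ID from the document content (hypothetical scheme)."""
    # Leave out the existing _rev so the hash depends only on the content.
    body = {k: v for k, v in doc.items() if k != "_rev"}
    # Sorted keys and fixed separators give the same bytes for the same content,
    # no matter which client or server serialized the document.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.md5(canonical.encode("utf-8")).hexdigest()
    return f"{generation}-{digest}"
```

With a scheme like this, two clients writing the same change would compute
the same revision ID, so the writes could be recognized as identical rather
than treated as a conflict.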

> - Better testing: We really need some performance and stress testing as part
> of the source. And we need much better code coverage in general with the
> testing.

I think better testing can be a nice-to-have. I think we're on the
right track here and I don't ever feel held back by lack of tests. A
little Makefile integration so we have a more complete pre-commit
ritual would be the easiest way to add value here.

>
>
> Nice to have:
> - Hashing/CRC everything written to disk, data, metadata, index structures,
> etc. But optional, since many filesystems actively integrity-check disk
> data.

I think the document integrity checks will handle the part that really
matters, and the rest of this should fall on the OS. I'm not saying
CouchDB shouldn't eventually pick this up, but I'm comfortable having
this be post-1.0, especially as it doesn't affect the API.

> - Better full text integration: Out of the box integration and the ability
> to intersect results with views, for easier result formatting. Lucene would be
> the primary FT engine, but we make it pluggable, much like the view engines
> are.

This would be nice but I think we need someone to really sponsor the
development. There are a lot of people using existing tools; maybe
some of them should start a thread discussing the good and bad parts
of their experience.

> - Attachment level replication: By tracking the revision when an attachment
> was modified, the replicator can avoid copying unchanged attachments to the
> target. The same can apply to json fields, but it's much less of a win
> there.

Is this going to fall out of deterministic revs already (document
integrity checking)?

> - Partitioning/sharding support: Ideally would be nice to have something
> that "just works" without a lot of setup.

I think CouchDB-Lounge 100% fills this void for now. Maybe we should
do more to link it to the project, even if that's just documentation.
Down the line we'll want to port CouchDB Lounge to Erlang and
integrate it, or perhaps even better would be to use Cliff Moon's
Dynomite project, but I see these as post-1.0 since users who need
medium-sized clusters can get that now with Lounge, and large clusters
will take longer than we want to wait for 1.0 to implement and
production test.

> - Built-in authentication: A plug-in that authenticates HTTP users and
> assigns them roles. It would use a couch database as a directory that
> contains user documents, etc.

I'd like to see this too, not sure if it should be a 1.0 blocker.
Depends on what others say.


> - Selective replication: The ability to replicate a subset of documents,
> using a javascript function as a selector.
> - Server side doc processing: The ability to POST data and have arbitrary
> server-side processing. The simplest case is posting a document to a Js
> handler that can do some data cleanup and add default values to the document
> before saving it. But ideally would be able to interact with the full
> database.

Personally I'd bump these two to the first list, but that's not really
an issue as I have a feeling they are gonna happen regardless. The
selective replication is a little easier as the API is pretty clear,
the server side update handler is more complex (especially if it
includes interacting with the full database), but either way I'm
motivated to help these patches along, and we're pretty close.
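
A rough sketch of the two ideas, written in Python rather than the
JavaScript the proposal has in mind. The names select_docs, is_post and
apply_defaults, and the "type" and "status" fields, are hypothetical and
only illustrate the shape of a selector and of a simple update handler:

```python
from typing import Callable, Iterable, Iterator


def select_docs(docs: Iterable[dict], selector: Callable[[dict], bool]) -> Iterator[dict]:
    """Yield only the documents the selector accepts (selective replication)."""
    for doc in docs:
        if selector(doc):
            yield doc


def is_post(doc: dict) -> bool:
    """Example selector: replicate only documents of type 'post'."""
    return doc.get("type") == "post"


def apply_defaults(doc: dict) -> dict:
    """Example update handler: tidy string fields and add defaults before saving."""
    # Strip surrounding whitespace from string values; leave other values alone.
    cleaned = {k: v.strip() if isinstance(v, str) else v for k, v in doc.items()}
    # Only fill in the default when the client did not supply a value.
    cleaned.setdefault("status", "draft")
    return cleaned
```

The selector is a pure predicate over a single document, which is why its
API is the clearer of the two. This handler also sees only the posted
document; one that can read the rest of the database needs a wider
interface.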

> - Scheduled replication: The ability to schedule replication every so often,
> like a cron job. But this can be done with an actual cron job and CURL, so
> it's not critical to have it built-in.

Yawn. If CouchDB is gonna grow its own cron, that should come after 1.0 imho.
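
For reference, a sketch of the external approach: a small script that a
cron job could run to POST to the server's _replicate resource. The server
URL and database names are placeholders, and the function names are made
up for this example:

```python
import json
import urllib.request


def build_replication_request(server: str, source: str, target: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the server's _replicate resource."""
    payload = json.dumps({"source": source, "target": target}).encode("utf-8")
    return urllib.request.Request(
        url=f"{server.rstrip('/')}/_replicate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def trigger_replication(server: str, source: str, target: str) -> str:
    """Send the request and return the raw response body."""
    request = build_replication_request(server, source, target)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

A crontab entry that runs a script calling trigger_replication every few
minutes would then cover the scheduled case without anything built into
the server.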


>
> And anyone who wants to take on any of these issues: mine, yours or anyone
> else's, just do it. Read code, mail dev@ with questions and advice, write
> some code, repeat.
>

Thanks for putting the list together, Damien. I agree about Nathan's
Windows requests, but I'm not a user.

The only other thing that jumps out at me is:

- An application (design doc) URL rewrite handler. We've discussed
this on other threads. I think the ideas are there; we just need to
refine an implementation. I'd be fine seeing this come after 1.0, but
if it's ready it should go in.

Chris



-- 
Chris Anderson
http://jchrisa.net
http://couch.io
