couchdb-user mailing list archives

From "Chris Anderson" <jch...@apache.org>
Subject Re: Evaluating CouchDB
Date Tue, 25 Nov 2008 18:53:16 GMT
On Mon, Nov 24, 2008 at 9:24 AM, Peter Herndon <tpherndon@gmail.com> wrote:

>
> Anyway, that's my current use case, and my next use case.  I know that
> CouchDB isn't finished yet, and hasn't been optimized yet, but does
> anyone have any opinions on whether CouchDB would be a reasonable fit
> for managing the metadata associated with each object?

I think CouchDB is pretty much designed with this use case in mind. If
you were lucky enough to convince the organization to switch from XML
to JSON, the software would pretty much write itself. And CouchDB does
a fairly decent job of dealing in XML, as well (using Spidermonkey's
E4X engine) so that's not even required.

> And, likewise,
> would CouchDB be a reasonable fit for managing the binary datastreams?
>  Would it be practical to store the datastreams in CouchDB itself, and
> up to what size limit/throughput limit?

CouchDB's attachment support is pretty much designed for this use case
(attachments can be multi-GB files, and aren't sent to view servers).
From your description, it sounds like you are maxing out IO at the
network level, so it's hard to say how CouchDB would interact with
such a stream, without seeing it in action. However, CouchDB's
replication and distribution capabilities should make managing
multi-site projects as simple as one can hope for. If you shard
projects as databases, then you can use replication to make them
available on the local network for the various sites, which should
make it easier to avoid load bottlenecks at a central repository.
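
As a rough illustration of the project-per-database idea, the sketch
below builds the requests a site server could POST to its own
`_replicate` endpoint to pull each project database from a central
server. The host names, project names, and helper functions are
hypothetical, and nothing is sent over the network; it only shows the
shape of the request under those assumptions.

```python
import json


def replication_request(server, source, target):
    """Return (url, json_body) for a POST to a CouchDB server's _replicate endpoint."""
    url = server.rstrip("/") + "/_replicate"
    body = json.dumps({"source": source, "target": target})
    return url, body


def plan_site_replication(central, projects, site_server):
    """Plan one pull replication per project database.

    Each request is meant to be sent to the site server, which pulls the
    project database from the central server into a local database of
    the same name. Host and database names here are illustrative only.
    """
    requests = []
    for project in projects:
        source = central.rstrip("/") + "/" + project
        requests.append(replication_request(site_server, source, project))
    return requests


if __name__ == "__main__":
    plan = plan_site_replication(
        "http://central.example:5984",      # hypothetical central repository
        ["project_a", "project_b"],         # hypothetical project databases
        "http://site1.example:5984",        # hypothetical site-local server
    )
    for url, body in plan:
        print("POST", url, body)
```

Keeping one database per project means a site only replicates the
projects it actually works on, rather than the whole repository.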

> Would it be better to store
> the datastreams externally and use CouchDB to manage the metadata and
> access control?

It's not clear - obviously importing TBs of data from a filesystem to
CouchDB will take time and expense, even if CouchDB handles it
swimmingly. The nice thing about the schemaless documents is that you
can be flexible going forward, maybe referencing some assets via URIs
and storing others as attachments.
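
To make the mixed approach concrete, here is a minimal sketch of two
metadata documents: one that points at an externally stored asset by
URI, and one that carries the asset as an inline attachment. The
`asset_uri` and `title` field names, the helper functions, and the
example URLs are assumptions made for illustration; only the
`_attachments` structure (a base64 `data` field plus `content_type`)
follows CouchDB's inline attachment format.

```python
import base64


def make_external_doc(doc_id, title, uri):
    """Metadata document whose binary asset lives outside CouchDB."""
    return {"_id": doc_id, "title": title, "asset_uri": uri}


def make_attached_doc(doc_id, title, filename, content_type, data):
    """Metadata document carrying its asset as an inline attachment."""
    return {
        "_id": doc_id,
        "title": title,
        "_attachments": {
            filename: {
                "content_type": content_type,
                "data": base64.b64encode(data).decode("ascii"),
            }
        },
    }


def asset_location(db_url, doc):
    """Return where a client should fetch the asset from.

    Externally stored assets use the document's own URI field; attached
    assets are served at <db>/<doc id>/<attachment name>. Names are not
    URL-escaped here, so this assumes simple ids and filenames.
    """
    if "asset_uri" in doc:
        return doc["asset_uri"]
    name = next(iter(doc["_attachments"]))
    return db_url.rstrip("/") + "/" + doc["_id"] + "/" + name


if __name__ == "__main__":
    db = "http://localhost:5984/media"  # hypothetical database URL
    big = make_external_doc("film-001", "Archive reel", "http://files.example/reel.mov")
    small = make_attached_doc("img-001", "Thumbnail", "thumb.png", "image/png", b"\x89PNG")
    print(asset_location(db, big))
    print(asset_location(db, small))
```

Because both kinds of document share the same metadata fields, an
asset can move from external storage to an attachment later without
changing how the rest of the application reads the metadata.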

> Also, looking down the road, are there plans for
> CouchDB's development that would improve its fitness for this purpose
> in the future?
>

Your project sounds like a good fit for CouchDB. Of course, you are
talking about working on the high end of the performance / scalability
curve, and CouchDB is relatively new, so you'll have to be comfortable
as a trail-blazer (not that you'd be the only one, but with a new
technology, you'll be in a smaller crowd than if you used something
that's been around longer).

I think the biggest positive reason to use CouchDB for your project is
the ease of federation / distribution / offline work. Once you've
built the business-rules and document format around your project and
CouchDB, booting up other instances of the project for more media
collections should be straightforward. Because the documents will be
more self-contained than what you'd have with a SQL store, for
instance, you could build something amenable to merging multiple
repositories, or splitting off just a portion of a repository for a
particular purpose. This flexibility seems like a big win, as it would
allow you to respond to things like datacenter-level bottlenecks with
changes that users will understand, such as moving just the necessary
sub-collections to a more local server.

Good luck and keep us up to date with your progress.

Chris

-- 
Chris Anderson
http://jchris.mfdz.com
