Message-ID: <4A033ECC.3070206@borwankar.com>
Date: Thu, 07 May 2009 13:04:28 -0700
From: Nitin Borwankar <nitin@borwankar.com>
To: user@couchdb.apache.org
Subject: Re: CouchDB x RDF databases comparison
References: <4aa4f4d60905071215r577e0715wcc971199e69164a8@mail.gmail.com>
In-Reply-To: <4aa4f4d60905071215r577e0715wcc971199e69164a8@mail.gmail.com>

Demetrius Nunes wrote:
> Hi there,
>
> We are evaluating new technologies for managing semi-structured data and
> documents in one of our applications. We've got tired of wrestling
> relational databases for this.
>
> I would like to know why I would prefer to use CouchDB instead of an RDF
> database, such as Sesame or Mulgara.
>
> I know some of the RDF advantages, such as open standards, interoperability,
> rules engines, semantic queries, community and tool support, maturity, etc.
>
> But I really like the simplicity of the CouchDB model.
>
> Can anyone enlighten me?
>
> Thanks a lot,
> Demetrius
>
>   
Hi Demetrius,

We (bibkn.org) have investigated and used SQL databases, an RDF store
(Virtuoso), and CouchDB for bibliographic metadata management. I am the
project manager and data architect for this project.
Relational databases are often a first choice but have many limitations
in managing loosely typed, messy, string-based data sets. So we are in
agreement on not using that technology.

We need both the schemalessness of CouchDB at one end of our workflow
and the strong typing of RDF at the other end, once all our data has
been cleaned up and "ontologized". So we don't see this as an either/or
choice between CouchDB and RDF stores.

However, we can definitely say one thing: if you need just the
flexible-schema aspect and are using RDF to give you that, then that is
massive overkill, and the conceptual overhead of RDF (ontologies,
schemas, namespaces, completely normalized everything, i.e. URIs for
subject, predicate, and object) is simply not worth it. If, however,
you want to do logical inference and reasoning over your data, then
clearly the RDF and semantic machinery gives you a whole lot of
goodness that is worth the overhead.

So CouchDB is not a substitute for an RDF-store, but you may be using an 
RDF-store for the lesser things it gives you (flexible schema) and in 
that case CouchDB can do a lot more for you at a much lower overhead and 
much greater ease of use and integration into existing tools.

Additionally, SPARQL (like SQL) is not really meant for text search,
which is critical for loosely typed data. So even at our RDF end we have
a Solr instance for rapid text search over the RDF store.
We also run couchdb-lucene as an extension on our CouchDB instance, and
this has given us everything we need at the loosely typed data end of
our workflow.

So if semi-structured data and document management is your primary use 
case and there is no semantic/ontology/inference component then forget 
RDF-stores and just go with CouchDB.

In our project we are developing a format on top of JSON to export
bibliographic metadata for integration into JSON-friendly data
consumers; it also happens to map easily to RDF.
So even if you go with Couch now, you may be able to integrate with an
RDF-store at some later stage if the need arises.
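
To give a rough idea of why a JSON document can map easily onto RDF,
here is a minimal Python sketch. It is not our actual export format; the
namespaces, field names, and the json_to_triples helper are all made up
for illustration. It assumes a flat document where each key becomes a
predicate and each value (or list item) becomes an object literal.

```python
import json

# Hypothetical namespaces, chosen only for illustration.
RECORD_BASE = "http://example.org/record/"
VOCAB_BASE = "http://example.org/vocab/"


def json_to_triples(doc):
    """Flatten one flat JSON document into (subject, predicate, object) triples."""
    subject = RECORD_BASE + doc["_id"]
    triples = []
    for key, value in doc.items():
        if key.startswith("_"):
            continue  # skip CouchDB bookkeeping fields such as _id and _rev
        # Treat a scalar as a one-item list so both cases share one loop.
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, VOCAB_BASE + key, str(item)))
    return triples


# A made-up bibliographic record, shaped like a CouchDB document.
record = json.loads(
    '{"_id": "rec1", "title": "A Sample Paper",'
    ' "author": ["A. One", "B. Two"], "year": 2009}'
)
triples = json_to_triples(record)
for s, p, o in triples:
    print(s, p, o)
```

A real mapping would also need typed literals, nested objects, and an
agreed vocabulary, which is where the ontology work comes in.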

Hope this helps,

Nitin Borwankar,
Project Manager,  Bibliographic Knowledge Network
bibkn.org