Message-ID: <4A033ECC.3070206@borwankar.com>
Date: Thu, 07 May 2009 13:04:28 -0700
From: Nitin Borwankar
To: user@couchdb.apache.org
Subject: Re: CouchDB x RDF databases comparison
In-Reply-To: <4aa4f4d60905071215r577e0715wcc971199e69164a8@mail.gmail.com>

Demetrius Nunes wrote:
> Hi there,
>
> We are evaluating new technologies for managing semi-structured data and
> documents in one of our applications. We've got tired of wrestling
> relational databases for this.
>
> I would like to know why would I prefer to use CouchDB instead of a RDF
> database, such as Sesame or Mulgara.
>
> I know some of the RDF advantages, such as open standards, interoperability,
> rules engines, semantic queries, community and tool support, maturity, etc.
>
> But I really like the simplicity of the CouchDB model.
>
> Can anyone enlighten me?
>
> Thanks a lot,
> Demetrius
>

Hi Demetrius,

We (bibkn.org) have investigated and used SQL databases, an RDF store (Virtuoso), and CouchDB for bibliographic metadata management. I am the project manager and data architect for this project.

Relational databases are often a first choice, but they have many limitations in managing loosely typed, messy, string-based data sets. So we are in agreement on not using that technology.

We need both the schemalessness of CouchDB at one end of our workflow and the strong typing of RDF at the other end, once all our data has been cleaned up and "ontologized". So we don't see this as an either/or between CouchDB and RDF stores.

However, we can definitely say one thing: if you need just the flexible schema aspect and are using RDF to give you that, then that is massive overkill, and the conceptual overhead of RDF (ontology, schemas, namespaces, completely normalized everything, i.e. URIs for subject, predicate, object) is simply not worth it.
If, however, you want to do logical inference and reasoning over your data, then clearly the RDF and semantic machinery gives you a whole lot of goodness that is worth the overhead.

So CouchDB is not a substitute for an RDF store, but you may be using an RDF store for the lesser things it gives you (flexible schema), and in that case CouchDB can do a lot more for you at much lower overhead, with much greater ease of use and integration into existing tools.

Additionally, SPARQL (like SQL) is not really meant for text search, which is critical for loosely typed data. So even at our RDF end we have a Solr instance for rapid text search over the RDF store. We also have couchdb-lucene as an extension on our CouchDB instance, and this has given us everything we need at the loosely typed end of our workflow.

So if semi-structured data and document management is your primary use case and there is no semantic/ontology/inference component, then forget RDF stores and just go with CouchDB.

In our project we are developing a format on top of JSON to export bibliographic metadata for integration into JSON-friendly data consumers; it also happens to have an easy mapping to RDF. So even if you go to Couch now, you may be able to integrate into an RDF store at some later stage if the need arises.

Hope this helps,

Nitin Borwankar,
Project Manager, Bibliographic Knowledge Network
bibkn.org
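
To illustrate the point about a JSON document mapping easily to RDF, here is a minimal Python sketch that flattens a CouchDB-style document into (subject, predicate, object) triples. It is not the bibkn.org export format, which the message does not describe; the base URIs, the field names, and the function `doc_to_triples` are all hypothetical and chosen only for this example.

```python
# Hypothetical base URIs, assumed only for this illustration.
RECORD_BASE = "http://example.org/record/"
VOCAB_BASE = "http://example.org/vocab/"


def doc_to_triples(doc):
    """Flatten one JSON-style document into (subject, predicate, object) triples.

    The document's _id becomes the subject URI, every other top-level key
    becomes a predicate URI, and each value (or each list element) becomes
    a plain string object.
    """
    subject = RECORD_BASE + doc["_id"]
    triples = []
    for key, value in doc.items():
        if key.startswith("_"):
            continue  # skip CouchDB bookkeeping fields such as _id and _rev
        predicate = VOCAB_BASE + key
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, predicate, str(item)))
    return triples


if __name__ == "__main__":
    # A made-up bibliographic record, stored as-is in a schemaless store.
    record = {
        "_id": "rec1",
        "_rev": "1-abc",
        "title": "A Paper",
        "author": ["A. Smith", "B. Jones"],
        "year": 2009,
    }
    for triple in doc_to_triples(record):
        print(triple)
```

The document itself needs no namespaces or ontology up front; the URIs are only introduced at the point where the data moves toward the RDF end of a workflow. A real mapping would also need typed literals and nested values, which this sketch leaves out.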