incubator-couchdb-user mailing list archives

From Nitin Borwankar <ni...@borwankar.com>
Subject Re: CouchDB x RDF databases comparison
Date Thu, 07 May 2009 20:55:16 GMT
Demetrius Nunes wrote:
> Hi Nitin,
>
> Great answer. Thanks a lot. One more question...
>
> I am in the Javaland here, so another viable option for my application is
> using JCR, such as the Apache Jackrabbit implementation.
>   

Hi Demetrius,

I am a refugee from Javaland, so I am familiar with the power and 
limitations of Java.  Yes, I have looked at JCR and Jackrabbit in a 
previous project.
These days I just recoil from the verbosity and conceptual layers you 
encounter when coding simple things in Java.  And then there's XML.....
So I would have held my nose and used Jackrabbit if CouchDB didn't exist, 
but in my mind it's a distant second in practice even if it is 
conceptually similar and close in theory.

Personally, when I see layer upon layer of abstraction in Java 
architecture diagrams, I wonder how much of my CPU cost goes into 
converting from strings to TypeA to LayeredClassB to factoryC to ORM D 
to EJB4 to disk, and back again all the way to strings.  So I am moving 
away from Java except when the best-of-breed solution is in Java 
(Lucene/Solr). I don't hate Java; I just need to justify the 
overhead that it brings, both in coding and in the build/install/deploy 
process.

CouchDB has minimal overhead in round-trip datatype translations - it's 
what I call "WYSIWIS" - "what you see is what you store", i.e. JSON.
There are people looking at an alternative to LAMP which they call JS3: 
JavaScript in all three layers, browser/Helma/CouchDB. (Helma, 
helma.org, is a middle-tier layer written in Java that runs on Jetty and 
uses JS as the language for UI templates and also ORM.) I personally 
think CouchDB + CouchDB views just makes it JS2: browser-CouchDB.
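
To make the "what you see is what you store" point concrete, here is a minimal sketch in Python. It does not talk to a CouchDB server; it only shows that a document can go to JSON text and back with no intermediate class layers. The record, its field names, and the helper names `to_wire` and `from_wire` are all hypothetical, chosen for illustration.

```python
import json

# Hypothetical bibliographic record; the field names are illustrative only.
doc = {
    "_id": "example-record-1",
    "title": "An Example Paper",
    "authors": ["A. Author", "B. Author"],
    "year": 2009,
}


def to_wire(document: dict) -> str:
    """Serialize a document to JSON text (the form a JSON store would keep)."""
    return json.dumps(document, sort_keys=True)


def from_wire(text: str) -> dict:
    """Parse JSON text back into plain dicts, lists, strings and numbers."""
    return json.loads(text)


wire = to_wire(doc)
restored = from_wire(wire)

print(wire)             # the JSON text itself
print(restored == doc)  # True: nothing was lost or translated along the way
```

The only translation step here is between JSON text and the language's native maps and lists, which is the overhead argument in a nutshell.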

I would suggest you download Rhino (a JS interpreter written in Java) from 
Mozilla and start playing with both CouchDB and Jackrabbit and then see.

Did I sound biased? :-)


Nitin Borwankar,
Project Manager, Bibliographic Knowledge Network.
bibkn.org
 
> Did you happen to take a look at that as well? I think JCR has even more
> similarities with CouchDB than RDF.
>
> How would you compare JCR and CouchDB ?
>
> Thanks a lot,
> Demetrius
>
> On Thu, May 7, 2009 at 5:04 PM, Nitin Borwankar <nitin@borwankar.com> wrote:
>
>   
>> Demetrius Nunes wrote:
>>
>>     
>>> Hi there,
>>>
>>> We are evaluating new technologies for managing semi-structured data and
>>> documents in one of our applications. We've got tired of wrestling
>>> relational databases for this.
>>>
>>> I would like to know why I would prefer to use CouchDB instead of an RDF
>>> database, such as Sesame or Mulgara.
>>>
>>> I know some of the RDF advantages, such as open standards,
>>> interoperability,
>>> rules engines, semantic queries, community and tool support, maturity,
>>> etc.
>>>
>>> But I really like the simplicity of the CouchDB model.
>>>
>>> Can anyone enlighten me?
>>>
>>> Thanks a lot,
>>> Demetrius
>>>
>>>
>>>
>>>       
>> Hi Demetrius,
>>
>> We (bibkn.org) have investigated and used SQL databases, an RDF store
>> (Virtuoso) and CouchDB for bibliographic metadata management.  I am the
>> project manager and data architect for this project.
>> Relational databases are often a first choice but have many limitations in
>> management of loosely typed, messy, string-based data sets.  So we are in
>> agreement on not using that technology.
>>
>> We, bibkn.org,  need both the schemalessness of CouchDB at one end of our
>> workflow and the strongly-typedness of RDF at the other end of the workflow
>> when all our data has been cleaned up and "ontologized". So we don't see
>> this as an either/or between CouchDB and RDF stores.
>> However we can definitely say one thing - if you need just the flexible
>> schema aspect and are using RDF to give you that, then that is massive
>> overkill, and the conceptual overhead of RDF (ontology, schemas,
>> namespaces, completely normalized everything, i.e. URIs for subject,
>> predicate, object) is simply not worth it. If however you want to do
>> logical inference and reasoning over your data, then clearly the RDF and
>> semantic machinery gives you a whole lot of goodness that is worth the
>> overhead.
>>
>> So CouchDB is not a substitute for an RDF-store, but you may be using an
>> RDF-store for the lesser things it gives you (flexible schema) and in that
>> case CouchDB can do a lot more for you at a much lower overhead and much
>> greater ease of use and integration into existing tools.
>>
>> Additionally SPARQL  (like SQL) is not really meant for text search which
>> is critical for loosely typed data. So even at our RDF end we have a Solr
>> instance for rapid text search over the RDF store.
>> Additionally we have couchdb-lucene as an extension on our CouchDB instance
>> and this has given us everything we need at the loosely typed data end of
>> our workflow.
>>
>> So if semi-structured data and document management is your primary use case
>> and there is no semantic/ontology/inference component then forget RDF-stores
>> and just go with CouchDB.
>>
>> In our project we are developing a format on top of JSON to export
>> bibliographic metadata for integration into JSON-friendly data consumers;
>> it also happens to have an easy mapping to RDF.
>> So even if you go to Couch now you may be able to integrate into an
>> RDF-store at some later stage if the need arises.
>>
>> Hope this helps,
>>
>> Nitin Borwankar,
>> Project Manager,  Bibliographic Knowledge Network
>> bibkn.org
>>
>>
>>
>>
>>
>>     
>
>
>   

