couchdb-user mailing list archives

From Daniel Friesen <li...@danielfriesen.name>
Subject Re: CouchDB x RDF databases comparison
Date Fri, 08 May 2009 07:09:13 GMT
There are also XML databases (XQuery) to compare (I'll just use X for 
simplicity). I ended up starting to use Sedna at work.

CouchDB uses JSON, X uses XML.
CouchDB uses views, X uses XQuery, which has some simple indexing and is 
a powerful and understandable query language.
CouchDB has a lucene plugin, Sedna can have an extra fulltext index 
feature enabled.
Updating data in CouchDB requires an entire document be updated, X 
databases can modify small parts of the document
CouchDB saves a new document revision on each change, X works on the 
current document.
CouchDB handles conflicts using conflict resolution, X makes the 
modification query on the current document in order of queries 
(transactions are also supported).
CouchDB uses an HTTP REST API, most X databases use a normal binary 
protocol (Sedna seems to have a good set of libraries for most languages).
CouchDB is distributed and scalable.
In X databases documents can be grouped into collections. (These can 
also be used in queries)
It's probably a moot point, but XQuery is w3c standardized and 
implemented by a number of databases.

IMHO compiling a comparison of alternative databases and seeing what 
features work best for what data you're working with is the best option.

I went through the semantic databases myself because our company had 
"Semantics" in mind. I had issues getting most of them to work and 
finding help for them, and ended up finding that our data better fit 
the document-based database type. For us TQL was the only one with a 
significant improvement (we really needed the walk capabilities); 
other than that, semantic databases were only a little better than an 
RDBMS (although we were actually using an RDBMS in an ugly semantic-like 
hack: an atoms table with 3 columns).
Our reason for moving away from RDBMSs was a need to cut down the large 
number of queries going between our app and the database. We had a huge 
amount of hierarchical data that the entire app was based around (a tree 
structure wasn't even guaranteed; something could have multiple parents 
referencing it and be part of multiple trees).
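
To make the query-volume problem concrete, here is a minimal sketch using 
Python's built-in sqlite3 module. The table name, column names, and sample 
rows are assumptions made up for illustration, not our actual schema. It 
stores parent links in a three-column table and walks the hierarchy with 
one query per visited node:

```python
import sqlite3

# Hypothetical schema: one three-column table of (subject, predicate, object).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atoms (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO atoms VALUES (?, ?, ?)",
    [
        ("b", "parent", "a"),
        ("c", "parent", "a"),
        ("d", "parent", "b"),
        ("d", "parent", "c"),  # "d" has two parents, so this is not a tree
        ("e", "parent", "d"),
    ],
)


def descendants(conn, root):
    """Walk downward from root, issuing one query per visited node."""
    seen, queue, queries = set(), [root], 0
    while queue:
        node = queue.pop(0)
        rows = conn.execute(
            "SELECT subject FROM atoms WHERE predicate = 'parent' AND object = ?",
            (node,),
        ).fetchall()
        queries += 1
        for (child,) in rows:
            if child not in seen:  # skip nodes already reached via another parent
                seen.add(child)
                queue.append(child)
    return seen, queries


found, query_count = descendants(conn, "a")
print(sorted(found), query_count)
```

Even on this five-row example the walk costs one round trip per node 
visited, which is the pattern that becomes painful once the hierarchy is 
large.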
We decided on Sedna (XQuery) rather than CouchDB because CouchDB's views 
couldn't handle our hierarchical data spread across multiple documents, 
and we couldn't put everything in one document because we update small 
pieces of data a lot. That doesn't work out well with how entire 
documents need to be modified in Couch (transmitting the entire document 
to modify a single value, a new document revision saved each time, 
getting a conflict because an unrelated part of the document was modified).
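
The whole-document update problem can be sketched with a toy in-memory 
store. This is not the CouchDB API: the ToyStore class, its methods, and 
the integer revision counter are hypothetical simplifications (CouchDB 
itself works over HTTP and uses opaque revision strings). It only models 
the rule that a write must carry the current revision of the document:

```python
import copy


class Conflict(Exception):
    """Raised when a write is based on a stale revision."""


class ToyStore:
    """Hypothetical stand-in for a whole-document store with revisions."""

    def __init__(self):
        self.revisions = {}  # doc id -> list of every stored version

    def get(self, doc_id):
        return copy.deepcopy(self.revisions[doc_id][-1])

    def put(self, doc_id, doc):
        history = self.revisions.setdefault(doc_id, [])
        current_rev = history[-1]["_rev"] if history else None
        if doc.get("_rev") != current_rev:
            raise Conflict(doc_id)  # someone else wrote first
        stored = copy.deepcopy(doc)
        stored["_rev"] = len(history) + 1
        history.append(stored)  # a full new copy is kept for each change
        return stored["_rev"]


store = ToyStore()
store.put("page", {"title": "Home", "hits": 0})

# Two clients read the same revision of the same document.
alice = store.get("page")
bob = store.get("page")

alice["title"] = "Start"
store.put("page", alice)  # sends the whole document to change one field

bob["hits"] = 1  # touches a field unrelated to Alice's change
try:
    store.put("page", bob)
    outcome = "saved"
except Conflict:
    outcome = "conflict"
print(outcome, len(store.revisions["page"]))
```

Bob's write is rejected even though he changed a different field, and 
each accepted write stores another full copy of the document.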

Personally I have an idea for another type of database. The one thing 
I've always wanted is one that is program-oriented, i.e. simplifying a 
database down to what it is: centralized data storage. Instead of a 
query language, it would embed an existing programming language into the 
database environment. I wrote a bit of API drafting on it.

~Daniel Friesen (Dantman, Nadir-Seen-Fire)

Nitin Borwankar wrote:
> Demetrius Nunes wrote:
>> Hi Nitin,
>>
>> Great answer. Thanks a lot. One more question...
>>
>> I am in the Javaland here, so another viable option for my 
>> application is
>> using JCR, such as the Apache Jackrabbit implementation.
>>   
>
> Hi Demetrius,
>
> I am a refugee from Javaland so am familiar with the power and 
> limitations of Java.  Yes, I have looked at JCR and JackRabbit in a 
> previous project.
> These days I just recoil from the verbosity and conceptual layers you 
> encounter when coding simple things in Java.  And then there's XML.....
> So I would have held my nose and used JackRabbit if CouchDB didn't 
> exist - but in my mind it's a distant second in practice even if it is 
> conceptually similar and close in theory.
>
> Personally when I see layer upon layer of abstraction in Java 
> architecture diagrams I wonder how much of my CPU cost is going in 
> converting from strings, to TypeA to LayeredClassB to factoryC to ORM 
> D to EJB4 to disk and back again all the way to strings.  So I am 
> moving away from Java except when the best of breed solution is in 
> Java ( Lucene/Solr) - so I don't hate Java - I just need to justify 
> the overhead that it brings both in coding and in the 
> build/install/deploy process.
>
> CouchDB has minimal overhead in roundtrip datatype translations - it's 
> what I call "WYSIWIS"  - "what you see is what you store" i.e. JSON.
> There are people looking at an alternative to LAMP which they call JS3 
> - Javascript in all three layers - browser/helma/couchdb  ( helma, 
> helma.org, is a middle tier layer written in Java, runs on Jetty, uses 
> JS as the language for doing UI templates and also ORM ) - I 
> personally think CouchDB + CouchDBViews just makes it JS2 - 
> browser-CouchDB.
>
> I would suggest you download Rhino ( JS interpreter in Java) from 
> Mozilla and start playing with both CouchDB and JackRabbit and then see.
>
> Did I sound biased ? :-)
>
>
> Nitin Borwankar,
> Project Manager, Bibliographic Knowledge Network.
> bibkn.org
>
>> Did you happen to take a look at that as well? I think JCR has even more
>> similarities with CouchDB than RDF.
>>
>> How would you compare JCR and CouchDB ?
>>
>> Thanks a lot,
>> Demetrius
>>
>> On Thu, May 7, 2009 at 5:04 PM, Nitin Borwankar 
>> <nitin@borwankar.com> wrote:
>>
>>  
>>> Demetrius Nunes wrote:
>>>
>>>    
>>>> Hi there,
>>>>
>>>> We are evaluating new technologies for managing semi-structured 
>>>> data and
>>>> documents in one of our applications. We've got tired of wrestling
>>>> relational databases for this.
>>>>
>>>> I would like to know why I would prefer to use CouchDB instead of an 
>>>> RDF
>>>> database, such as Sesame or Mulgara.
>>>>
>>>> I know some of the RDF advantages, such as open standards,
>>>> interoperability,
>>>> rules engines, semantic queries, community and tool support, maturity,
>>>> etc.
>>>>
>>>> But I really like the simplicity of the CouchDB model.
>>>>
>>>> Can anyone enlighten me?
>>>>
>>>> Thanks a lot,
>>>> Demetrius
>>>>
>>>>
>>>>
>>>>       
>>> Hi Demetrius,
>>>
>>> We ( bibkn.org) have investigated and used SQL databases, RDF store
>>> (Virtuoso) and CouchDB for bibliographic metadata management.  I am the
>>> project manager and data architect for this project.
>>> Relational databases are often a first choice but have many limitations in
>>> management of loosely typed, messy, string based data sets.  So we 
>>> are in
>>> agreement on not using that technology.
>>>
>>> We, bibkn.org,  need both the schemalessness of CouchDB at one end 
>>> of our
>>> workflow and the strongly-typedness of RDF at the other end of the 
>>> workflow
>>> when all our data has been cleaned up and "ontologized". So we don't 
>>> see
>>> this as an either/or between CouchDB and RDF stores.
>>> However we can definitely say one thing  - if you need  just the 
>>> flexible
>>> schema aspect  and are using RDF to give you that, then  that is 
>>> massive
>>> overkill and the conceptual overhead of the RDF (ontology, schemas,
>>> namespaces, completely normalized everything, i.e. URIs for subject,
>>> predicate, object), is simply not worth it.    If however you want 
>>> to do
>>> logical inference and reasoning over your data then clearly the RDF and
>>> semantic  machinery gives you  a  whole lot of goodness that is 
>>> worth the
>>> overhead.
>>>
>>> So CouchDB is not a substitute for an RDF-store, but you may be 
>>> using an
>>> RDF-store for the lesser things it gives you (flexible schema) and 
>>> in that
>>> case CouchDB can do a lot more for you at a much lower overhead and 
>>> much
>>> greater ease of use and integration into existing tools.
>>>
>>> Additionally SPARQL  (like SQL) is not really meant for text search 
>>> which
>>> is critical for loosely typed data. So even at our RDF end we have a 
>>> Solr
>>> instance for rapid text search over the RDF store.
>>> Additionally we have couchdb-lucene as an extension on our CouchDB 
>>> instance
>>> and this has given us everything we need at the loosely typed data 
>>> end of
>>> our workflow.
>>>
>>> So if semi-structured data and document management is your primary 
>>> use case
>>> and there is no semantic/ontology/inference component then forget 
>>> RDF-stores
>>> and just go with CouchDB.
>>>
>>> In our project we are developing a format on top of JSON to export
>>> bibliographic metadata for integration into JSON friendly data 
>>> consumers, it
>>> also happens to have easy mapping to RDF.
>>> So even if you go to Couch now you may be able to integrate into an
>>> RDF-store at some later stage if the need arises.
>>>
>>> Hope this helps,
>>>
>>> Nitin Borwankar,
>>> Project Manager,  Bibliographic Knowledge Network
>>> bibkn.org

