incubator-couchdb-user mailing list archives

From Rob Stewart <robstewar...@googlemail.com>
Subject Re: couchdb Comparison
Date Wed, 07 Oct 2009 22:26:51 GMT
Hi Randall,

Thanks for getting in touch. I hope you're still able to contribute to
CouchDB in one way or another. Maybe the community isn't ready yet to
commit itself to a de facto, formalised way of distributing the execution
of CouchDB? (lol)

I will keep you up to date with my progress. I am certainly approaching my
project as a parallel distribution problem, as opposed to an exclusively
DBMS project, and I have a university cluster at my disposal.

@ Jesse - You confirmed some of my suspicions about CouchDB with regard to
its mission, its scalability and its similarity to a distributed system such
as Hadoop. It is very useful to be aware of the explicit map-reduce nature
of CouchDB, and it will certainly not be overlooked in my study (map-reduce
has a vital role in Hadoop: it is the very core of the distribution of
processing and data). Perhaps, in the not too distant future, there could be
a study of the scalability and parallel performance of CouchDB, where
CouchDB offers a developer these things for free! (?)



Rob



2009/10/7 Jesse Hallett <hallettj@gmail.com>

> One issue is that Hadoop and CouchDB are very different tools.
>
> Hadoop is great at intensive, high-latency data analysis.  It doesn't
> matter how complicated the computation you want is - Hadoop will do it for
> you because it is a data processing engine.
>
> CouchDB is a database.  It is designed for low-latency, high-availability
> operations.  CouchDB is not a data processing engine, it is a data
> retrieval engine.  It should be faster than Hadoop for tasks that both
> systems can handle; and CouchDB can perform some powerful analysis via its
> map-reduce capability.  But the analysis you can perform with CouchDB will
> ultimately be limited by its low-latency design philosophy.
>
> What can be misleading is that while both Hadoop and CouchDB use
> map-reduce, they use it for very different things.  It is analogous to
> saying "these two programs both use iteration over tree structures."  One
> detail on choice of algorithm does not tell you what a program is designed
> for or what it is good at.
>
> CouchDB uses map-reduce to build pre-computed views of data.  The
> map-reduce pattern enforces data isolation which allows CouchDB to
> incrementally update views.  CouchDB does not (yet) take advantage of
> parallel processing when generating views.  Though you can get parallelism
> by distributing data over a cluster and splitting queries with a proxy.
>
> Hadoop uses map-reduce to run computation in parallel and to distribute
> computation across multiple machines.  The same data isolation that CouchDB
> relies on allows this.  But Hadoop takes advantage of that feature
> differently.
>
> On Oct 7, 2009 7:29 AM, "Göran Krampe" <goran@krampe.se> wrote:
>
> Nicholas Orr wrote:
> > On Wed, Oct 7, 2009 at 11:53 PM, Rob Stewart
> > <robstewart57@googlemail.com>...
> Using an intermediate library in your language of choice you can get
> queries etc to look rather similar, take a look at this C# example program
> for using Divan:
>
> http://github.com/gokr/Divan/blob/master/samples/Trivial/Program.cs
>
> ...funny enough it also uses "Cars" as an example :). Note the LINQ
> integration which actually makes it possible to write:
>
> var fastCars = from c in linqCars where c.HorsePowers >= 175 select c;
>
> (given a view for it)
>
> regards, Göran
>
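
To make the quoted points about pre-computed, incrementally updated views
a little more concrete, here is a rough Python sketch. It is not CouchDB's
implementation, and it covers only the map half of map-reduce. The class
View, the function map_car, the field names (horse_powers, model) and the
sample documents are all made up for illustration. The idea it tries to
show: because the map function sees one document at a time, a changed
document only replaces its own rows in the sorted index, and a query such
as "HorsePowers >= 175" becomes a key-range lookup on that index.

```python
from bisect import bisect_left, insort


def map_car(doc):
    """Map step: emit (key, value) pairs, looking at a single document only."""
    if "horse_powers" in doc:
        yield doc["horse_powers"], doc["model"]


class View:
    """Toy in-memory stand-in for a pre-computed view (hypothetical)."""

    def __init__(self, map_fn):
        self.map_fn = map_fn
        self.rows = []    # sorted list of (key, doc_id, value)
        self.by_doc = {}  # doc_id -> rows that document contributed

    def update(self, doc_id, doc):
        """Incremental update: touch only the rows owned by this document."""
        for row in self.by_doc.pop(doc_id, []):
            self.rows.remove(row)
        new_rows = [(key, doc_id, value) for key, value in self.map_fn(doc)]
        for row in new_rows:
            insort(self.rows, row)
        self.by_doc[doc_id] = new_rows

    def query(self, startkey):
        """Return (key, value) pairs with key >= startkey, in key order."""
        start = bisect_left(self.rows, (startkey,))
        return [(key, value) for key, _doc_id, value in self.rows[start:]]


# Hypothetical sample documents.
view = View(map_car)
view.update("car-a", {"model": "Model A", "horse_powers": 150})
view.update("car-b", {"model": "Model B", "horse_powers": 180})
view.update("car-c", {"model": "Model C", "horse_powers": 240})
print(view.query(175))

# Changing one document rewrites only that document's rows.
view.update("car-a", {"model": "Model A", "horse_powers": 200})
print(view.query(175))
```

Keeping the rows sorted by key is what makes the range query cheap at read
time; the cost is paid when documents change, which fits the low-latency
retrieval emphasis described in the quoted message.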
