couchdb-user mailing list archives

From Robert Newson <robert.new...@gmail.com>
Subject Re: distributed map-reduce views
Date Mon, 20 Sep 2010 21:01:15 GMT
A nice explanation. I've never quite known how to respond to people
who, when I discuss CouchDB with them, ask "why not use Hadoop?".
Admittedly it's mostly because I'm trying to hold back a biting
comment, since there's really no commonality besides the use of
(distinct variants of) the Map/Reduce (family of) algorithm(s).

B.

On Mon, Sep 20, 2010 at 9:51 PM, Paul Davis <paul.joseph.davis@gmail.com> wrote:
>> How would doing something like this with CouchDB and Lounge compare
>> with using Hadoop and HBase?
>
> Remember that CouchDB and Hadoop serve different purposes. CouchDB is
> a data store, whereas Hadoop is a data processing platform. While
> they both have "MapReduce" functionality, they aren't quite the same
> thing.
>
> In CouchDB, when we use Map/Reduce, we create a single persistent
> index of data using map and reduce operators. These indexes can then
> be queried using single key or range lookups. Because of the
> properties of Map/Reduce we're capable of updating these indexes
> incrementally.
>
> Hadoop, on the other hand, is meant to handle arbitrary pipelines of
> data processing. I.e., users can configure Hadoop to run multiple stages
> of Map/Reduce in order to produce some desired output. The
> intermediate stages are not intended to be persistent and queryable.
> I'm not familiar enough with HBase to know how people use it in
> conjunction with Hadoop, other than that I believe it's generally a
> data source. I don't know if it stores intermediate results or not.
> As far as I know, Hadoop doesn't provide incremental indexing.
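For contrast, a minimal plain-Python sketch of a multi-stage pipeline in the batch style described above. This is not Hadoop's API (real Hadoop jobs are written against its Java Mapper/Reducer classes); `run_stage`, `run_pipeline` and the example stage functions are hypothetical. Each stage maps every input record, groups the output by key (the "shuffle"), and reduces each group; the output of one stage is fed straight into the next and then discarded, with no index kept and no incremental update.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable

Record = tuple[Any, Any]
Mapper = Callable[[Any, Any], Iterable[Record]]
Reducer = Callable[[Any, list[Any]], Iterable[Record]]


def run_stage(records: Iterable[Record], mapper: Mapper, reducer: Reducer) -> list[Record]:
    """One Map/Reduce pass over the full input: map, group by key, reduce each group."""
    groups: dict[Any, list[Any]] = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    out: list[Record] = []
    for key in sorted(groups):
        out.extend(reducer(key, groups[key]))
    return out


def run_pipeline(records: Iterable[Record], stages: list[tuple[Mapper, Reducer]]) -> list[Record]:
    """Chain stages; intermediate outputs are passed along, never stored or queried."""
    out = list(records)
    for mapper, reducer in stages:
        out = run_stage(out, mapper, reducer)
    return out


# Stage 1: word count.
def split_words(_line_no: Any, line: str) -> Iterable[Record]:
    for word in line.lower().split():
        yield word, 1


def sum_counts(word: Any, counts: list[Any]) -> Iterable[Record]:
    yield word, sum(counts)


# Stage 2: regroup the words by how often they occur.
def by_frequency(word: Any, count: Any) -> Iterable[Record]:
    yield count, word


def collect_words(count: Any, words: list[Any]) -> Iterable[Record]:
    yield count, sorted(words)


lines = [(0, "the cat sat"), (1, "the dog sat"), (2, "a cat")]
result = run_pipeline(lines, [(split_words, sum_counts), (by_frequency, collect_words)])
print(result)  # [(1, ['a', 'dog']), (2, ['cat', 'sat', 'the'])]
```

If one input line changes, this sketch (like a batch job) has to rerun every stage over all the input, which is the contrast with the incremental view index.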
>
> As Randal points out, there are various differences in implementation,
> but it's also important to understand the data store vs. data
> processing differences of the two systems.
>
> HTH,
> Paul Davis
>
