couchdb-user mailing list archives

From "Jim R. Wilson" <>
Subject Re: CouchDB and Hadoop
Date Mon, 19 Apr 2010 16:46:41 GMT
Hi Steve,

Clarification points:  Hadoop is not a filesystem, it's an
implementation of MapReduce.  For sharing files around the cluster,
Hadoop uses HDFS (the Hadoop Distributed File System) by default, and
can also use other filesystems (I believe it supports Amazon S3
storage if the cluster is in EC2).

So, the questions become:

* Could the data file for a CouchDB database be stored in HDFS?
* Could the MapReduce tasks executed by CouchDB be offloaded to Hadoop?

I think the answers to both are "probably".  How much work it would
take to implement such a system is an open question.  I suspect
storing the data file on HDFS would be easier than offloading the
mapreduce tasks.

As far as handling the Java to Erlang/JavaScript mismatch, I think
that particular piece can be addressed by using Hadoop Streaming[1].
I have done a fair amount of work using Python to work on JSON objects
over Hadoop Streaming - Erlang/JavaScript should be no different.
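
To make the Streaming idea concrete, here is a minimal sketch in Python of a mapper and reducer that count JSON documents by one field. It relies only on the Streaming convention that each step reads lines on stdin and writes tab-separated key/value lines on stdout, with the reducer seeing its input sorted by key. The "type" field, the function names, and the map/reduce command-line argument are assumptions made for illustration, not anything CouchDB or Hadoop defines.

```python
import json
import sys
from itertools import groupby


def map_lines(lines, field="type"):
    """Mapper: emit 'key<TAB>1' for each JSON document that has `field`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except ValueError:
            continue  # skip malformed records rather than fail the task
        if isinstance(doc, dict) and field in doc:
            yield "%s\t1" % doc[field]


def reduce_lines(lines):
    """Reducer: sum the counts per key; input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (key, sum(int(count) for _, count in group))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical invocation: pass "map" or "reduce" as the only argument.
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

The same script can be tried without a cluster by piping a file of one JSON document per line through the map step, then sort, then the reduce step. A JavaScript or Erlang program that follows the same stdin/stdout convention should slot in the same way.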

The real question in my mind is, "why do any of this?".  Both Hadoop
and CouchDB are fine systems with particular goals in mind.  I'm not
convinced there's significant value in Frankensteining them together.

Just my $0.02


-- Jim R. Wilson (jimbojw)

On Fri, Apr 16, 2010 at 4:12 AM, Suhail Ahmed <> wrote:
> Sure It can be done but for me the whole Java to Erlang layer would be a
> mess since they are so different. The better way to go about doing this
> would be to implement a distributed file system like Hadoop underneath Couch
> for the same effect.
> On Fri, Apr 16, 2010 at 1:16 AM, Steve-Mustafa Ismail Mustafa <> wrote:
>> I swear, I spent over an hour going through the mailing list trying to find
>> an answer.
>> I know that CouchDB is a document oriented DB and I know that Hadoop is a
>> File System and that both implement Map/Reduce.  But is it possible to have
>> them stacked with Hadoop being the FS in use and CouchDB being the DB? This
>> way, wouldn't you get the distributed/clustered FS abilities of Hadoop in
>> addition to the powerful retrieval abilities of CouchDB?
>> If it's not possible, and I suspect that it is so, _why_? Don't they operate
>> on two separate levels? Wouldn't CouchDB sort of replace HBase?
>> Thanks in advance for any and all replies
