Date: Mon, 19 Apr 2010 12:46:41 -0400
Subject: Re: CouchDB and Hadoop
From: "Jim R. Wilson"
To: user@couchdb.apache.org

Hi Steve,

Clarification points:

Hadoop is not a filesystem, it's an implementation of MapReduce. For sharing files around the cluster, Hadoop uses HDFS (the Hadoop Distributed File System) by default, and can also use other filesystems (I believe it supports Amazon S3 storage if the cluster is in EC2).

So, the questions become:

* Could the data file for a CouchDB database be stored in HDFS?
* Could the MapReduce tasks executed by CouchDB be offloaded to Hadoop?

I think the answers to both are "probably". How much work it would take to implement such a system is an open question. I suspect storing the data file on HDFS would be easier than offloading the MapReduce tasks.

As far as handling the Java to Erlang/JavaScript mismatch, I think that particular piece can be addressed by using Hadoop Streaming[1]. I have done a fair amount of work using Python to work on JSON objects over Hadoop Streaming. Erlang/JavaScript should be no different.

The real question in my mind is, "why do any of this?" Both Hadoop and CouchDB are fine systems with particular goals in mind. I'm not convinced there's significant value in Frankensteining them together.

Just my $0.02

[1] http://hadoop.apache.org/common/docs/r0.15.2/streaming.html

--
Jim R. Wilson (jimbojw)

On Fri, Apr 16, 2010 at 4:12 AM, Suhail Ahmed wrote:

> Sure, it can be done, but for me the whole Java to Erlang layer would be a
> mess since they are so different. The better way to go about doing this
> would be to implement a distributed file system like Hadoop underneath Couch
> for the same effect.
>
> On Fri, Apr 16, 2010 at 1:16 AM, Steve-Mustafa Ismail Mustafa
> <m.i.mustafa@gmail.com> wrote:
>
>> I swear, I spent over an hour going through the mailing list trying to find
>> an answer.
>>
>> I know that CouchDB is a document oriented DB and I know that Hadoop is a
>> File System and that both implement Map/Reduce. But is it possible to have
>> them stacked with Hadoop being the FS in use and CouchDB being the DB? This
>> way, wouldn't you get the distributed/clustered FS abilities of Hadoop in
>> addition to the powerful retrieval abilities of CouchDB?
>>
>> If it's not possible, and I suspect that it is so, _why_? Don't they operate
>> on two separate levels? Wouldn't CouchDB sort of replace HBase?
>>
>> Thanks in advance for any and all replies
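
For readers unfamiliar with Hadoop Streaming, the "Python working on JSON objects" approach mentioned in the reply can be sketched roughly as follows. Streaming runs any executable as a mapper or reducer, feeding it lines on stdin and reading tab-separated key/value lines from stdout. This is only a minimal illustration, not code from the thread: it assumes one JSON document per input line, and the "type" field and the function names are hypothetical.

```python
import json
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: one JSON document per line in, 'key<TAB>1' lines out."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            doc = json.loads(line)
        except ValueError:
            continue  # skip malformed input instead of failing the task
        if not isinstance(doc, dict):
            continue
        # "type" is a hypothetical field, chosen only for illustration.
        yield "%s\t1" % doc.get("type", "unknown")


def reduce_lines(lines):
    """Reducer: input arrives sorted by key; sum the counts for each key."""
    pairs = (l.rstrip("\n").split("\t", 1) for l in lines if l.strip())
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield "%s\t%d" % (key, total)


# Run as "script.py map" or "script.py reduce" (an assumed convention).
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in step(sys.stdin):
        print(out)
```

The same script would be passed to the streaming jar as both the mapper and the reducer command. Hadoop sorts the mapper output by key before the reducer sees it, which is why the reducer can group adjacent lines. A JavaScript or Erlang program following the same stdin/stdout convention should slot in the same way.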