Date: Mon, 20 Sep 2010 22:31:08 +0200
Subject: Re: distributed map-reduce views
From: Randall Leeds
To: user@couchdb.apache.org

On Sun, Sep 19, 2010 at 20:37, Christopher Bare wrote:
> Hi Couch-potatoes,
>
> I'm investigating using CouchDB for a data mining application and
> could use some advice.

Cool! Welcome to the party.

>
> What I have in mind is sharding a collection of documents between
> several instances of CouchDB each running on their own nodes. Then, I
> want to run distributed map-reduce queries over the whole collection
> of documents. Do I understand correctly that Lounge is currently the
> way to do this?

Lounge is one way. BigCouch (just released) is another.

>
> How would doing something like this with CouchDB and Lounge compare
> with using Hadoop and HBase?

I do not know that much about HBase/Hadoop. I bet someone else on the
list can add more differences, but I know at least there is a data
model difference: CouchDB uses JSON documents but HBase is column
oriented. Also, if HBase relies on HDFS then I think the HDFS name
node is a single point of failure, whereas you can configure BigCouch
and Lounge with redundancy at every level of the system.

-Randall