From: Paul Davis
Date: Mon, 20 Sep 2010 16:51:38 -0400
Subject: Re: distributed map-reduce views
To: user@couchdb.apache.org

> How would doing something like this with CouchDB and Lounge compare
> with using Hadoop and HBase?

Remember that CouchDB and Hadoop serve different purposes. CouchDB is a data store, whereas Hadoop is a data processing platform. While they both have "MapReduce" functionality, they aren't quite the same thing.

In CouchDB, when we use Map/Reduce, we create a single persistent index of data using map and reduce operators. These indexes can then be queried using single-key or range lookups. Because of the properties of Map/Reduce, we're capable of updating these indexes incrementally.

Hadoop, on the other hand, is meant to handle arbitrary pipelines of data processing. I.e., users can configure Hadoop to run multiple stages of Map/Reduce in order to produce some desired output. The intermediate stages are not intended to be persistent and queryable.

I'm not familiar enough to know how people use HBase in conjunction with Hadoop, other than that I believe it's generally a data source. I don't know if it stores intermediate results or not. As far as I know, Hadoop doesn't provide incremental indexing.

As Randal points out, there are various differences in implementation, but it's also important to understand the data store vs. data processing differences of the two systems.

HTH,
Paul Davis
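
To make the "persistent index, updated incrementally" idea above a bit more concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how CouchDB actually implements views: the class name IncrementalView, its update and query methods, and the sample documents are all hypothetical and chosen only for illustration. The point it tries to show is that when one document changes, only that document is re-mapped, and the index can still answer single-key and range lookups afterwards.

```python
import bisect


class IncrementalView:
    """Toy index (hypothetical, not CouchDB's code): sorted rows, updated per document."""

    def __init__(self, map_fn, reduce_fn):
        self.map_fn = map_fn        # doc -> iterable of (key, value) pairs
        self.reduce_fn = reduce_fn  # list of values -> reduced value
        self.rows = []              # (key, doc_id, value) tuples, kept sorted
        self.by_doc = {}            # doc_id -> rows that document last emitted

    def update(self, doc_id, doc):
        # Remove only the rows this document emitted on its previous revision.
        for row in self.by_doc.pop(doc_id, []):
            self.rows.pop(bisect.bisect_left(self.rows, row))
        # Re-run map on just this one document and insert its new rows.
        new_rows = [(key, doc_id, value) for key, value in self.map_fn(doc)]
        for row in new_rows:
            bisect.insort(self.rows, row)
        self.by_doc[doc_id] = new_rows

    def query(self, startkey, endkey):
        # Range lookup; a single-key lookup is startkey == endkey.
        # A linear scan keeps the sketch short; a real index would seek.
        values = [v for k, _, v in self.rows if startkey <= k <= endkey]
        return self.reduce_fn(values)


def map_fn(doc):
    # Hypothetical map function: emit the amount keyed by city.
    yield doc["city"], doc["amount"]


view = IncrementalView(map_fn, sum)
view.update("a", {"city": "Austin", "amount": 5})
view.update("b", {"city": "Boston", "amount": 7})
view.update("c", {"city": "Austin", "amount": 3})

# Document "a" changes; only "a" is re-mapped, the other rows are untouched.
view.update("a", {"city": "Boston", "amount": 5})

print(view.query("Austin", "Austin"))  # single-key lookup
print(view.query("Austin", "Boston"))  # range lookup
```

By contrast, a batch pipeline in the Hadoop style would typically re-run the map and reduce stages over the whole input to produce fresh output, with nothing in between kept around to be queried.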