From: Matteo Caprari
Date: Tue, 16 Mar 2010 10:16:54 +0000
Message-ID: <1bca98391003160316j4c190bd0j61475d26f58dfc55@mail.gmail.com>
Subject: Cassandra and hadoop?
To: user@cassandra.apache.org

Hi.

I've tried the mapreduce example in 0.6 contrib/wordcount and it worked very well.

I have a shallow understanding of both worlds, so pardon my questions:

Is the integration with Hadoop just 'semantic' (i.e. the map/reduce API is only used as a query abstraction) or is it 'structural' (i.e. Cassandra can 'talk to Hadoop' and replace HDFS as the input source)?

In practice:

- If I want to run a distributed mapreduce job on Cassandra, does my Cassandra cluster have to be a Hadoop cluster as well?
- Do I get data locality optimization? I reckon Cassandra can in principle figure out where it is best to execute a SlicePredicate/Mapper, but to do so it would have to take over some of the responsibilities of Hadoop's jobtracker. Does it?

Thanks.

--
:Matteo Caprari
matteo.caprari@gmail.com