From: Eelco Hillenius
Date: Fri, 14 Aug 2009 12:30:09 -0700
Subject: comparing sub/ 3rd party projects that abstract map/reduce
To: general@hadoop.apache.org

Hi,

Would people mind sharing their opinions about the relative strengths and weaknesses of Hive, Chukwa and Pig (and possibly other alternatives)? I'm only just getting into Hadoop, but it seems to me these libraries overlap considerably, depending on what you use them for.

I plan to check those subprojects out anyway, and I'm definitely not trying to start a flame war here, but it would be great to hear some opinions about which areas these projects are particularly useful for, what might need some work, etc.

As a bit of context, we (Teachscape) are considering Hadoop for storing audit log files and extracting information from them. The audit logging we do is application-specific, e.g. user Foo deleted Survey X, user Bar moved organization node AAB to AB, etc. Besides the need to run a couple of fixed reports weekly (mainly ones that give our customers some insight into how they are using our application), I expect us to want to create queries on the fly, e.g. to track down problems. I don't expect our developers to have a problem writing Map/Reduce programs, but I do like the idea of a higher-level way of extracting information.

Any thoughts would be greatly appreciated,

Eelco