From: Ken Krugler
To: user@avro.apache.org
Subject: Avro and Hive
Date: Thu, 28 Oct 2010 11:43:11 -0700

Hi all,

I'd seen past emails from Scott and Doug about using Avro as the data format for Hive. This was back in April/May, and I'm wondering about the current state of the world.

Specifically, what's the recommended approach (& known issues) with using Avro files with Hive?

E.g. Scott mentioned that "Avro files should be better performing and more compact than sequence files." Has that been proven out?

He also discussed a minor issue with maps - "Their maps however can have any intrinsic type as a key (int, long, string, float, double)."

And a more serious issue with unions, though this wouldn't directly impact us as we wouldn't be using that feature.

In our situation, we're trying to get the best of both worlds by leveraging Hive for analytics, and Cascading for workflow, so having one store in HDFS for both would be a significant win.

Thanks for any input!

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
e l a s t i c   w e b   m i n i n g