Date: Wed, 27 Feb 2013 23:07:13 +0000 (UTC)
From: "Samuel Yuan (JIRA)"
To: hive-dev@hadoop.apache.org
Reply-To: dev@hive.apache.org
Subject: [jira] [Commented] (HIVE-4044) Add URL type

    [ https://issues.apache.org/jira/browse/HIVE-4044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13588887#comment-13588887 ]

Samuel Yuan commented on HIVE-4044:
-----------------------------------

You're right, the idea is that it will enable better encoding of URLs. Kevin found that breaking up the URL into its components and storing them as separate columns results in significant space savings. The original plan was to implement this idea with RCFile, but with the new ORC file format I decided to wait for that instead, and to submit this part separately.
However, it looks like the improvements in the ORC file format have erased any gains we would have gotten by breaking up URLs into their individual components, so this won't be needed any more.

> Add URL type
> ------------
>
>                 Key: HIVE-4044
>                 URL: https://issues.apache.org/jira/browse/HIVE-4044
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Samuel Yuan
>            Assignee: Samuel Yuan
>         Attachments: HIVE-4044.HIVE-4044.HIVE-4044.D8799.1.patch
>
>
> Having a separate type for URLs would enable improvements in storage efficiency based on breaking up a URL into its components. The new type will be named "URL" and made a non-reserved keyword (see HIVE-701).
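
For illustration only, the component-splitting idea described in the issue can be sketched roughly as follows. This is not the HIVE-4044 patch and does not use any Hive, RCFile, or ORC API. The helper name `split_url`, the choice of components, and the sample URLs are all assumptions made for this example. It only shows how a list of URLs could be turned into one column per component, where columns such as scheme and host tend to repeat and would presumably compress well in a columnar format.

```python
from urllib.parse import urlsplit


def split_url(url):
    """Split a URL into components (hypothetical helper, not part of Hive)."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,  # None when the URL has no explicit port
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


# Sample input chosen for illustration only.
urls = [
    "https://issues.apache.org/jira/browse/HIVE-4044",
    "https://issues.apache.org/jira/browse/HIVE-701",
    "http://www.atlassian.com/software/jira",
]

# One dict per URL, then pivoted into one list per component ("column").
rows = [split_url(u) for u in urls]
columns = {name: [row[name] for row in rows] for name in rows[0]}

for name, values in columns.items():
    print(name, values)
```

In this sample the `scheme` and `host` columns contain repeated values while only `path` differs per row, which is the kind of redundancy a per-column encoding could exploit.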