From: Keith Wiley
To: user@hive.apache.org
Subject: How to import extremely "wide" csv tables
Date: Tue, 13 Mar 2012 10:03:28 -0700

Wrapping Hive around existing CSV files requires manually naming and typing every column in the creation command. I have several CSV tables and some of them have a ton of columns. I would love a way to create Hive tables that automatically infers the column types by attempting various type conversions or regex matches on the data (say the first row). What would be even cooler is if the first row could actually be interpreted differently from the rest of the table...as a set of string labels to name the columns, while the types could be automatically inferred from, say, the *second* row. These CSV files are currently in this format, with the first row naming the columns.

Does this make sense?

Now, I'm sure that Hive doesn't support this yet -- and I admit it is a somewhat esoteric desire on my part -- but I'm curious how others would suggest approaching it. I'm thinking of writing a separate, isolated program that reads the first two rows of a CSV file and dumps a text string of column names and types in the correct syntax for a Hive external table creation statement, which I would then copy/paste into Hive...I was just hoping for a simpler solution.

Thoughts?

Thanks.
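For illustration, the standalone program described above might look roughly like the following Python sketch. It is only one possible approach under stated assumptions: the function names (infer_hive_type, sanitize, build_create_statement), the table name, and the location path are all hypothetical, the type guess uses only the second row, and the choice of BIGINT/DOUBLE/STRING is a simple heuristic rather than anything Hive itself does.

```python
import csv
import io
import re

# Heuristic patterns (an assumption of this sketch, not Hive behavior).
INT_RE = re.compile(r"^[+-]?\d+$")
FLOAT_RE = re.compile(r"^[+-]?(\d+\.\d*|\.\d+|\d+)([eE][+-]?\d+)?$")


def infer_hive_type(value):
    """Guess a Hive column type from a single sample value."""
    v = value.strip()
    if INT_RE.match(v):
        return "BIGINT"
    if FLOAT_RE.match(v):
        return "DOUBLE"
    return "STRING"  # fallback for empty or non-numeric values


def sanitize(name, index):
    """Turn a header label into a plain identifier (letters, digits, underscores)."""
    cleaned = re.sub(r"\W+", "_", name.strip()).strip("_").lower()
    return cleaned or "col_%d" % index


def build_create_statement(csv_file, table_name, location):
    """Read the first two rows: row 1 names the columns, row 2 supplies sample values."""
    reader = csv.reader(csv_file)
    header = next(reader)
    sample = next(reader)
    columns = ",\n".join(
        "  %s %s" % (sanitize(name, i), infer_hive_type(value))
        for i, (name, value) in enumerate(zip(header, sample))
    )
    return (
        "CREATE EXTERNAL TABLE %s (\n%s\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        "LOCATION '%s';" % (table_name, columns, location)
    )


if __name__ == "__main__":
    # Hypothetical in-memory data standing in for a real wide CSV file.
    demo = io.StringIO("id,Unit Price,name\n7,3.5,widget\n")
    print(build_create_statement(demo, "wide_table", "/data/wide"))
```

The printed statement could then be pasted into Hive. Two caveats with this sketch: a single sample row can mislead the guess (an integer-looking value in a column that later holds decimals, for instance), and the header row remains in the data file, so it would still need to be stripped or filtered out. Plain comma-delimited row format also does not handle quoted fields containing commas.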
________________________________________________________________________________
Keith Wiley     kwiley@keithwiley.com     keithwiley.com     music.keithwiley.com

"You can scratch an itch, but you can't itch a scratch. Furthermore, an itch can
itch but a scratch can't scratch. Finally, a scratch can itch, but an itch can't
scratch. All together this implies: He scratched the itch from the scratch that
itched but would never itch the scratch from the itch that scratched."
                                           --  Keith Wiley
________________________________________________________________________________