Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 36D91E86A for ; Mon, 11 Mar 2013 04:05:44 +0000 (UTC) Received: (qmail 40986 invoked by uid 500); 11 Mar 2013 04:05:42 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 40927 invoked by uid 500); 11 Mar 2013 04:05:42 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 40916 invoked by uid 99); 11 Mar 2013 04:05:42 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 11 Mar 2013 04:05:42 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: domain of ablozhou@gmail.com designates 209.85.215.42 as permitted sender) Received: from [209.85.215.42] (HELO mail-la0-f42.google.com) (209.85.215.42) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 11 Mar 2013 04:05:34 +0000 Received: by mail-la0-f42.google.com with SMTP id fe20so3447457lab.1 for ; Sun, 10 Mar 2013 21:05:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=x-received:mime-version:from:date:message-id:subject:to :content-type; bh=GapXEjbLjKOayZhCmamxur+AfWrN5+eTLLyCr+e1Pbw=; b=Eh1pDfzZJOQ4OyAyfklu8QFoQASAl5WcF6Ypmycf9m9rc9MmuA/raUCQAQYYMXJE0z zbg+uoCDDJYAc5jlRTd9K96qJltgiojhCWH8kQJDHdqBXoHkXlTp+XRb6N821pOU2LPF h2R8pYybhqeiOeY32FPScUSOR2qPnuT3B8kLg4/AQzZbZHQFOdC2jCLSLJteh8EDx9YN jkxisGDwuGARnTL6VIKLIzQlmpUN/TW4O4Pr74V+5XPT1eo6WErendAD0ItqnDnkQOcl PcdTUU40ZvibBx0lcuFJf8bDCp4mh+evEV4yg6ZPJYvNYJH9tp4xuLTJUlyUYd0q82T9 v/iQ== X-Received: by 10.152.46.17 with SMTP id r17mr8903950lam.47.1362974713692; Sun, 10 Mar 2013 21:05:13 -0700 (PDT) MIME-Version: 1.0 Received: by 10.112.25.230 with HTTP; Sun, 10 Mar 2013 21:04:53 -0700 (PDT) From: =?UTF-8?B?5ZGo5qKm5oOz?= Date: Mon, 11 Mar 2013 12:04:53 +0800 Message-ID: Subject: how to handle variable format data of text file? To: user@hive.apache.org Content-Type: multipart/alternative; boundary=bcaec5524106a5c8ac04d79e47a0 X-Virus-Checked: Checked by ClamAV on apache.org --bcaec5524106a5c8ac04d79e47a0 Content-Type: text/plain; charset=ISO-8859-1 I have files like this: 03/11/13 10:59:52 00000ec0 1009 180538126 92041 2300 0 0 7 21|47|20|33|11 0:2775 03/11/13 10:59:52 00000744 1010 178343610 92042 350 1 0 -1 NULL NULL 22 45 the format is separated by blank space: date time threadid gid userid [variable formated data grouped by fields separated by space ] I'd like to create a table like: hive> create external table handresult (hdate string,htime string, thid string, gid int, userid string,ldata string) row format delimited fields terminated by " "; OK but the above table will only have a part of the data. select * from handresult; 03/11/13 10:59:52 00000ec0 1009 180538126 92041 03/11/13 10:59:52 00000744 1010 178343610 92042 the remain data like "2300 0 0 7 21|47|20|33|11 0:2775 " I can't get. while ldata may be variance length and format separated by " " or an array, the ldata we will parse diferent by each gid. how do this? Thanks, Andy Zhou --bcaec5524106a5c8ac04d79e47a0 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable I have files like this:
03/11/13 10:59:52 00000ec0 1009 180538126 = 92041 2300 0 0 7 21|47|20|33|11 0:2775
03/11/13 10:59:52 00000744= 1010 178343610 92042 350 1 0 -1 NULL NULL 22 45
the format= is separated by blank space:
date time threadid gid userid [variable formated data grouped by field= s separated by space ]

I'd like to create a ta= ble like:

hive> create external table hand= result (hdate string,htime string, thid string, gid int, userid string,ldat= a string) row format delimited fields terminated by =C2=A0" ";
OK

but the above table will only have a= part of the data.
select *=E3=80=80from handresult;
03= /11/13 10:59:52 00000ec0 1009 180538126 92041
03/11/13 10:59:52 0= 0000744 1010 178343610 92042

the remain data =C2=A0like "2300 0 0 7 21|47|20|33= |11 0:2775 " =C2=A0I can't get.

while lda= ta may be variance length and format separated by " " or an array= , the ldata we will parse diferent =C2=A0by each gid.

how do this?

Thanks,
Andy Zhou
--bcaec5524106a5c8ac04d79e47a0--