From: longmans163
To: user@hive.apache.org
Date: Sun, 25 Sep 2011 11:25:44 +0800 (CST)
Subject: Re: How to load quote-separated fields?

Hi Mark,

The quoted field that starts with "GET /dynLink/? contains spaces, and Hive treats each of those spaces as the field terminator you defined with FIELDS TERMINATED BY " ". I think you should encode the spaces inside such fields as some other character that is not a terminator before loading.
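
One way to do that encoding is a small pre-processing pass before LOAD DATA. The sketch below is only an illustration, not something taken from Hive itself: the function name to_tab_separated and the choice of a tab as the new delimiter are my assumptions. It treats a [bracketed] timestamp, a "quoted" string, or a run of non-space characters as one field, strips the surrounding quotes, and joins the fields with tabs so that spaces inside a field no longer act as terminators.

```python
import re

# A field is a [bracketed] timestamp, a "quoted" string, or a run of non-spaces.
# Assumption: quoted fields never contain an embedded double quote.
TOKEN = re.compile(r'\[[^\]]*\]|"[^"]*"|\S+')


def to_tab_separated(line: str) -> str:
    """Re-delimit one space-separated log line with tabs (hypothetical helper)."""
    fields = []
    for token in TOKEN.findall(line):
        # Drop the surrounding quotes; spaces inside the field are kept as-is.
        if len(token) >= 2 and token.startswith('"') and token.endswith('"'):
            token = token[1:-1]
        fields.append(token)
    return "\t".join(fields)


if __name__ == "__main__":
    sample = '[01/May/2011:00:00:00 +0000] 68.115.109.118 "GET /dynLink/?a=1 HTTP/1.1" 200'
    print(to_tab_separated(sample).split("\t"))
```

With the file rewritten this way, the table would presumably be declared with FIELDS TERMINATED BY '\t' instead of a space.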


At 2011-09-23 04:58:59,"Mark Kerzner" <mark.kerzner@shmsoft.com> wrote:
Hi,

I have an Apache web log (sample below) and want to load it with LOAD DATA INPATH.

My fields are separated by a space, and those that contain spaces are enclosed in quotes.

I tried this,

ROW FORMAT   DELIMITED
FIELDS TERMINATED BY " "
COLLECTION ITEMS TERMINATED BY '"'
MAP KEYS TERMINATED BY ","

but it did not work: Hive treated GET as a separate field. What should I change?

Thank you,
Mark


[01/May/2011:00:00:00 +0000] 68.115.109.118 TLSv1 RC4-MD5 "GET /dynLink/?PCD=CHICHHH&EBC=3425154412&RCC=D2RVX&GAD=20110426&NMN=2&NOA=1&NOC=0&LNG=en&TBP=325.43&GEM=STEPHENCLAUDENELSON%40GMAIL.COM&GEN=&GSL=&GLN=NELSON&GFN=STEPHEN&GCC=&GST=&GCT=&GPC=&GAR=&GPN=&PRT=0&PLC=&PCC=brandwebsite&PSC=&SRP=CIBMS0&PID=HIL&PET=WEB&GNR=1&CRP=0901452 HTTP/1.1" 200 95 0 99885 "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; InfoPath.2; .NET4.0C; .NET4.0E; MS-RTC LM 8)" "https://secure.hilton.com/en/hi/res/retrieved_reservation.jhtml;jsessionid=UIBJ2MH0JDJPOCSGBIYMVCQ?_requestid=153483" "t=1304208000431979" "D=99766"

