Return-Path: X-Original-To: apmail-hadoop-mapreduce-user-archive@minotaur.apache.org Delivered-To: apmail-hadoop-mapreduce-user-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 3C8F117633 for ; Mon, 4 May 2015 15:34:45 +0000 (UTC) Received: (qmail 16235 invoked by uid 500); 4 May 2015 15:34:39 -0000 Delivered-To: apmail-hadoop-mapreduce-user-archive@hadoop.apache.org Received: (qmail 16142 invoked by uid 500); 4 May 2015 15:34:39 -0000 Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hadoop.apache.org Delivered-To: mailing list user@hadoop.apache.org Received: (qmail 16132 invoked by uid 99); 4 May 2015 15:34:39 -0000 Received: from nike.apache.org (HELO nike.apache.org) (192.87.106.230) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 04 May 2015 15:34:39 +0000 X-ASF-Spam-Status: No, hits=2.5 required=5.0 tests=HTML_FONT_FACE_BAD,HTML_MESSAGE,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (nike.apache.org: message received from 54.76.25.247 which is an MX secondary for user@hadoop.apache.org) Received: from [54.76.25.247] (HELO mx1-eu-west.apache.org) (54.76.25.247) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 04 May 2015 15:34:11 +0000 Received: from userp1040.oracle.com (userp1040.oracle.com [156.151.31.81]) by mx1-eu-west.apache.org (ASF Mail Server at mx1-eu-west.apache.org) with ESMTPS id C3C2A25015 for ; Mon, 4 May 2015 15:34:09 +0000 (UTC) Received: from aserv0021.oracle.com (aserv0021.oracle.com [141.146.126.233]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id t44FX7Oq015913 (version=TLSv1 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK) for ; Mon, 4 May 2015 15:33:07 GMT Received: from aserv0121.oracle.com (aserv0121.oracle.com [141.146.126.235]) by aserv0021.oracle.com (8.13.8/8.13.8) with ESMTP id t44FX7Ig031754 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL) for ; Mon, 4 May 2015 15:33:07 GMT Received: from abhmp0001.oracle.com (abhmp0001.oracle.com [141.146.116.7]) by aserv0121.oracle.com (8.13.8/8.13.8) with ESMTP id t44FX7rY027672 for ; Mon, 4 May 2015 15:33:07 GMT Received: from [10.149.251.27] (/10.149.251.27) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 04 May 2015 08:33:06 -0700 Message-ID: <55479131.1050600@oracle.com> Date: Mon, 04 May 2015 11:33:05 -0400 From: gabriel balan Organization: Oracle Corporation User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:31.0) Gecko/20100101 Thunderbird/31.1.2 MIME-Version: 1.0 To: user@hadoop.apache.org Subject: Re: parque table References: In-Reply-To: Content-Type: multipart/alternative; boundary="------------060309090506060409030405" X-Source-IP: aserv0021.oracle.com [141.146.126.233] X-Virus-Checked: Checked by ClamAV on apache.org This is a multi-part message in MIME format. --------------060309090506060409030405 Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Hi If you're quoted fields may contain commas, you must use RegexSerDe to parse each line into fields. create table foo(c0 string, c1 string, c2 string, c3 string, c4 string, c5 string, c6 string, c7 string) row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe' with serdeproperties ("input.regex" = "^([^,]*),\"([^\"]*)\",([^,]*),([^,]*),\"([^\"]*)\",\"([^\"]*)\",\"([^\"]*)\",\"([^\"]*)\"$"); --here I assumed some fields are always quoted, and some fields are always unquoted. You may need something fancier for the general case. load DATA local inpath 'log.txt.gz' into table foo; select * from foo; OK 106 2003-02-03 20 2 A 2 2 037 106 2003-02-03 20 3 A 2 2 037 106 2003-02-03 8 2 A 2 2 037 If you're sure there are no commas in your quoted fields, then you could try putting a view on top of the table, and have the view use UDFs to strip the quotes. hth Gabriel Balan On 5/2/2015 1:04 AM, Kumar Jayapal wrote:6 > Hi, > > When I am loading this data I am getting " " inserted into the table how to load with out " "it. > > > Inline image 1 > > > > thanks > jay > > > > > > > > > > > Thanks > Jay > > On Fri, May 1, 2015 at 8:21 AM, Hadoop User > wrote: > > Here is the content of the file once it's unzip > > 106,"2003-02-03",20,2,"A","2","2","037" > 106,"2003-02-03",20,3,"A","2","2","037" > 106,"2003-02-03",8,2,"A","2","2","037" > > > > > > On May 1, 2015, at 7:32 AM, Asit Parija > wrote: > >> Hi Kumar , >> You can remove the stored as text file part and then try that out by default it should be able to read the .gz files ( if they are comma delimited csv files ) . >> >> >> Thanks >> Asit >> >> On Fri, May 1, 2015 at 10:55 AM, Kumar Jayapal > wrote: >> >> Hello Nitin, >> >> Dint understand what you mean. Are you telling me to set COMPRESSION_CODEC=gzip ? >> >> thanks >> Jay >> >> On Thu, Apr 30, 2015 at 10:02 PM, Nitin Pawar > wrote: >> >> You loaded a gz file in a table stored as text file >> either define compression format or uncompress the file and load it >> >> On Fri, May 1, 2015 at 9:17 AM, Kumar Jayapal > wrote: >> >> Created table CREATE TABLE raw (line STRING) PARTITIONED BY (FISCAL_YEAR smallint, FISCAL_PERIOD smallint) >> STORED AS TEXTFILE; >> >> and loaded it with data. >> >> |LOAD DATA LOCAL INPATH ||'/tmp/weblogs/20090603-access.log.gz'||INTO TABLE raw;| >> | >> | >> |I have to load it to parque table| >> | >> | >> |when I say select * from raw it shows all null values.| >> | >> | >> | >> >> NULL NULL NULL NULL NULL NULL NULL NULL >> >> NULL NULL NULL NULL NULL NULL NULL NULL >> >> NULL NULL NULL NULL NULL NULL NULL NULL >> >> NULL NULL NULL NULL NULL NULL NULL NULL >> >> | >> Why is not show showing the actual data in file. will it show once I load it to parque table? >> >> Please let me know if I am doing anything wrong. >> >> >> Thanks >> jay >> | >> | >> >> >> >> >> -- >> Nitin Pawar >> >> >> > -- The statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation. --------------060309090506060409030405 Content-Type: multipart/related; boundary="------------020302020808080606030609" --------------020302020808080606030609 Content-Type: text/html; charset=utf-8 Content-Transfer-Encoding: 8bit Hi

If you're quoted fields may contain commas, you must use RegexSerDe to parse each line into fields.
create table foo(c0 string, c1 string, c2 string,  c3 string,  c4 string,  c5 string,  c6 string,  c7 string)
row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
with serdeproperties
("input.regex" = "^([^,]*),\"([^\"]*)\",([^,]*),([^,]*),\"([^\"]*)\",\"([^\"]*)\",\"([^\"]*)\",\"([^\"]*)\"$");


--here I assumed some fields are always quoted, and some fields are always unquoted. You may need something fancier for the general case.

load DATA local inpath 'log.txt.gz' into table foo;

select * from foo;
OK
106     2003-02-03      20      2       A       2       2       037
106     2003-02-03      20      3       A       2       2       037
106     2003-02-03      8       2       A       2       2       037
If you're sure there are no commas in your quoted fields, then you could try putting a view on top of the table, and have the view use UDFs to strip the quotes.


hth
Gabriel Balan

On 5/2/2015 1:04 AM, Kumar Jayapal wrote:6
Hi,

When I am loading this data I am getting " "  inserted into the table how to load with out " "it.


Inline image 1



thanks
jay










Thanks
Jay

On Fri, May 1, 2015 at 8:21 AM, Hadoop User <kjayapal17@gmail.com> wrote:
Here is the content of the file once it's unzip

106,"2003-02-03",20,2,"A","2","2","037"
106,"2003-02-03",20,3,"A","2","2","037"
106,"2003-02-03",8,2,"A","2","2","037"





On May 1, 2015, at 7:32 AM, Asit Parija <asit@sigmoidanalytics.com> wrote:

Hi Kumar ,
  You can remove the stored as text file part and then try that out by default it should be able to read the .gz files ( if they are comma delimited csv files ) .


Thanks
Asit

On Fri, May 1, 2015 at 10:55 AM, Kumar Jayapal <kjayapal17@gmail.com> wrote:
Hello Nitin,

Dint understand what you mean. Are you telling me to  set COMPRESSION_CODEC=gzip ?

thanks
Jay

On Thu, Apr 30, 2015 at 10:02 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:
You loaded a gz file in a table stored as text file
either define compression format or uncompress the file and load it 

On Fri, May 1, 2015 at 9:17 AM, Kumar Jayapal <kjayapal17@gmail.com> wrote:
Created table  CREATE TABLE raw (line STRING) PARTITIONED BY (FISCAL_YEAR  smallint, FISCAL_PERIOD smallint)
STORED AS TEXTFILE;

and loaded it with data.

LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;

I have to load it to parque table

when I say select * from raw it shows all null values.


NULL NULL NULL NULL NULL NULL NULL NULL

NULL NULL NULL NULL NULL NULL NULL NULL

NULL NULL NULL NULL NULL NULL NULL NULL

NULL NULL NULL NULL NULL NULL NULL NULL
Why is not show showing the actual data in file. will it show once I load it to parque table?

Please let me know if I am doing anything wrong.


Thanks
jay 




--
Nitin Pawar




-- 
The statements and opinions expressed here are my own and do not necessarily represent those of Oracle Corporation.
--------------020302020808080606030609 Content-Type: image/png Content-Transfer-Encoding: base64 Content-ID: iVBORw0KGgoAAAANSUhEUgAAA6YAAAA9CAYAAAC+y3T2AAAQ80lEQVR4Ae3dQUsb3R7H8V8v fRmGVH0TXmkfCMSV69LSQhER7Ka7gosHEZG7ENx1U0FEBEvFtSuFQFtS34S2kvCsnlfQjZeZ ycSZ6ZkzkxrnnInfeynPJJk5///5/CeZOZkz8dHNzc2N+B8CCCCAAAIIIIAAAggggAACjgT+ 4yguYRFAAAEEEEAAAQQQQAABBBAIBRiYsiMggAACCCCAAAIIIIAAAgg4FXicjP7PP/8kH7KM AAIIIIAAAggggAACCCCAwL0LPEreY/rr1697D0gABBBAAAEEEEAAAQQQQAABBJICTOVNarCM AAIIIIAAAggggAACCCBQuQAD08rJCYgAAggggAACCCCAAAIIIJAUYGCa1GAZAQQQQAABBBBA AAEEEECgcgEGppWTExABBBBAAAEEEEAAAQQQQCApwMA0qcEyAggggAACCCCAAAIIIIBA5QIM TCsnJyACCCCAAAIIIIAAAggggEBSgIFpUoNlBBBAAAEEEEAAAQQQQACBygUYmFZOTkAEEEAA AQQQQAABBBBAAIGkQMmBaV8n7xa1uBj829FFsoWqlvsnevdbbFteydcW9e6kX1WmxMkTuNjR 4rsT9RXVZifekcLaxvvXoobPh+0k65jd9yyvBbHC/TVo953s5be0Y80t21FLO3n55Jlkm3b5 2Gpg6bPLnIntTsC6v7hLi8geCBR93oWfk4bP66LtPOjaxKaQZ5/3Ps9bf2KBatCxvJpQQ7+L l1e3wTm0cUwWbGM69009f3u+HY6NcuO44Sk1ML3YWdVBc12np6dab3W0lR45VJD5hXZWD3SV iWTLK3xNS9o9PdXp7pJ0sF0wOMk0zsOKBKLaNtdPw/3rdL2lztbtiUlhjU37ZfBhu3Wtpd24 zaYOVrOD2tvu5cew53bbQrSU286I+WTbdfvYbpDbZ7dJE92ZgH1/cZYWgWsg0NfJUUetVlMH n+NvLWuQ9oNMkfd5/ctODetaw9zzLtu55tz76Bw7GBMF/9Zbklp6/XzKO4YSA9MLfevMaOnl XJj83MslzXS+VXbVtH/yTouLW7putTST4rPlNXht7blC8qnn+nD6QR76p3o08Q+mmpppNjSl KTWaM2oGxbn4ps7Mkga7lzT3UkszV7oOL3CXqLFpv8zWe+6pWrpWz3jR3BLDmlu2WpZ2bPmY TLJNu3xsNbD02WXOxHYnYN1f3KVFZE8EbJ93/e/6ctXS05fN388xbNt50rWJTcNkb3ufm9af WJyadMxUE2rof/FMdZPlvMt2rpnq7YV2tjpqrb9XOLIzxkltUOmDx4XR+j1dq6mn8aB6qqGm voQn+XPxc4WN3GGFxmvtns5pKvgmoHN925AtL2Vyvt2KJZcCwZvmfZTA3PsPgzfEe51G33kM MuvrOr40PkqNLftl/+RInZm/tGvaX20xgm+Y8nLLOtraycRN52Mwybbt8rHNYIQ+u+wCsSsU sO0vFaZBKE8FTMeAQar971901Xqtuak5qbWoo5OXmou/TbZs52lPJyctk/2U5dhoWn9yNOrZ E1NNqKH/tTTVbYTzrvS55m13o+eXtBuf35ri3K5e+VKJK6aV55QKODU3F131TD1b8KB/rauZ phRebY3mUlc++7ggRV42C8RvmOEVVPNq5Z4NvsxYXNTqgbQUXz0vt6VxrTvnNuZ8jEne85N3 Nrjn/GjeLwH2F7/q4W82F/ocfE7HM2CetnT15buMk1z87cSDzYz3ef1LTw3rX8NhD6znmn19 /3Kl1uvBjNLhRv4seD8w/WOqqwMdaW04lzp53+Ift8mG9yoQTNtePWhq/cOY3jDBt0DhXHr7 PaZlOjWW3MaYT5mcx73OWAzGnRTteSvA/uJtafxLLJxW+Jf+G88uCW6/uDoQt5r6V6psRrzP syL1e0wN61cza8a2c834lon4aqm1ITcvFk/ldZPX3aPOLGktngYU3re4Gt23GB/47h6BFsYo EH0wSku7gznvY2xb4T2mR+H086nvweB3MFe4ta7Tl8WBTLlFz43WzjBSIp9KpsMPA//5gsng z1tjy0kXYH+Z9AqPs3/Rjx4Fv264uniQbvjbhd7PeXwGlc72wT3ifV7/klPD+tcwtweGc83h LRO5G7l/oXhgmr13Lzu/2VUfrHk1NXN1HU4DYhzqqkDl40YfjE2tn2YGpdYaZ+51LrlfTj3/ oNPn6dxS90xn2snLbdR20hHr9SjPQNb61KuPZDs+gdz9ZXwhaGmSBAbf4P/2+R9MR1s90snL OX640MN68z73sCgjpkQNRwTzYfU7nnf1r680E/7yqA+dMedQYirvnJ62roY/337x+UBXrafR D9eY26zoWUteU8/1utXRUfzHKy8+6yD4tT++eK2oNiOE6Z9oO7i3yHil1FJjWV4L/15T4s/D BIPNmcQ0sVR6lnasuaUakcaWT7Zdx4+tBhY7x2kT3pGAdX9xlBNhvRbIPacIj+NX+vKdO029 KyDvc+9KMnJC1HBkMj82sJx3FZ779tW7lpoNvy/ZPbq5ubmJsX/9+hUvZv57oZ3FLXXCZ1u/ X9nKrH0vD8NvT6/1OnVVzZZXXyfvVnU7a/NU7xmY3ktp7tLoxc6itqIdK9VMaz2ul63G+a9F 3wTGP+9btM+a2ynOLZVy8Ldvct8no+WTbdfd42KD/D67y5rIrgSK9xdXmRHXT4Hg8+NIzd2c P+cWnGhtyc05h59gXmTF+9yLMtwpCWp4Jz7HG+efd9nPNQs+bx33Kg5fcmAar85/EUAAAQQQ QAABBBBAAAEEEBivQImpvOMNSGsIIIAAAggggAACCCCAAAIIJAUYmCY1WEYAAQQQQAABBBBA AAEEEKhcgIFp5eQERAABBBBAAAEEEEAAAQQQSAowME1qsIwAAggggAACCCCAAAIIIFC5AAPT yskJiAACCCCAAAIIIIAAAgggkBRgYJrUYBkBBBBAAAEEEEAAAQQQQKByAQamlZMTEAEEEEAA AQQQQAABBBBAICnAwDSpwTICCCCAAAIIIIAAAggggEDlAo+TEf/999/kQ5YRQAABBBBAAAEE EEAAAQQQuHeBRzc3Nzf3HoUACCCAAAIIIIAAAggggAACCOQIMJU3B4anEUAAAQQQQAABBBBA AAEEqhFgYFqNM1EQQAABBBBAAAEEEEAAAQRyBBiY5sDwNAIIIIAAAggggAACCCCAQDUCDEyr cSYKAggggAACCCCAAAIIIIBAjgAD0xwYnkYAAQQQQAABBBBAAAEEEKhGgIFpNc5EQQABBBBA AAEEEEAAAQQQyBFgYJoDw9MIIIAAAggggAACCCCAAALVCDAwrcaZKAgggAACCCCAAAIIIIAA AjkCow1Me8d6u7Ctbk5j9/q0MXZPx28XtLAQ/MvmlXxtQW+Pe/eaHo2XEOhua+HtsXqKarMd 70hhbeM6Lmj4fNhkso62GmdeC2KF+0XQ7lvZy2+JYc0t22dLO3n55Jlkm3b52Gpg6bPLnInt TiBvX3eXEZF9ESj6vAv3HcPnddF2vvRvEvPIs887LuStP4k2delTXk2ood8VzKvb4BzaOPYJ tjGd+6aevz3fDsdGuXHc8IwwMO1qe3lPl07yNMfubi9rb3pTZ2dn2myfayMxoglf04r2z850 tr8i7f2vYHDipGMEVVTb6c2zsI5nm22db9yemBTW2FT/4MN244dW9uM2p7W3nBm4JuTzY9hz SzQRLua2M2I+2XbdPrYb5PbZbdJEdyVQ633dFRpxI4Gejg/P1W5Pa+9T/K0lNn4K2I8LfuZM VmkBapj2qM+j3PMu2/F3fi06xw7GRMG/zbaktt68aHjX8VID097xWy0sbOhHu63ZiruQH7ur r+ezWnk1H2Y0/2pFs+dfB1dzB6/9/UIheeOFPp59lIf+FWs6Dtd4otnphsL/T8/qSVCc7led z65oUEZp/pVWZi/1M7zAXaLGpvpn6z3/TG39UM940dwSw5pb1tLSji0fk0m2aZePrQaWPrvM mdjuBGz7urusiOyLgO3zrtdV57KtZ6+eJI7lg8Rt2/nSt0nNw2RvOy6Y1p9Um7r0y1QTauh/ 9Ux1k+W8q/Txt6vtjXO1N9cUjqCMcdzxPC4VuvFG+2fzagSj8fOfpTYZ20p5sXs9/dC0nsWD /UZD0+oMBh+Z18aWDA3dSSB406xFLcyvfYwWGms6i75bGDTd08/4svwoNU7WP94n4haPD3U+ 29J+5vnwZVuM4BumvNwGbQ//Y2snE7eXysdgMmzUgwWbwQh99qAnpOBAIL2vO0iAkH4JmI4B gwx73Y4u228035iX2gs6PH6l+fjbZMt2fnVwArMx2duO26b1J5ClVl0y1YQa+l9CU91GOO/K O/5Gz69oPz6/NcVxqFPqimljfj668ugg0T+K3fupy9knUnilN5pLnZjl66AXhCwrEL9hhldQ y25oWi/4ImVhQct70kp89dy0Xsnn7pzbmPMpmfZYV7uzwVizoTFvBSZgX/fWdiIT6+pT8Dkd z4B51tZlpyvjJJeJ7H+9O8Vxod71C7KnhvWv4bAH1uNvT93OpdpvBjNKhxv5s1BqYOpPuiNk crmnQ/09nEudvG9xhFZYtUKBYNr28t60Nj+O6Q0TfAsUzqW332NapotjyW2M+ZTJedzrjMVg 3EnRnp8CNd/X/USd4KzCaYUtzcezS4LbLy73xK2m/tec44L/NSrKkBoWCdXsddvxN75lIr5a 6mHXyk3l9TDxwpRmV/R3PA0ovG9xObpvMT7wFTbAClUKRB+M0sr+YM77OIOH95gehtO8e91g 8DuYK9ze1Nmr4kCm3KLnRmtnGCmRj7OpCMNkyi2YDMptyVoPWqCG+/qDrpeTzkc/ehT8suLy wl46g69drc17fAaVzvbBPeK4UP+SU8P61zC3B4bj7/CWidyN3L9Q34Fp9p7C1LzrJ5q9/BlO A2Ic6n4nK8og+mCc1uZZZlBqrXHinuKgyKn650dsvPiosxfJ17u39yYb2snLbdR2khHrtpxn IGt96tZL8kUAAScCg2/wf/v8D6ajLR/q+NU8P1zopDD2oLnHBftmvOqRADX0qBhlU7njeVfv 56Vmw18eLRuw+vVqPJV3Xs/al8Ofle9+2tNl+9ngF6Ze6E37XIfxH6/sftJe8Gt/fPFa/R5W FLF3rP8F9xYZr5RaaizLa+Hfa0r8eZhg0DqbmCaWysnSjjW3VCPS2PLJtuv4sdXAYuc4bcI7 EhjpvecoR8J6JZA6diczawTH8Ut1utxpmmTxYtl6XPAiQ5IoEqCGRUKevm457yo8/vbU+yFN N/y+ZPfo5ubmprR++A3mT73JXtkq3cAdVjTG7mp7YUPnYbPtzBW3no7fLut21uaZ1hiY3qEA 97Npd3tBG1EBUwHam3G9bDXOfy36JjD+ed/svpEKFfzNGuN+VJxbuXaCtUbLJ9uuu8fFBmY7 dxkT2bVAXfd1124PM37w+XGoJ/s5f84tONHaUObY/jClfOp18XHBp2zJxSRADU0qdXku/7zL fvwt+Lz1pPujDUw9SZo0EEAAAQQQQAABBBBAAAEEJkegxlN5J6cI9AQBBBBAAAEEEEAAAQQQ eMgCDEwfcvXpOwIIIIAAAggggAACCCDggQADUw+KQAoIIIAAAggggAACCCCAwEMWYGD6kKtP 3xFAAAEEEEAAAQQQQAABDwQYmHpQBFJAAAEEEEAAAQQQQAABBB6yAAPTh1x9+o4AAggggAAC CCCAAAIIeCDAwNSDIpACAggggAACCCCAAAIIIPCQBRiYPuTq03cEEEAAAQQQQAABBBBAwAOB /wNnd0y9zAkv7AAAAABJRU5ErkJggg== --------------020302020808080606030609-- --------------060309090506060409030405--