hadoop-user mailing list archives

From gabriel balan <gabriel.ba...@oracle.com>
Subject Re: parque table
Date Mon, 04 May 2015 15:33:05 GMT
Hi

If your quoted fields may contain commas, you must use RegexSerDe to parse each line into
fields.

    create table foo(c0 string, c1 string, c2 string, c3 string, c4 string, c5 string, c6 string, c7 string)
    row format serde 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
    with serdeproperties
    ("input.regex" = "^([^,]*),\"([^\"]*)\",([^,]*),([^,]*),\"([^\"]*)\",\"([^\"]*)\",\"([^\"]*)\",\"([^\"]*)\"$");


    -- Here I assumed some fields are always quoted, and some fields are always unquoted.
    -- You may need something fancier for the general case.

    load DATA local inpath 'log.txt.gz' into table foo;

    select * from foo;
    OK
    106     2003-02-03      20      2       A       2       2       037
    106     2003-02-03      20      3       A       2       2       037
    106     2003-02-03      8       2       A       2       2       037
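
For the general case, where any field may or may not be quoted, one possible alternative
is the CSV SerDe that ships with Hive. This is only a sketch: it assumes Hive 0.14 or later,
the table name foo_csv is made up for illustration, and as far as I know this SerDe exposes
every column as string regardless of the declared type.

    -- hypothetical table name; assumes Hive 0.14+ where OpenCSVSerde is available
    create table foo_csv(c0 string, c1 string, c2 string, c3 string, c4 string, c5 string, c6 string, c7 string)
    row format serde 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    with serdeproperties
    ("separatorChar" = ",", "quoteChar" = "\"", "escapeChar" = "\\");

    load DATA local inpath 'log.txt.gz' into table foo_csv;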

If you're sure there are no commas in your quoted fields, then you could try putting a view
on top of the table, and have the view use UDFs to strip the quotes.
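
A rough sketch of that view approach, assuming a plain comma-delimited table that keeps the
quotes in the data. The names foo_raw and foo_unquoted are made up for illustration, and the
quoted columns are taken from the sample lines in this thread:

    -- hypothetical table: fields split on commas, quotes left in the values
    create table foo_raw(c0 string, c1 string, c2 string, c3 string, c4 string, c5 string, c6 string, c7 string)
    row format delimited fields terminated by ',';

    -- strip one leading and one trailing double quote from the quoted columns
    create view foo_unquoted as
    select c0,
           regexp_replace(c1, '^"|"$', '') as c1,
           c2,
           c3,
           regexp_replace(c4, '^"|"$', '') as c4,
           regexp_replace(c5, '^"|"$', '') as c5,
           regexp_replace(c6, '^"|"$', '') as c6,
           regexp_replace(c7, '^"|"$', '') as c7
    from foo_raw;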


hth
Gabriel Balan

On 5/2/2015 1:04 AM, Kumar Jayapal wrote:
> Hi,
>
> When I load this data, the quotes (" ") are inserted into the table. How can I load it
> without them?
>
> thanks
> jay
>
> On Fri, May 1, 2015 at 8:21 AM, Hadoop User <kjayapal17@gmail.com> wrote:
>
>     Here is the content of the file once it's unzipped:
>
>     106,"2003-02-03",20,2,"A","2","2","037"
>     106,"2003-02-03",20,3,"A","2","2","037"
>     106,"2003-02-03",8,2,"A","2","2","037"
>
>
>
>
>
>     On May 1, 2015, at 7:32 AM, Asit Parija <asit@sigmoidanalytics.com> wrote:
>
>>     Hi Kumar ,
>>       You can remove the STORED AS TEXTFILE part and then try again. By default
>>     it should be able to read the .gz files (if they are comma-delimited CSV files).
>>
>>
>>     Thanks
>>     Asit
>>
>>     On Fri, May 1, 2015 at 10:55 AM, Kumar Jayapal <kjayapal17@gmail.com> wrote:
>>
>>         Hello Nitin,
>>
>>         I didn't understand what you mean. Are you telling me to set COMPRESSION_CODEC=gzip?
>>
>>         thanks
>>         Jay
>>
>>         On Thu, Apr 30, 2015 at 10:02 PM, Nitin Pawar <nitinpawar432@gmail.com> wrote:
>>
>>             You loaded a .gz file into a table stored as a text file.
>>             Either define the compression format, or uncompress the file and load it.
>>
>>             On Fri, May 1, 2015 at 9:17 AM, Kumar Jayapal <kjayapal17@gmail.com> wrote:
>>
>>                 Created table:
>>
>>                 CREATE TABLE raw (line STRING) PARTITIONED BY (FISCAL_YEAR smallint, FISCAL_PERIOD smallint)
>>                 STORED AS TEXTFILE;
>>
>>                 and loaded it with data.
>>
>>                 LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;
>>
>>                 I have to load it into a Parquet table.
>>
>>                 When I say select * from raw, it shows all null values.
>>
>>                 	NULL 	NULL 	NULL 	NULL 	NULL 	NULL 	NULL 	NULL
>>
>>                 	NULL 	NULL 	NULL 	NULL 	NULL 	NULL 	NULL 	NULL
>>
>>                 	NULL 	NULL 	NULL 	NULL 	NULL 	NULL 	NULL 	NULL
>>
>>                 	NULL 	NULL 	NULL 	NULL 	NULL 	NULL 	NULL 	NULL
>>
>>                 Why is it not showing the actual data in the file? Will it show once
>>                 I load it into a Parquet table?
>>
>>                 Please let me know if I am doing anything wrong.
>>
>>
>>                 Thanks
>>                 jay
>>
>>
>>
>>
>>             -- 
>>             Nitin Pawar
>>
>>
>>
>

-- 
The statements and opinions expressed here are my own and do not necessarily represent those
of Oracle Corporation.

