hive-user mailing list archives

From Bejoy KS <bejo...@outlook.com>
Subject RE: zip file or tar file cosumption
Date Sun, 30 Sep 2012 12:51:52 GMT
Hi Manish,

Gzip works well if you have the compression codec available in 'io.compression.codecs'.
The Gzip codec is present by default. I don't think untarring would be done by MapReduce jobs,
so tar files may not work with Hive; you need to untar the files outside of Hadoop/Hive as a prerequisite.

Regards,
Bejoy KS
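
For illustration, the "untar outside Hadoop first" step might look like the minimal Python sketch below. It is not from the thread: the helper name `tar_to_gzip_files` and all paths are hypothetical, and it only prepares files on the local filesystem so that they could afterwards be copied to HDFS as gzip files.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def tar_to_gzip_files(tar_path, out_dir):
    """Extract each regular file from a tar archive and write it as <name>.gz.

    Hypothetical helper for illustration; it works on the local filesystem,
    before anything is copied to HDFS.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(tar_path) as archive:  # default mode detects compression
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links and special files
            source = archive.extractfile(member)
            # Keep only the base name so archive paths cannot escape out_dir.
            # Members that share a base name would overwrite each other.
            target = out_dir / (Path(member.name).name + ".gz")
            with source, gzip.open(target, "wb") as compressed:
                shutil.copyfileobj(source, compressed)
            written.append(target)
    return written
```

Each resulting `.gz` file holds exactly one of the original text files, which is the shape the replies in this thread describe as loadable.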

To: user@hive.apache.org; Keshav.C.Savant@fisglobal.com
Subject: Re: zip file or tar file cosumption
From: manishbhoge@rocketmail.com
Date: Sun, 30 Sep 2012 12:32:15 +0000






What about a .gz or tar file? Does it need to be unzipped on HDFS before loading into Hive? How
did you resolve it?

Sent from my BlackBerry, pls excuse typo

From: "Connell, Chuck" <Chuck.Connell@nuance.com>
Date: Sun, 30 Sep 2012 12:24:37 +0000
To: user@hive.apache.org; Savant, Keshav <Keshav.C.Savant@fisglobal.com>
ReplyTo: user@hive.apache.org
Subject: RE: zip file or tar file cosumption

I have seen that error when I try to overwrite an existing file.




But, more importantly, Hive cannot understand ZIP files. There was a long thread about this
just a few days ago. Your table def says "stored as textfile" but you are not giving it a
text file.
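
One way to catch this kind of mismatch before running LOAD DATA is to look at the first bytes of the file locally. The sketch below is an illustration only. It assumes the standard signatures (a ZIP archive starts with `PK\x03\x04`, or `PK\x05\x06` when empty; a gzip stream starts with `\x1f\x8b`), and the name `sniff_format` is hypothetical.

```python
def sniff_format(path):
    """Classify a local file as 'zip', 'gzip' or 'other' from its first bytes."""
    with open(path, "rb") as handle:
        head = handle.read(4)
    if head.startswith(b"PK\x03\x04") or head.startswith(b"PK\x05\x06"):
        return "zip"   # ZIP archive: not delimited text, whatever the table says
    if head.startswith(b"\x1f\x8b"):
        return "gzip"  # gzip stream: the format this thread reports as working
    return "other"     # for example plain text; nothing is claimed about it
```

A file reported as "zip" would need to be unpacked (and, if desired, recompressed as gzip) before being placed under a table declared as a text file.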



Chuck







From: Manish [manishbhoge@rocketmail.com]

Sent: Sunday, September 30, 2012 7:38 AM

To: Savant, Keshav

Cc: user@hive.apache.org

Subject: RE: zip file or tar file cosumption







I am getting the error below when loading a zip file:
Driver returned: 9.  Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
Loading data to table default.pageview_zip
Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip
into: /user/manish/input/zip
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE
`pageview_zip`

Table definition: 
CREATE external TABLE pageview_zip
(
C_0 STRING,
C_1 STRING,
C_7 MAP<STRING,STRING>,
C_8 STRING,
C_13 MAP<STRING,STRING>,
C_21 STRING
)
COMMENT 'Page View'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS
TERMINATED BY '=' 
STORED AS TEXTFILE LOCATION '/user/manish/input/zip'

Thank You,
Manish





On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote: 

True Manish.



 



Keshav C Savant 





 



From: Manish.Bhoge [mailto:Manish.Bhoge@target.com] 

Sent: Thursday, September 27, 2012 4:26 PM

To: user@hive.apache.org; manishbhoge@rocketmail.com

Subject: RE: zip file or tar file cosumption





 



Thanks Savant. I believe this will hold good for .zip file also.



 



Thank You,



Manish.



 



From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]


Sent: Thursday, September 27, 2012 10:19 AM

To: user@hive.apache.org;
manishbhoge@rocketmail.com

Subject: RE: zip file or tar file cosumption





 



Manish, the table created for zipped text files should be defined as a sequence
file, for example:



 



CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED
BY ',' stored as sequencefile;



 



After this you can use regular load command to load these files, for example



 



load data local inpath 'path-to-csv-file.gz' into table my_table_zip;



 



hope this helps



 



Keshav C Savant 





 



From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]


Sent: Wednesday, September 26, 2012 9:43 PM

To: user@hive.apache.org

Subject: Re: zip file or tar file cosumption





 



Hi Richin,



Thanks! Yes, this is what I wanted to understand: how to load a zip file into a Hive table. I'll
try this option now.



Thank You,

Manish. 



Sent from my BlackBerry, pls excuse typo











From:<richin.jain@nokia.com>






Date:Wed, 26 Sep 2012 14:51:39 +0000





To:<user@hive.apache.org>





ReplyTo:user@hive.apache.org






Subject:RE: zip file or tar file cosumption





 





You are right Chuck. I thought his question was how to use zip files or any compressed files
in Hive tables.



 



Yeah, it seems like you can't do that; see: http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E



But you can always compress your files in gzip format and they should be good to go.



 



Richin



 



From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]


Sent: Wednesday, September 26, 2012 10:44 AM

To: user@hive.apache.org

Subject: RE: zip file or tar file cosumption





 



But TEXTFILE in Hive always has newline as the record delimiter. How could this possibly work
with a zip/tar file that can contain ASCII 10 characters at random locations, and certainly
does not have ASCII 10 at the end of each data record?
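
The byte-level argument can be checked with a small experiment. The sketch below is an illustration with made-up sample rows, not a description of how Hive reads files. It packs the same text into a ZIP archive and a gzip stream in memory, then compares naive newline splitting of the raw archive bytes with splitting after decompression.

```python
import gzip
import io
import zipfile

# Made-up sample rows, one record per line.
rows = [b"1 alice", b"2 bob", b"3 carol"]
text = b"\n".join(rows) + b"\n"

# Pack the text into a ZIP archive held in memory.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("rows.txt", text)
zip_bytes = zip_buffer.getvalue()

# Splitting the raw archive bytes on ASCII 10 does not give back the rows:
# the bytes are archive headers plus compressed data, not delimited text.
naive_records = zip_bytes.split(b"\n")

# A gzip stream wraps a single file, so decompressing it first (the job a
# compression codec does) restores the newline-delimited records.
gz_bytes = gzip.compress(text)
decoded_records = gzip.decompress(gz_bytes).splitlines()

print(naive_records == rows)    # False
print(decoded_records == rows)  # True
```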



 



Chuck Connell



Nuance R&D Data Team



Burlington, MA



 



 





From:richin.jain@nokia.com [mailto:richin.jain@nokia.com]


Sent: Wednesday, September 26, 2012 10:14 AM

To: user@hive.apache.org;
manishbhoge@rocketmail.com

Subject: RE: zip file or tar file cosumption





 



Hi Manish,



 



If your zip file is at /home/manish/zipfile, you can just point your external
table to that location, like:



CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS
TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE LOCATION '/home/manish/zipfile';



 



OR



 



If you already have external table pointing to a certain location you can load this zip file
into your table as



LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;



 



Hope this helps.



 



Richin



 



From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]


Sent: Wednesday, September 26, 2012 9:13 AM

To: user@hive.apache.org

Subject: Re: zip file or tar file cosumption





 



Hi Savant,



Got it. But I still need to understand how to load a zip file. Can I use a zip file directly in
an external table? Can you please help with the load statement?



Sent from my BlackBerry, pls excuse typo











From:"Savant, Keshav" <Keshav.C.Savant@fisglobal.com>





Date:Wed, 26 Sep 2012 12:25:38 +0000





To:user@hive.apache.org<user@hive.apache.org>





ReplyTo:user@hive.apache.org





Cc:Manish.Bhoge@target.com<Manish.Bhoge@target.com>;

Chuck.Connell@nuance.com<Chuck.Connell@nuance.com>





Subject:RE: zip file or tar file cosumption





 





Another solution would be



 



Using a shell script, do the following:



1.      unzip txt files, 



2.      one by one merge those 50 (or N number of) text files into one text file,



3.      then zip/tar that bigger text file,



4.      then that big zip/tar file can be uploaded into hive.
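
The steps above could also be sketched in Python instead of shell. Note one deliberate substitution: the last step writes gzip rather than zip/tar, because other replies in this thread report that Hive reads gzip files but not ZIP archives. The helper name `zip_to_single_gzip` and the file names are hypothetical.

```python
import gzip
import zipfile


def zip_to_single_gzip(zip_path, gz_path):
    """Merge every .txt member of a ZIP archive into one gzip-compressed file.

    Hypothetical helper for illustration. Each member is read fully into
    memory, which is fine for a sketch but not for very large files.
    """
    merged = 0
    with zipfile.ZipFile(zip_path) as archive, gzip.open(gz_path, "wb") as out:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".txt"):
                continue  # ignore directories and non-text members
            data = archive.read(name)
            out.write(data)
            if data and not data.endswith(b"\n"):
                # Keep the last record of one file separate from the next file.
                out.write(b"\n")
            merged += 1
    return merged
```

The function returns the number of text files merged. The resulting `.gz` file would then be copied to HDFS and loaded with a LOAD DATA statement like the ones quoted in this thread.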



 



Keshav C Savant 





 



From: Connell, Chuck 
[mailto:Chuck.Connell@nuance.com] 

Sent: Wednesday, September 26, 2012 4:04 PM

To: user@hive.apache.org

Subject: RE: zip file or tar file cosumption





 



This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly
contain newline characters. So I doubt this is possible.



BUT, I would like to hear from anyone who has solved the "newline is always a record separator"
problem, because we ran into it for another type of compressed file.



Chuck









From: Manish.Bhoge [Manish.Bhoge@target.com]

Sent: Wednesday, September 26, 2012 3:17 AM

To: user@hive.apache.org

Subject: zip file or tar file cosumption





Hivers,



 



I want to understand whether it is possible to use zip/tar files directly in Hive.
All the files have a similar schema (structure). Say 50 *.txt files are zipped into a single
zip file: can we load data directly from this zip file, or do we need to unzip first?



 



Thanks & Regards



Manish Bhoge | Technical Architect | TargetDW/BI | +919379850010 (M) Ext: 5691 VOIP: 22165
“Excellence is not a skill, It is an attitude.”



 





_____________

The information contained in this message is proprietary and/or confidential. If you are not
the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose,
distribute or use the message in any manner; and (iii) notify the sender
 immediately. In addition, please be aware that any message addressed to our domain is subject
to archiving and review by persons other than the intended recipient. Thank you.















 		 	   		  