hive-user mailing list archives

From Manish <manishbh...@rocketmail.com>
Subject RE: zip file or tar file cosumption
Date Sun, 30 Sep 2012 15:05:35 GMT
Thanks Bejoy. I already have a zip file, so I see no sense in converting
it into gzip again.

Chuck, I got what you are trying to say. So I need to process it outside
HDFS and bring the text file into HDFS.
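
A minimal sketch of that outside-HDFS step, assuming Python 3 is available on the machine that holds the archive. The function name `zip_to_text` and any paths passed to it are hypothetical, not something from this thread. It concatenates every .txt member of a zip archive into one plain text file that a TEXTFILE table can read:

```python
import zipfile


def zip_to_text(zip_path, txt_path):
    """Concatenate every .txt member of a zip archive into one text file.

    Returns the number of members written.
    """
    written = 0
    with zipfile.ZipFile(zip_path) as archive, open(txt_path, "wb") as out:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".txt"):
                continue  # skip directories and non-text members
            data = archive.read(name)
            out.write(data)
            # Keep records from adjacent files on separate lines.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
            written += 1
    return written
```

The resulting file could then be copied into HDFS (for example with `hadoop fs -put`) and loaded with the usual LOAD DATA statement. Each member is read fully into memory here, which is fine for modest files but would need streaming for very large ones.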

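Chuck's point about record delimiters can be illustrated with a small sketch, again in Python and with made-up sample records and a hypothetical member name. A reader that splits on newline recovers the records from the plain text, but not from the bytes of a zip archive holding the same text:

```python
import io
import zipfile

# Hypothetical sample records, one per line, as a TEXTFILE table expects.
records = [b"a=1;b=2 x", b"c=3 y", b"d=4 z"]
text = b"\n".join(records) + b"\n"

# Build a zip archive of the same text in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("11sep12.txt", text)
raw = buf.getvalue()

# What a newline-delimited reader would see in each case.
as_text_lines = text.splitlines()
as_zip_lines = raw.split(b"\n")
```

Here `as_text_lines` equals the original records, while `as_zip_lines` is archive headers and compressed bytes cut wherever a byte with value 10 happens to occur.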

On Sun, 2012-09-30 at 18:21 +0530, Bejoy KS wrote: 
> Hi Manish
> 
> Gzip works well if you have the compression codec available in
> 'io.compression.codecs'. The Gzip codec is present by default.
> 
> I don't think untarring would be done by MapReduce jobs. So tar files
> may not work with Hive; you need to untar the files outside Hadoop/Hive
> as a prerequisite.
> 
> 
> 
> Regards
> Bejoy KS
> 
> 
> 
> 
> 
> ______________________________________________________________________
> To: user@hive.apache.org; Keshav.C.Savant@fisglobal.com
> Subject: Re: zip file or tar file cosumption
> From: manishbhoge@rocketmail.com
> Date: Sun, 30 Sep 2012 12:32:15 +0000
> 
> What about a .gz or tar file? Does it need to be unzipped in HDFS
> before loading into Hive? How do you resolve it?
> 
> 
> Sent from my BlackBerry, pls excuse typo
> 
> 
> ______________________________________________________________________
> 
> From: "Connell, Chuck" <Chuck.Connell@nuance.com>
> Date: Sun, 30 Sep 2012 12:24:37 +0000
> To: user@hive.apache.org<user@hive.apache.org>; Savant,
> Keshav<Keshav.C.Savant@fisglobal.com>
> ReplyTo: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
> 
> 
> I have seen that error when I try to overwrite an existing file. 
> 
> But, more importantly, Hive cannot understand ZIP files. There was a
> long thread about this just a few days ago. Your table def says
> "stored as textfile" but you are not giving it a text file.
> 
> Chuck
> 
> 
> 
> 
> ______________________________________________________________________
> 
> From: Manish [manishbhoge@rocketmail.com]
> Sent: Sunday, September 30, 2012 7:38 AM
> To: Savant, Keshav
> Cc: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
> 
> 
> 
> 
> I am getting the error below when loading the zip file:
> 
> Driver returned: 9.  Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> Loading data to table default.pageview_zip
> Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
> 
> My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`
> 
> Table definition: 
> CREATE external TABLE pageview_zip
> (
> C_0 STRING,
> C_1 STRING,
> C_7 MAP<STRING,STRING>,
> C_8 STRING,
> C_13 MAP<STRING,STRING>,
> C_21 STRING
> )
> COMMENT 'Page View'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '='
> STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
> 
> Thank You,
> Manish
> 
> 
> 
> On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote: 
> 
>         True Manish.
>         
>          
>         
>         Keshav C Savant 
>         
>         
>          
>         
>         From: Manish.Bhoge [mailto:Manish.Bhoge@target.com] 
>         Sent: Thursday, September 27, 2012 4:26 PM
>         To: user@hive.apache.org; manishbhoge@rocketmail.com
>         Subject: RE: zip file or tar file cosumption
>         
>         
>          
>         
>         Thanks Savant. I believe this will hold good for .zip file
>         also.
>         
>          
>         
>         Thank You,
>         
>         Manish.
>         
>          
>         
>         From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com] 
>         Sent: Thursday, September 27, 2012 10:19 AM
>         To: user@hive.apache.org; manishbhoge@rocketmail.com
>         Subject: RE: zip file or tar file cosumption
>         
>         
>          
>         
>         Manish, the table created for zipped text files should be
>         defined as a sequence file, for example
>         
>          
>         
>         CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT
>         DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;
>         
>          
>         
>         After this you can use regular load command to load these
>         files, for example
>         
>          
>         
>         load data local inpath 'path-to-csv-file.gz' into table
>         my_table_zip;
>         
>          
>         
>         hope this helps
>         
>          
>         
>         Keshav C Savant 
>         
>         
>          
>         
>         From: Manish Bhoge [mailto:manishbhoge@rocketmail.com] 
>         Sent: Wednesday, September 26, 2012 9:43 PM
>         To: user@hive.apache.org
>         Subject: Re: zip file or tar file cosumption
>         
>         
>          
>         
>         Hi Richin,
>         
>         Thanks! Yes, this is what I wanted to understand: how to load
>         a zip file into a Hive table. Now, I'll try this option.
>         
>         Thank You,
>         Manish. 
>         
>         
>         
>         
>                                        
>         ______________________________________________________________
>         
>         From:<richin.jain@nokia.com> 
>         
>         
>         Date:Wed, 26 Sep 2012 14:51:39 +0000
>         
>         
>         To:<user@hive.apache.org>
>         
>         
>         ReplyTo:user@hive.apache.org 
>         
>         
>         Subject:RE: zip file or tar file cosumption
>         
>         
>          
>         
>         
>         You are right Chuck. I thought his question was how to use zip
>         files or any compressed files in Hive tables.
>         
>          
>         
>         Yeah, seems like you can’t do that
>         see:http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
>         
>         But you can always compress your files in gzip format and they
>         should be good to go.
>         
>          
>         
>         Richin
>         
>          
>         
>         From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com] 
>         Sent: Wednesday, September 26, 2012 10:44 AM
>         To: user@hive.apache.org
>         Subject: RE: zip file or tar file cosumption
>         
>         
>          
>         
>         But TEXTFILE in Hive always has newline as the record
>         delimiter. How could this possibly work with a zip/tar file
>         that can contain ASCII 10 characters at random locations, and
>         certainly does not have ASCII 10 at the end of each data
>         record?
>         
>          
>         
>         Chuck Connell
>         
>         Nuance R&D Data Team
>         
>         Burlington, MA
>         
>          
>         
>          
>         
>         
>         From:richin.jain@nokia.com [mailto:richin.jain@nokia.com] 
>         Sent: Wednesday, September 26, 2012 10:14 AM
>         To: user@hive.apache.org; manishbhoge@rocketmail.com
>         Subject: RE: zip file or tar file cosumption
>         
>         
>          
>         
>         Hi Manish,
>         
>          
>         
>         If you have your zip file at location -  /home/manish/zipfile,
>         you can just point your external table to that location like
>         
>         CREATE EXTERNAL TABLE manish_test (field1 string, field2
>         string) ROW FORMAT DELIMITED FIELDS TERMINATED BY
>         <your_column_delimiter> STORED AS TEXTFILE LOCATION
>         '/home/manish/zipfile';
>         
>          
>         
>         OR
>         
>          
>         
>         If you already have external table pointing to a certain
>         location you can load this zip file into your table as
>         
>         LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE
>         manish_test;
>         
>          
>         
>         Hope this helps.
>         
>          
>         
>         Richin
>         
>          
>         
>         From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com] 
>         Sent: Wednesday, September 26, 2012 9:13 AM
>         To: user@hive.apache.org
>         Subject: Re: zip file or tar file cosumption
>         
>         
>          
>         
>         Hi Savant,
>         
>         Got it. But I still need to understand how to load a zip file.
>         Can I use a zip file directly in an external table? Can you
>         please help me with the load statement?
>         
>         
>         
>         
>                                        
>         ______________________________________________________________
>         
>         From:"Savant, Keshav" <Keshav.C.Savant@fisglobal.com>
>         
>         
>         Date:Wed, 26 Sep 2012 12:25:38 +0000
>         
>         
>         To:user@hive.apache.org<user@hive.apache.org>
>         
>         
>         ReplyTo:user@hive.apache.org
>         
>         
>         Cc:Manish.Bhoge@target.com<Manish.Bhoge@target.com>;
>         Chuck.Connell@nuance.com<Chuck.Connell@nuance.com>
>         
>         
>         Subject:RE: zip file or tar file cosumption
>         
>         
>          
>         
>         
>         Another solution would be
>         
>          
>         
>         Using a shell script, do the following:
>         
>         1.      unzip the txt files,
>         
>         2.      one by one, merge those 50 (or N number of) text files
>         into one text file,
>         
>         3.      then zip/tar that bigger text file,
>         
>         4.      then that big zip/tar file can be uploaded into Hive.
>         
>          
>         
>         Keshav C Savant 
>         
>         
>          
>         
>         From: Connell, Chuck [mailto:Chuck.Connell@nuance.com] 
>         Sent: Wednesday, September 26, 2012 4:04 PM
>         To: user@hive.apache.org
>         Subject: RE: zip file or tar file cosumption
>         
>         
>          
>         
>         This could be a problem. Hive uses newline as the record
>         separator. A ZIP file will certainly contain newline
>         characters. So I doubt this is possible.
>         
>         BUT, I would like to hear from anyone who has solved the
>         "newline is always a record separator" problem, because we ran
>         into it for another type of compressed file.
>         
>         Chuck
>         
>         
>                                        
>         ______________________________________________________________
>         
>         From: Manish.Bhoge [Manish.Bhoge@target.com]
>         Sent: Wednesday, September 26, 2012 3:17 AM
>         To: user@hive.apache.org
>         Subject: zip file or tar file cosumption
>         
>         
>         Hivers,
>         
>          
>         
>         I want to understand whether it is possible to use zip/tar
>         files directly in Hive. All the files have a similar schema
>         (structure). Say 50 *.txt files are zipped into a single zip
>         file: can we load data directly from this zip file, or do we
>         need to unzip it first?
>         
>          
>         
>         Thanks & Regards
>         
>         Manish Bhoge | Technical Architect | Target DW/BI
>         +919379850010 (M) Ext: 5691 VOIP: 22165
>         “Excellence is not a skill, It is an attitude.”
>         
>          
>         
>         
>         _____________
>         The information contained in this message is proprietary
>         and/or confidential. If you are not the intended recipient,
>         please: (i) delete the message and all copies; (ii) do not
>         disclose, distribute or use the message in any manner; and
>         (iii) notify the sender immediately. In addition, please be
>         aware that any message addressed to our domain is subject to
>         archiving and review by persons other than the intended
>         recipient. Thank you.
>         
>         
> 
> 
> 
> 

