hive-user mailing list archives

From Raja Thiruvathuru <thiruvath...@gmail.com>
Subject Re: zip file or tar file cosumption
Date Sun, 30 Sep 2012 19:41:29 GMT
We can write custom codecs.

On Sun, Sep 30, 2012 at 11:47 AM, Bejoy KS <bejoyks@outlook.com> wrote:
> Yes Manish, Zip is not supported in Hadoop. You may have to use gzip
> instead.
>
> Regards
> Bejoy KS
>
>
> ________________________________
> Subject: RE: zip file or tar file cosumption
> From: manishbhoge@rocketmail.com
> To: user@hive.apache.org
> CC: Chuck.Connell@nuance.com
> Date: Sun, 30 Sep 2012 20:35:35 +0530
>
> Thanks Bejoy. I have zip file there is sense to convert into gzip again.
>
> Chuck, I got what you are trying to say. So I need to process it outside
> HDFS and bring the text file into HDFS.
>
>
> On Sun, 2012-09-30 at 18:21 +0530, Bejoy KS wrote:
>
> Hi Manish
>
> Gzip works well if you have the compression codec available in
> 'io.compression.codecs'. The Gzip codec is present by default.
>
> I don't think untarring would be done by MapReduce jobs, so tar files may
> not work with Hive. You need to untar the files outside of Hadoop/Hive as a
> prerequisite.
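
Before loading, it can help to check which of these cases a file actually falls into, since the extension is not always reliable. The following is a minimal sketch using only the Python standard library; detect_container is a hypothetical helper name, not part of Hadoop or Hive, and it works on bytes already read from the file.

```python
import io
import tarfile
import zipfile


def detect_container(data: bytes) -> str:
    """Classify bytes as 'gzip', 'tar', 'zip', or 'plain' (hypothetical helper)."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        return "gzip"
    try:
        # Mode "r:" accepts only an uncompressed tar archive.
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:"):
            return "tar"
    except tarfile.TarError:
        pass
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    return "plain"
```

Under the advice in this thread, only 'gzip' and 'plain' would be candidates for a direct load; 'zip' and 'tar' would need unpacking first. Note that a .tar.gz is reported as 'gzip' here, but after decompression it is still a tar archive, so it would need the same untar step.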
>
>
>
> Regards
>
> Bejoy KS
>
>
> ________________________________
>
> To: user@hive.apache.org; Keshav.C.Savant@fisglobal.com
> Subject: Re: zip file or tar file cosumption
> From: manishbhoge@rocketmail.com
> Date: Sun, 30 Sep 2012 12:32:15 +0000
>
> What about a .gz or tar file? Does it need to be unzipped on HDFS before
> loading into Hive? How do you resolve it?
>
> Sent from my BlackBerry, pls excuse typo
>
> ________________________________
>
> From: "Connell, Chuck" <Chuck.Connell@nuance.com>
>
> Date: Sun, 30 Sep 2012 12:24:37 +0000
>
> To: user@hive.apache.org<user@hive.apache.org>; Savant,
> Keshav<Keshav.C.Savant@fisglobal.com>
>
> ReplyTo: user@hive.apache.org
>
> Subject: RE: zip file or tar file cosumption
>
>
>
> I have seen that error when I try to overwrite an existing file.
>
> But, more importantly, Hive cannot understand ZIP files. There was a long
> thread about this just a few days ago. Your table def says "stored as
> textfile" but you are not giving it a text file.
>
> Chuck
>
>
> ________________________________
>
> From: Manish [manishbhoge@rocketmail.com]
> Sent: Sunday, September 30, 2012 7:38 AM
> To: Savant, Keshav
> Cc: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> I am getting the error below when loading a zip file:
>
> Driver returned: 9.  Errors: Hive history
> file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> Loading data to table default.pageview_zip
> Failed with exception Error moving:
> hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into:
> /user/manish/input/zip
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MoveTask
>
> My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip'
> OVERWRITE INTO TABLE `pageview_zip`
>
> Table definition:
> CREATE external TABLE pageview_zip
> (
> C_0 STRING,
> C_1 STRING,
> C_7 MAP<STRING,STRING>,
> C_8 STRING,
> C_13 MAP<STRING,STRING>,
> C_21 STRING
> )
> COMMENT 'Page View'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY
> ';' MAP KEYS TERMINATED BY '='
> STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
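
To make the delimiters in this table definition concrete, here is a rough Python sketch of how one plain-text record would be split under them. It is a simplification for illustration only: parse_row is a hypothetical name, the sample line in the usage note is invented, and Hive's own text SerDe also handles escapes, nulls, and type conversion that this ignores.

```python
def parse_row(line: str) -> list:
    """Split one text record using the delimiters from the DDL above (simplified)."""
    map_columns = {2, 4}  # positions of the MAP<STRING,STRING> columns C_7 and C_13
    fields = line.rstrip("\n").split(" ")  # FIELDS TERMINATED BY ' '
    row = []
    for index, field in enumerate(fields):
        if index in map_columns:
            # COLLECTION ITEMS TERMINATED BY ';'
            entries = field.split(";") if field else []
            # MAP KEYS TERMINATED BY '='
            row.append({key: value
                        for key, _, value in (e.partition("=") for e in entries)})
        else:
            row.append(field)
    return row
```

For example, parse_row("u1 home a=1;b=2 ref x=y end") yields four strings and two dictionaries in column order. This only works on a real text line, which is why the bytes of a .zip archive cannot satisfy the table definition.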
>
> Thank You,
> Manish
>
>
>
> On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote:
>
> True Manish.
>
>
>
> Keshav C Savant
>
>
>
>
> From: Manish.Bhoge [mailto:Manish.Bhoge@target.com]
> Sent: Thursday, September 27, 2012 4:26 PM
> To: user@hive.apache.org; manishbhoge@rocketmail.com
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> Thanks Savant. I believe this will hold for .zip files as well.
>
>
>
> Thank You,
>
> Manish.
>
>
>
> From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
> Sent: Thursday, September 27, 2012 10:19 AM
> To: user@hive.apache.org; manishbhoge@rocketmail.com
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> Manish, the table created for zipped text files should be defined as a
> sequence file, for example:
>
>
>
> CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',' stored as sequencefile;
>
>
>
> After this you can use the regular load command to load these files, for
> example:
>
>
>
> load data local inpath 'path-to-csv-file.gz' into table my_table_zip;
>
>
>
> hope this helps
>
>
>
> Keshav C Savant
>
>
>
>
> From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> Sent: Wednesday, September 26, 2012 9:43 PM
> To: user@hive.apache.org
> Subject: Re: zip file or tar file cosumption
>
>
>
>
> Hi Richin,
>
> Thanks! Yes, this is what I wanted to understand: how to load a zip file
> into a Hive table. I'll try this option now.
>
> Thank You,
> Manish.
>
> Sent from my BlackBerry, pls excuse typo
>
>
> ________________________________
>
> From:<richin.jain@nokia.com>
>
>
> Date:Wed, 26 Sep 2012 14:51:39 +0000
>
>
> To:<user@hive.apache.org>
>
>
> ReplyTo:user@hive.apache.org
>
>
> Subject:RE: zip file or tar file cosumption
>
>
>
>
>
> You are right Chuck. I thought his question was how to use zip files or any
> compressed files in Hive tables.
>
>
>
> Yeah, it seems like you can’t do that; see:
> http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
>
> But you can always compress your files in gzip format and they should be
> good to go.
>
>
>
> Richin
>
>
>
> From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> Sent: Wednesday, September 26, 2012 10:44 AM
> To: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> But TEXTFILE in Hive always has newline as the record delimiter. How could
> this possibly work with a zip/tar file that can contain ASCII 10 characters
> at random locations, and certainly does not have ASCII 10 at the end of each
> data record?
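
The point can be illustrated with a small Python sketch (standard library only; the sample rows and the member name part-0.txt are made up). It builds a ZIP archive in memory and compares what a newline-splitting reader would see on the raw archive bytes with what the member actually contains.

```python
import io
import zipfile

records = [b"alice,1", b"bob,2", b"carol,3"]  # made-up sample rows
text = b"\n".join(records) + b"\n"

# Build a small ZIP archive in memory with one compressed text member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("part-0.txt", text)
raw = buf.getvalue()

# What a newline-based reader would see if handed the archive bytes directly:
# ZIP headers and compressed data, cut wherever a byte happens to equal 10.
naive_rows = raw.split(b"\n")

# What you get after actually opening the archive and reading the member.
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    real_rows = zf.read("part-0.txt").splitlines()

print(naive_rows[0][:4])     # the ZIP local file header signature
print(real_rows == records)  # the original rows come back only after unzipping
```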
>
>
>
> Chuck Connell
>
> Nuance R&D Data Team
>
> Burlington, MA
>
>
>
>
>
>
> From:richin.jain@nokia.com [mailto:richin.jain@nokia.com]
> Sent: Wednesday, September 26, 2012 10:14 AM
> To: user@hive.apache.org; manishbhoge@rocketmail.com
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> Hi Manish,
>
>
>
> If you have your zip file at /home/manish/zipfile, you can just point your
> external table to that location, like this:
>
> CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT
> DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE
> LOCATION '/home/manish/zipfile';
>
>
>
> OR
>
>
>
> If you already have external table pointing to a certain location you can
> load this zip file into your table as
>
> LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;
>
>
>
> Hope this helps.
>
>
>
> Richin
>
>
>
> From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> Sent: Wednesday, September 26, 2012 9:13 AM
> To: user@hive.apache.org
> Subject: Re: zip file or tar file cosumption
>
>
>
>
> Hi Savant,
>
> Got it. But I still need to understand how to load a zip file. Can I use a
> zip file directly in an external table? Could you please help with the load
> statement?
>
> Sent from my BlackBerry, pls excuse typo
>
>
> ________________________________
>
> From:"Savant, Keshav" <Keshav.C.Savant@fisglobal.com>
>
>
> Date:Wed, 26 Sep 2012 12:25:38 +0000
>
>
> To:user@hive.apache.org<user@hive.apache.org>
>
>
> ReplyTo:user@hive.apache.org
>
>
> Cc:Manish.Bhoge@target.com<Manish.Bhoge@target.com>;
> Chuck.Connell@nuance.com<Chuck.Connell@nuance.com>
>
>
> Subject:RE: zip file or tar file cosumption
>
>
>
>
>
> Another solution would be
>
>
>
> Using a shell script, do the following:
>
> 1.      unzip the txt files,
>
> 2.      one by one, merge those 50 (or N) text files into one text
> file,
>
> 3.      then zip/tar that bigger text file,
>
> 4.      then that big zip/tar file can be uploaded into Hive.
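
The unzip-and-merge steps above could be sketched roughly as follows. This is a hedged illustration in Python rather than shell, with one deliberate substitution: it writes gzip instead of zip/tar in step 3, because later replies in this thread say Hadoop reads gzip but not zip. zip_to_single_gzip is a hypothetical helper name, and it works on in-memory bytes for brevity.

```python
import gzip
import io
import zipfile


def zip_to_single_gzip(zip_bytes: bytes) -> bytes:
    """Concatenate every .txt member of a ZIP archive into one gzip stream."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf, \
            gzip.GzipFile(fileobj=out, mode="wb") as gz:
        for name in sorted(zf.namelist()):
            if not name.endswith(".txt"):
                continue  # skip anything that is not one of the text files
            data = zf.read(name)
            gz.write(data)
            if data and not data.endswith(b"\n"):
                # Keep the last row of one file from fusing with the next file.
                gz.write(b"\n")
    return out.getvalue()
```

The resulting bytes would be saved as a .gz file and then loaded. For archives too large to hold in memory, the same idea applies with zf.open() and chunked copying instead of zf.read().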
>
>
>
> Keshav C Savant
>
>
>
>
> From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> Sent: Wednesday, September 26, 2012 4:04 PM
> To: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> This could be a problem. Hive uses newline as the record separator. A ZIP
> file will certainly contain newline characters. So I doubt this is possible.
>
> BUT, I would like to hear from anyone who has solved the "newline is always
> a record separator" problem, because we ran into it for another type of
> compressed file.
>
> Chuck
>
> ________________________________
>
> From: Manish.Bhoge [Manish.Bhoge@target.com]
> Sent: Wednesday, September 26, 2012 3:17 AM
> To: user@hive.apache.org
> Subject: zip file or tar file cosumption
>
>
> Hivers,
>
>
>
> I want to understand whether it is possible to use zip/tar files directly
> in Hive. All the files have a similar schema (structure). Say 50 *.txt files
> are zipped into a single zip file: can we load data directly from this zip
> file, or do we need to unzip it first?
>
>
>
> Thanks & Regards
>
> Manish Bhoge | Technical Architect | Target DW/BI | +919379850010 (M) | Ext:
> 5691 | VOIP: 22165 | “Excellence is not a skill, It is an attitude.”
>
>
>
>
> _____________
> The information contained in this message is proprietary and/or
> confidential. If you are not the intended recipient, please: (i) delete the
> message and all copies; (ii) do not disclose, distribute or use the message
> in any manner; and (iii) notify the sender immediately. In addition, please
> be aware that any message addressed to our domain is subject to archiving
> and review by persons other than the intended recipient. Thank you.
>
>
>
>
>
>
>



-- 

Raja Thiruvathuru
