From: Stephen Sprague
Date: Wed, 5 Jun 2013 11:28:23 -0700
Subject: Re: Textfile compression using Gzip codec
To: user@hive.apache.org

Well... the HiveException has the word "metadata" in it. Maybe that's a hint, or maybe it's a red herring. :) Let's try the following:

1. show create table facts520_normal_text;

2. Is there anything useful at this URL, or is it just the same stack dump?
   http://aana1.ird.com:50030/taskdetails.jsp?jobid=job_201306051948_0010&tipid=task_201306051948_0010_m_000002

On Wed, Jun 5, 2013 at 3:17 AM, Sachin Sudarshana wrote:

> Hi,
>
> I have Hive 0.10 (+ CDH 4.2.1 patches) installed on my cluster.
>
> I have a table facts520_normal_text stored as a textfile. I'm trying to
> create a compressed table from this table using the GZip codec.
>
> hive> SET hive.exec.compress.output=true;
> hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GZipCodec;
> hive> SET mapred.output.compression.type=BLOCK;
>
> hive>
>     > Create table facts520_gzip_text
>     > (fact_key BIGINT,
>     > products_key INT,
>     > retailers_key INT,
>     > suppliers_key INT,
>     > time_key INT,
>     > units INT)
>     > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>     > LINES TERMINATED BY '\n'
>     > STORED AS TEXTFILE;
>
> hive> INSERT OVERWRITE TABLE facts520_gzip_text SELECT * from facts520_normal_text;
>
> When I run the above queries, the MR job fails.
>
> The error that the Hive CLI itself shows is the following:
>
> Total MapReduce jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201306051948_0010, Tracking URL = http://aana1.ird.com:50030/jobdetails.jsp?jobid=job_201306051948_0010
> Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_201306051948_0010
> Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 0
> 2013-06-05 21:09:42,281 Stage-1 map = 0%, reduce = 0%
> 2013-06-05 21:10:11,446 Stage-1 map = 100%, reduce = 100%
> Ended Job = job_201306051948_0010 with errors
> Error during job, obtaining debugging information...
> Job Tracking URL: http://aana1.ird.com:50030/jobdetails.jsp?jobid=job_201306051948_0010
> Examining task ID: task_201306051948_0010_m_000004 (and more) from job job_201306051948_0010
> Examining task ID: task_201306051948_0010_m_000001 (and more) from job job_201306051948_0010
>
> Task with the most failures(4):
> -----
> Task ID:
>   task_201306051948_0010_m_000002
>
> URL:
>   http://aana1.ird.com:50030/taskdetails.jsp?jobid=job_201306051948_0010&tipid=task_201306051948_0010_m_000002
> -----
> Diagnostic Messages for this Task:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"fact_key":7549094,"products_key":205,"retailers_key":304,"suppliers_key":402,"time_key":103,"units":23}
>         at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"fact_key":7549094,"products_key":205,"retailers_key":304,"suppliers_key":402,"time_key":103,"units":23}
>         at org.apach
>
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
> MapReduce Jobs Launched:
> Job 0: Map: 3   HDFS Read: 0 HDFS Write: 0 FAIL
> Total MapReduce CPU Time Spent: 0 msec
>
> I'm unable to figure out why this is happening. It looks like the data
> cannot be copied properly.
> Or is it that the GZip codec is not supported on textfiles?
>
> Any help with this issue is greatly appreciated!
>
> Thank you,
> Sachin
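
To make the first check from the reply concrete, and to rule out one further possibility, here is a minimal HiveQL sketch that reuses the table names and settings from the quoted message. Two things in it are assumptions rather than findings from this thread. DESCRIBE FORMATTED is added only as a second way to look at the table's storage metadata. The codec line uses the spelling org.apache.hadoop.io.compress.GzipCodec (lower-case "z"), which is how the class is named in Hadoop, whereas the quoted session sets GZipCodec. The stack trace in the quoted output is truncated and does not confirm that the class name is the cause, so this is something to rule out, not a diagnosis.

```sql
-- Step 1 from the reply: print the full DDL of the source table.
SHOW CREATE TABLE facts520_normal_text;

-- Assumption (not part of the thread): a second view of the same
-- storage metadata (SerDe, input/output format, location).
DESCRIBE FORMATTED facts520_normal_text;

-- Same settings as the quoted session, except that the codec class is
-- spelled GzipCodec here instead of GZipCodec.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapred.output.compression.type=BLOCK;

-- Re-run the original insert against the already created target table.
INSERT OVERWRITE TABLE facts520_gzip_text SELECT * FROM facts520_normal_text;
```

If the job still fails with this spelling, the task log at the URL in step 2 is the place to look for the rest of the "Caused by" chain, which is cut off in the quoted output.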