Mailing-List: contact user-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hive.apache.org
Received-SPF: pass (athena.apache.org: domain of spragues@gmail.com designates
 209.85.223.175 as permitted sender)
MIME-Version: 1.0
In-Reply-To: 
 <CABKOOENB-iG=xjbR_=e1yw6fhNf2RRwczZKXSTyhA3DRWveR1w@mail.gmail.com>
References: 
 <CABKOOENB-iG=xjbR_=e1yw6fhNf2RRwczZKXSTyhA3DRWveR1w@mail.gmail.com>
From: Stephen Sprague <spragues@gmail.com>
Date: Wed, 5 Jun 2013 11:28:23 -0700
Message-ID: 
 <CAC06LGZ2yRpGsnYLOLhsCb-Zz4o5iwGiUsUXYEnJbcEcXDpMng@mail.gmail.com>
Subject: Re: Textfile compression using Gzip codec
To: user@hive.apache.org
Content-Type: multipart/alternative; boundary=90e6ba6e8b3014c86804de6c5e5f

--90e6ba6e8b3014c86804de6c5e5f
Content-Type: text/plain; charset=ISO-8859-1

Well... the HiveException has the word "metadata" in it. Maybe that's a
hint, or maybe it's a red herring. :)  Let's try the following:

1.  show create table facts520_normal_text;

2.  Anything useful at this URL, or is it just the same stack dump?
http://aana1.ird.com:50030/taskdetails.jsp?jobid=job_201306051948_0010&tipid=task_201306051948_0010_m_000002
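
One more thing that may be worth ruling out. This is a guess, since the
stack trace quoted below is cut off before the root cause: as far as I know,
the Hadoop gzip codec class is spelled GzipCodec, with a lowercase "z", and
Java class names are case-sensitive. A minimal sketch of the same session
with only that spelling changed, reusing the table names from your message:

```sql
-- Same settings as in the original session; only the codec class name differs
-- (GzipCodec instead of GZipCodec). Whether this is the actual cause of the
-- failure is an assumption, not something the truncated trace confirms.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;
SET mapred.output.compression.type=BLOCK;

-- Re-run the load into the existing target table.
INSERT OVERWRITE TABLE facts520_gzip_text SELECT * FROM facts520_normal_text;
```

If the job still fails, the full "Caused by" chain in the task log should say
more than the truncated trace does.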


On Wed, Jun 5, 2013 at 3:17 AM, Sachin Sudarshana
<sachin.hadoop@gmail.com> wrote:

> Hi,
>
> I have hive 0.10 + (CDH 4.2.1 patches) installed on my cluster.
>
> I have a table facts520_normal_text stored as a textfile. I'm trying to
> create a compressed table from this table using GZip codec.
>
> hive> SET hive.exec.compress.output=true;
> hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GZipCodec;
> hive> SET mapred.output.compression.type=BLOCK;
>
> hive> Create table facts520_gzip_text
>     > (fact_key BIGINT,
>     > products_key INT,
>     > retailers_key INT,
>     > suppliers_key INT,
>     > time_key INT,
>     > units INT)
>     > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
>     > LINES TERMINATED BY '\n'
>     > STORED AS TEXTFILE;
>
> hive> INSERT OVERWRITE TABLE facts520_gzip_text SELECT * from facts520_normal_text;
>
>
> When I run the above queries, the MR job fails.
>
> The error that the Hive CLI itself shows is the following:
>
> Total MapReduce jobs = 3
> Launching Job 1 out of 3
> Number of reduce tasks is set to 0 since there's no reduce operator
> Starting Job = job_201306051948_0010, Tracking URL = http://aana1.ird.com:50030/jobdetails.jsp?jobid=job_201306051948_0010
> Kill Command = /usr/lib/hadoop/bin/hadoop job  -kill job_201306051948_0010
> Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 0
> 2013-06-05 21:09:42,281 Stage-1 map = 0%,  reduce = 0%
> 2013-06-05 21:10:11,446 Stage-1 map = 100%,  reduce = 100%
> Ended Job = job_201306051948_0010 with errors
> Error during job, obtaining debugging information...
> Job Tracking URL: http://aana1.ird.com:50030/jobdetails.jsp?jobid=job_201306051948_0010
> Examining task ID: task_201306051948_0010_m_000004 (and more) from job job_201306051948_0010
> Examining task ID: task_201306051948_0010_m_000001 (and more) from job job_201306051948_0010
>
> Task with the most failures(4):
> -----
> Task ID:
>   task_201306051948_0010_m_000002
>
> URL:
>   http://aana1.ird.com:50030/taskdetails.jsp?jobid=job_201306051948_0010&tipid=task_201306051948_0010_m_000002
> -----
> Diagnostic Messages for this Task:
> java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"fact_key":7549094,"products_key":205,"retailers_key":304,"suppliers_key":402,"time_key":103,"units":23}
>         at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:161)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:418)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:333)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
>         at org.apache.hadoop.mapred.Child.main(Child.java:262)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"fact_key":7549094,"products_key":205,"retailers_key":304,"suppliers_key":402,"time_key":103,"units":23}
>         at org.apach
>
> FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
> MapReduce Jobs Launched:
> Job 0: Map: 3   HDFS Read: 0 HDFS Write: 0 FAIL
> Total MapReduce CPU Time Spent: 0 msec
>
>
> I'm unable to figure out why this is happening. It looks like the data is
> not being copied properly.
> Or is it that the GZip codec is not supported on textfiles?
>
> Any help with this issue is greatly appreciated!
>
> Thank you,
> Sachin
>
>
>

--90e6ba6e8b3014c86804de6c5e5f--