From: Siddharth Tiwari <siddharth.tiwari@live.com>
To: hive user list, "sanjaysubramanian@yahoo.com"
Subject: RE: Query hangs at 99.97 % for one reducer in Hive
Date: Mon, 3 Mar 2014 04:48:36 +0000

Hi Sanjay,

Here are the details.

Even 500 reducers sounds like a high number, but I don't know the details of your cluster. Can you provide some details?

How many nodes in cluster: 21 nodes
Hive version: Hive-0.10.x
Which distribution (Hortonworks, Apache, CDH, Amazon): CDH 4.3
Node specs: each node has 64 cores (with HT), 128 GB RAM, 3 x 10 TB disks
Partitions in the table: none
Number of records: around 17 billion

Also, I see the following exception thrown by the hanging task. I have no idea what it means:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {"key":{"joinkey0":""},"value":{"_col2":"92","_col11":"-60-01-21,00","_col12":"-03-07-04,00"},"alias":1}
	at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:270)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:506)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=1) {"key":{"joinkey0":""},"value":{"_col2":"92","_col11":"-60-01-21,00","_col12":"-03-07-04,00"},"alias":1}
	at org.apache.hadoop.hive.ql.exec.ExecRedu
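
One detail in the row above: joinkey0 is an empty string. In a reduce-side join every row with the same key value goes to the same reducer, so a very common key (an empty or missing cgi, for example) could explain one reducer lagging far behind the rest. That is only a guess from the trace. A minimal sketch of a check, reusing the table and column names from the query quoted further down and not yet run against this data:

```sql
-- Sketch only: list the most frequent join-key values on each side of the join.
-- Table and column names are taken from the query in this thread.

-- Left side: key distribution in sample.dpi_large after the msisdn filter.
select cgi, count(*) as n
from sample.dpi_large
where msisdn != ''
group by cgi
order by n desc
limit 20;

-- Right side: key distribution in sample.science_new after the same
-- regexp_replace that the join condition applies.
select regexp_replace(codigo_cgi_ecgi, '-', '') as cgi_key, count(*) as n
from sample.science_new
group by regexp_replace(codigo_cgi_ecgi, '-', '')
order by n desc
limit 20;
```

If one value (such as the empty string) dominates either list, that would point to key skew rather than to the reducer count.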

*------------------------*
Cheers !!!
Siddharth Tiwari
Have a refreshing day !!!
"Every duty is holy, and devotion to duty is the highest form of worship of God."
"Maybe other people will try to limit me but I don't limit myself"



From: siddharth.tiwari@live.com
To: user@hadoop.apache.org
Subject: FW: Query hangs at 99.97 % for one reducer in Hive
Date: Sun, 2 Mar 2014 23:12:47 +0000

Forwarding this message to the Hadoop list as well. I would appreciate any help.





From: siddharth.tiwari@live.com
To: user@hive.apache.org
Subject: Query hangs at 99.97 % for one reducer in Hive
Date: Sun, 2 Mar 2014 23:09:25 +0000

Hi team,

The following query hangs at 99.97% on one reducer. Kindly help, or point me to what the cause could be.

drop table if exists sample.dpi_short_lt;
create table sample.dpi_short_lt as
select b.msisdn,
       a.area_erb,
       a.longitude,
       a.latitude,
       substring(b.msisdn,1,2) as country,
       substring(b.msisdn,3,2) as area_code,
       substring(b.start_time,1,4) as year,
       substring(b.start_time,6,2) as month,
       substring(b.start_time,9,2) as day,
       substring(b.start_time,12,2) as hour,
       cast(b.procedure_duration as double) as duracao_ms,
       cast(b.internet_latency as double) as int_internet_latency,
       cast(b.ran_latency as double) as int_ran_latency,
       cast(b.http_latency as double) as int_http_latency,
       (case when b.internet_latency='' then 1 else 0 end) as internet_latency_missing,
       (case when b.ran_latency='' then 1 else 0 end) as ran_latency_missing,
       (case when b.http_latency='' then 1 else 0 end) as http_latency_missing,
       (cast(b.mean_throughput_ul as int) * cast( procedure_duration as int) / 1000) as total_up_bytes,
       (cast(b.mean_throughput_dl as int) * cast(procedure_duration as int) / 1000) as total_dl_bytes,
       cast(b.missing_packets_ul as int) as int_missing_packets_ul,
       cast(b.missing_packets_dl as int) as int_missing_packets_dl
from sample.dpi_large b
left outer join sample.science_new a
on b.cgi = regexp_replace(a.codigo_cgi_ecgi,'-','')
where msisdn!='';

Hive was heuristically selecting 1000 reducers, and the job was hanging at 99.97 percent on one reduce task. I then changed the reducer settings to 3 GB per reducer and 500 reducers and started hitting this error:

java.lang.RuntimeException: Hive Runtime Error while closing operators: Unable to rename output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_task_tmp.-ext-10001/_tmp.000003_0 to: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812_8390586541316719852-1/_tmp.-ext-10001/000003_0
	at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:313)
	at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:516)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:447)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
	at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Unable to rename output from: hdfs://tlvcluster/tmp/hive-hadoop/hive_2014-03-01_03-14-36_812
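
For reference, reducer settings like the ones mentioned above would typically be changed with session-level set commands before running the query. A sketch of what that looks like, assuming the two properties involved are hive.exec.reducers.bytes.per.reducer and hive.exec.reducers.max (this message does not name them, and the byte value is an approximation of 3 GB):

```sql
-- Assumed settings; the property names and exact values are not quoted in this thread.

-- Target input size per reducer, in bytes (roughly 3 GB).
set hive.exec.reducers.bytes.per.reducer=3000000000;

-- Upper bound on the number of reducers Hive will pick on its own.
set hive.exec.reducers.max=500;

-- Alternative (also an assumption): force an exact reducer count instead.
-- set mapred.reduce.tasks=500;
```

Note that neither setting changes how rows are distributed across reducers: rows sharing one join key still land on the same reduce task.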


I have a 22-node cluster running CDH 4.3. Please help me locate what the issue could be.
