hive-user mailing list archives

From <yogesh.kuma...@wipro.com>
Subject RE: Hive Query
Date Tue, 24 Jul 2012 10:24:20 GMT
Hello Bejoy,

I have checked the logs of the failed task attempts on the TT web interface. Here is what they show:

2012-07-24 15:38:45,415 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics
with processName=SHUFFLE, sessionId=
2012-07-24 15:38:45,554 INFO org.apache.hadoop.mapred.ReduceTask: ShuffleRamManager: MemoryLimit=144965632,
MaxSingleShuffleLimit=36241408
2012-07-24 15:38:45,562 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201207241536_0002_r_000000_3
Thread started: Thread for merging in memory files
2012-07-24 15:38:45,562 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201207241536_0002_r_000000_3
Thread started: Thread for merging on-disk files
2012-07-24 15:38:45,563 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201207241536_0002_r_000000_3
Thread waiting: Thread for merging on-disk files
2012-07-24 15:38:45,564 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201207241536_0002_r_000000_3
Need another 1 map output(s) where 0 is already in progress
2012-07-24 15:38:45,564 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201207241536_0002_r_000000_3
Thread started: Thread for polling Map Completion Events
2012-07-24 15:38:45,565 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201207241536_0002_r_000000_3
Scheduled 0 outputs (0 slow hosts and0 dup hosts)
2012-07-24 15:38:45,569 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201207241536_0002_r_000000_3:
Got 1 new map-outputs
2012-07-24 15:38:50,565 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201207241536_0002_r_000000_3
Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2012-07-24 15:38:50,632 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201207241536_0002_r_000000_3
copy failed: attempt_201207241536_0002_m_000000_0 from 10.203.33.81
2012-07-24 15:38:50,634 WARN org.apache.hadoop.mapred.ReduceTask: java.io.IOException: Server
returned HTTP response code: 407 for URL: http://10.203.33.81:50060/mapOutput?job=job_201207241536_0002&map=attempt_201207241536_0002_m_000000_0&reduce=0
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
Caused by: java.io.IOException: Server returned HTTP response code: 407 for URL: http://10.203.33.81:50060/mapOutput?job=job_201207241536_0002&map=attempt_201207241536_0002_m_000000_0&reduce=0
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    ... 4 more

2012-07-24 15:38:50,635 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201207241536_0002_r_000000_3:
Failed fetch #1 from attempt_201207241536_0002_m_000000_0
2012-07-24 15:38:50,635 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201207241536_0002_r_000000_3
adding host 10.203.33.81 to penalty box, next contact in 4 seconds
2012-07-24 15:38:50,635 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201207241536_0002_r_000000_3:
Got 1 map-outputs from previous failures
2012-07-24 15:38:55,635 INFO org.apache.hadoop.mapred.ReduceTask: attempt_201207241536_0002_r_000000_3
Scheduled 1 outputs (0 slow hosts and0 dup hosts)
2012-07-24 15:38:55,689 WARN org.apache.hadoop.mapred.ReduceTask: attempt_201207241536_0002_r_000000_3
copy failed: attempt_201207241536_0002_m_000000_0 from 10.203.33.81
2012-07-24 15:38:55,689 WARN org.apache.hadoop.mapred.ReduceTask: java.io.IOException: Server
returned HTTP response code: 407 for URL: http://10.203.33.81:50060/mapOutput?job=job_201207241536_0002&map=attempt_201207241536_0002_m_000000_0&reduce=0
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
Caused by: java.io.IOException: Server returned HTTP response code: 407 for URL: http://10.203.33.81:50060/mapOutput?job=job_201207241536_0002&map=attempt_201207241536_0002_m_000000_0&reduce=0
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    ... 4 more

2012-07-24 15:38:55,690 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201207241536_0002_r_000000_3:
Failed fetch #2 from attempt_201207241536_0002_m_000000_0
2012-07-24 15:38:55,690 INFO org.apache.hadoop.mapred.ReduceTask: Failed to fetch map-output
from attempt_201207241536_0002_m_000000_0 even after MAX_FETCH_RETRIES_PER_MAP retries...
 reporting to the JobTracker
2012-07-24 15:38:55,690 FATAL org.apache.hadoop.mapred.ReduceTask: Shuffle failed with too
many fetch failures and insufficient progress!Killing task attempt_201207241536_0002_r_000000_3.
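One detail stands out in the trace above: HTTP 407 means "Proxy Authentication Required", and a TaskTracker's embedded HTTP server does not normally issue 407s itself, so an HTTP proxy configured on the nodes may be intercepting the shuffle fetches. A hedged sketch of how one might check for and exempt cluster-internal traffic (the address below is just the one from this trace; the settings shown are illustrative, not taken from this cluster):

```shell
# HTTP 407 = Proxy Authentication Required. Check whether a proxy is
# configured in the environment of the user running the daemons:
env | grep -i proxy || true

# If so, exempt cluster-internal addresses for shell-level tools:
export no_proxy="localhost,127.0.0.1,10.203.33.81"

# For the Hadoop JVMs the equivalent is the http.nonProxyHosts property,
# e.g. in hadoop-env.sh (example value only):
# export HADOOP_OPTS="$HADOOP_OPTS -Dhttp.nonProxyHosts=localhost|10.203.33.*"
```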

Please have a look and help.

Thanks & Regards
Yogesh Kumar

________________________________
From: Bejoy Ks [bejoy_ks@yahoo.com]
Sent: Tuesday, July 24, 2012 3:10 PM
To: user@hive.apache.org
Subject: Re: Hive Query

Hi Yogesh

I'm not exactly sure of the real root cause of the error. From the error log and the nature of the occurrence, I suspect it happens when the reduce task is not able to reach the map task's node and fetch the map output; something close to fetch failures. Can you try out the following and see whether it makes a difference?
1. Increase the value of tasktracker.http.threads (this has to be done at the TT level, not per job; restart the TT afterwards)
2. Increase mapred.reduce.parallel.copies
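For reference, both settings live in mapred-site.xml; a minimal sketch with illustrative values only (the numbers below are examples, not recommendations for this cluster):

```xml
<!-- mapred-site.xml: example values only -->
<property>
  <name>tasktracker.http.threads</name>
  <!-- threads serving map output to reducers; default is 40.
       Requires a TaskTracker restart to take effect. -->
  <value>80</value>
</property>
<property>
  <name>mapred.reduce.parallel.copies</name>
  <!-- parallel fetches a reducer runs during shuffle; default is 5.
       This one can also be set per job. -->
  <value>10</value>
</property>
```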


I just tested the query in my local environment; it works fine and returns the desired output. It looks like the root cause at your end is some Hadoop misconfiguration, since most of the issues are with the MapReduce jobs.

Regards
Bejoy KS


________________________________
From: "yogesh.kumar13@wipro.com" <yogesh.kumar13@wipro.com>
To: user@hive.apache.org; bejoy_ks@yahoo.com
Sent: Tuesday, July 24, 2012 2:56 PM
Subject: RE: Hive Query

Thanks Bejoy :-)

I have an error with

select count(*) from table;

It throws:

2012-07-24 13:39:25,181 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201207231123_0011 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201207231123_0011_m_000002 (and more) from job job_201207231123_0011
Exception in thread "Thread-93" java.lang.RuntimeException: Error while reading from task
log url
    at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
    at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
    at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
    at java.lang.Thread.run(Thread.java:680)
Caused by: java.io.IOException: Server returned HTTP response code: 407 for URL: http://10.203.33.81:50060/tasklog?taskid=attempt_201207231123_0011_r_000000_0&start=-8193
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at java.net.URL.openStream(URL.java:1010)
    at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
    ... 3 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   HDFS Read: 24 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec



And when I run the query

SELECT count(*),sub.name FROM (Select * FROM sitealias JOIN site ON (sitealias.site_id = site.site_id)
) sub GROUP BY sub.name;

it appears to hang: the MapReduce job keeps running with the reducer stuck at 0%.

Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201207231123_0018, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201207231123_0018
Kill Command = /HADOOP/hadoop-0.20.2/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001
-kill job_201207231123_0018
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2012-07-24 14:42:03,824 Stage-1 map = 0%,  reduce = 0%
2012-07-24 14:42:09,850 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:43:10,030 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:44:10,177 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:45:10,358 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:46:10,516 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:47:10,672 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:48:10,882 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:49:11,016 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:50:11,152 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:51:11,409 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:52:11,550 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:53:11,679 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:54:11,807 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:55:11,935 Stage-1 map = 100%,  reduce = 0%
2012-07-24 14:56:12,060 Stage-1 map = 100%,  reduce = 0%


It has been like this for the past 10 minutes and is still going...


Please suggest and help.

Thanks & Regards
Yogesh Kumar

________________________________
From: Bejoy Ks [bejoy_ks@yahoo.com]
Sent: Tuesday, July 24, 2012 2:33 PM
To: user@hive.apache.org
Subject: Re: Hive Query

Hi Yogesh

Try out this query; it should work, though it is a little expensive:

SELECT count(*),sub.name FROM (Select * FROM sitealias JOIN site ON (sitealias.site_id = site.site_id)
) sub GROUP BY sub.name;


Regards
Bejoy KS

________________________________
From: "yogesh.kumar13@wipro.com" <yogesh.kumar13@wipro.com>
To: user@hive.apache.org; bejoy_ks@yahoo.com
Sent: Tuesday, July 24, 2012 1:39 PM
Subject: RE: Hive Query

Hi Bejoy,

even if I perform a count(*) operation on the table, it shows an error:

select count(*) from dummysite;


Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = job_201207231123_0011, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201207231123_0011
Kill Command = /HADOOP/hadoop-0.20.2/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001
-kill job_201207231123_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2012-07-24 13:38:18,928 Stage-1 map = 0%,  reduce = 0%
2012-07-24 13:38:21,938 Stage-1 map = 100%,  reduce = 0%
2012-07-24 13:39:22,170 Stage-1 map = 100%,  reduce = 0%
2012-07-24 13:39:25,181 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_201207231123_0011 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201207231123_0011_m_000002 (and more) from job job_201207231123_0011
Exception in thread "Thread-93" java.lang.RuntimeException: Error while reading from task
log url
    at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:130)
    at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:211)
    at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:81)
    at java.lang.Thread.run(Thread.java:680)
Caused by: java.io.IOException: Server returned HTTP response code: 407 for URL: http://10.203.33.81:50060/tasklog?taskid=attempt_201207231123_0011_r_000000_0&start=-8193
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1436)
    at java.net.URL.openStream(URL.java:1010)
    at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getErrors(TaskLogProcessor.java:120)
    ... 3 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1  Reduce: 1   HDFS Read: 24 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec


Please suggest why this error is coming :-(

Regards
Yogesh Kumar

________________________________
From: Bejoy KS [bejoy_ks@yahoo.com]
Sent: Tuesday, July 24, 2012 12:52 PM
To: user@hive.apache.org
Subject: Re: Hive Query


Hi Yogesh

Can you try out this?

select count(*), site.name from sitealias join site on (sitealias.site_id = site.site_id) group by site.name;

Regards
Bejoy KS

Sent from handheld, please excuse typos.
________________________________
From: <yogesh.kumar13@wipro.com>
Date: Tue, 24 Jul 2012 07:14:25 +0000
To: <user@hive.apache.org>
ReplyTo: user@hive.apache.org
Subject: Hive Query

Hi all,

I have two tables
1) sitealias
2) site


sitealias contains
-------------------------
id                   site_id
----------------------------
1                        15
2                        12
3                        12
4                        15
---------------------------

site contains

-----------------------------
site_id                        name
-------------------------------
12                        google
13                        wiki
14                        yahoo
15                        flipcart
---------------------------------



I am running a query to perform an equi-join and find how many times each site_id repeats, together with the site's name, grouped by site_id.

The result I want from the query:

---------------------------------
count                    name
---------------------------------
2                        google
2                        flipcart
----------------------------------


I performed
select sitealias.count(*), site.name from sitealias join site on (site_alias.site_id=site.site_id);

It shows the error: Parse Error: mismatched input '(' expecting FROM near 'count' in from clause
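The parse error arises because HiveQL does not allow an aggregate to be qualified with a table name; count(*) belongs in the SELECT list on its own, next to the grouping column. A minimal sketch of the intended query, assuming the sample tables above:

```sql
-- Count how many alias rows each site has, along with the site's name.
SELECT site.name, COUNT(*)
FROM sitealias
JOIN site ON (sitealias.site_id = site.site_id)
GROUP BY site.name;
-- For the sample data this yields two rows: google 2 and flipcart 2.
```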


Please help and suggest a query for this kind of operation.


Thanks & Regards
Yogesh Kumar
Please do not print this email unless it is absolutely necessary.
The information contained in this electronic message and any attachments to this message are
intended for the exclusive use of the addressee(s) and may contain proprietary, confidential
or privileged information. If you are not the intended recipient, you should not disseminate,
distribute or copy this e-mail. Please notify the sender immediately and destroy all copies
of this message and any attachments.
WARNING: Computer viruses can be transmitted via email. The recipient should check this email
and any attachments for the presence of viruses. The company accepts no liability for any
damage caused by any virus transmitted by this email.
www.wipro.com
