hadoop-user mailing list archives

From Nitin Pawar <nitinpawar...@gmail.com>
Subject Re: Re: Re: hive task fails when left semi join
Date Tue, 16 Jul 2013 09:31:14 GMT
Kira,

What version of Hadoop are you using? An exit with status code 126 is a
very rare condition; you may refer to
MAPREDUCE-4857 <https://issues.apache.org/jira/browse/MAPREDUCE-4857>
and MAPREDUCE-2374 <https://issues.apache.org/jira/browse/MAPREDUCE-2374>.

There are multiple possible causes of this error, but in most cases, if an
attempt has failed, Hadoop tries to schedule the attempt on the next node.
There is very little information available here for me to make sense of
this; maybe the experts will be able to say more about the detailed error.

Didn't the failed attempt get launched again a few more times? Sorry I
couldn't be of much help with this, as I do not have enough of the log for
my level of understanding.


On Tue, Jul 16, 2013 at 2:35 PM, <kira.wang@xiaoi.com> wrote:

> Nitin,
>
> I checked the log of the failed task on the corresponding machine; the
> stderr is like this:
>
>
> 2013-07-16 16:19:00,057 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201307041810_0142_m_000015_0 task's state:UNASSIGNED
> 2013-07-16 16:19:00,058 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201307041810_0142_m_000015_0 which needs 1 slots
> 2013-07-16 16:19:00,058 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch attempt_201307041810_0142_m_000015_0 which needs 1 slots
> 2013-07-16 16:19:01,082 INFO org.apache.hadoop.mapred.TaskController: Writing commands to /hadoop/tmp/mapred/local/ttprivate/taskTracker/root/jobcache/job_201307041810_0142/attempt_201307041810_0142_m_000015_0/taskjvm.sh
> 2013-07-16 16:19:02,011 WARN org.apache.hadoop.mapred.TaskRunner: attempt_201307041810_0142_m_000015_0 : Child Error
> 2013-07-16 16:19:06,061 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_201307041810_0142_m_000015_0 task's state:FAILED_UNCLEAN
> 2013-07-16 16:19:06,061 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201307041810_0142_m_000015_0 which needs 1 slots
> 2013-07-16 16:19:06,061 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current free slots : 1 and trying to launch attempt_201307041810_0142_m_000015_0 which needs 1 slots
> 2013-07-16 16:19:06,124 INFO org.apache.hadoop.mapred.TaskController: Writing commands to /hadoop/tmp/mapred/local/ttprivate/taskTracker/root/jobcache/job_201307041810_0142/attempt_201307041810_0142_m_000015_0.cleanup/taskjvm.sh
> 2013-07-16 16:19:09,845 INFO org.apache.hadoop.mapred.TaskTracker: JVM with ID: jvm_201307041810_0142_m_-1660811086 given task: attempt_201307041810_0142_m_000015_0
> 2013-07-16 16:19:13,456 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307041810_0142_m_000015_0 0.0%
> 2013-07-16 16:19:16,052 INFO org.apache.hadoop.mapred.TaskTracker: attempt_201307041810_0142_m_000015_0 0.0% cleanup
> 2013-07-16 16:19:16,053 INFO org.apache.hadoop.mapred.TaskTracker: Task attempt_201307041810_0142_m_000015_0 is done.
> 2013-07-16 16:19:16,053 INFO org.apache.hadoop.mapred.TaskTracker: reported output size for attempt_201307041810_0142_m_000015_0 was -1
>
>
> From the Web UI:
>
> [screenshot]
>
> 1. Have I made the stderr clear?
> 2. If so, what do you make of the error?
>
>
> *From:* Nitin Pawar [mailto:nitinpawar432@gmail.com]
> *Sent:* July 16, 2013 16:44
> *To:* user@hadoop.apache.org
> *Subject:* Re: Re: hive task fails when left semi join
>
>
> Kira,
>
> I think the job completed successfully. If a task fails on one
> tasktracker, Hadoop takes care of rescheduling it onto another for the
> configured number of retries.
>
> I see the job status as 243/243 completed.
>
> Can you confirm whether your job has failed, and if it has, please share
> the stderr log for that particular task only.
>
> The tasks killed for "failed to report" you can ignore for now.
>
>
> On Tue, Jul 16, 2013 at 2:03 PM, <kira.wang@xiaoi.com> wrote:
>
>
> Nitin,
>
> Thanks for your careful reply.
>
> The Hive version currently used is 0.10.0; I found the configuration item
> you mentioned.
>
> I am using the map join method to filter out the data, and it works quite
> well.
>
> About the errors without using the map join method:
>
> [one of the DNs]
>
>
> 2013-07-16 00:05:31,294 WARN org.apache.hadoop.mapred.TaskTracker: getMapOutput(attempt_201307041810_0138_m_000259_0,53) failed :
> org.mortbay.jetty.EofException: timeout
>         at org.mortbay.jetty.AbstractGenerator$Output.blockForOutput(AbstractGenerator.java:548)
>         at org.mortbay.jetty.AbstractGenerator$Output.flush(AbstractGenerator.java:572)
>         at org.mortbay.jetty.HttpConnection$Output.flush(HttpConnection.java:1012)
>         at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:651)
>         at org.mortbay.jetty.AbstractGenerator$Output.write(AbstractGenerator.java:580)
>         at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:3916)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
>         at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
>         at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
>         at org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
>         at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>         at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>         at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>         at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>         at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>         at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>         at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>         at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>         at org.mortbay.jetty.Server.handle(Server.java:326)
>         at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>         at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
>         at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
>         at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
>         at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>         at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
>         at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
>
> [NN]
>
> 2013-07-16 00:07:31,145 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201307041810_0138_r_000053_1: Task attempt_201307041810_0138_r_000053_1 failed to report status for 601 seconds. Killing!
>
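> The 600-second default behind that "failed to report status for 601
> seconds" kill is the MR1 task timeout, which can be raised per session
> while the join is being debugged. A minimal sketch, assuming the default
> MR1 property name; the value below is an arbitrary example, not a
> recommendation:
>
> ```sql
> -- mapred.task.timeout is in milliseconds (MR1 default 600000 = 10 min,
> -- matching the "601 seconds" kill above). Example value only.
> SET mapred.task.timeout=1200000;
> ```
>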
>
> *From:* Nitin Pawar [mailto:nitinpawar432@gmail.com]
>
> *Sent:* July 16, 2013 15:52
> *To:* user@hadoop.apache.org
> *Subject:* Re: hive task fails when left semi join
>
>
> Dev,
>
> From what I learned in my past experience with running huge single-table
> queries, one hits reduce-side memory limits or timeout limits. I will
> wait for Kira to give more details on the same.
>
> Sorry, I forgot to ask for the logs and suggested a different approach :(
>
> Kira,
>
> The page is in Chinese so I can't make much out of it, but the query
> looks like a map join.
>
> If you are using an older Hive version, then the query shown on the mail
> thread looks good.
>
> If you are using a newer Hive version, then hive.auto.convert.join=true
> will do the job.
>
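> As a sketch of the two styles (table and column names here are
> hypothetical, not from the thread):
>
> ```sql
> -- Older Hive: explicit map-join hint loads the small table into memory,
> -- avoiding the reduce-side shuffle that was timing out.
> SELECT /*+ MAPJOIN(small_t) */ big_t.*
> FROM big_t
> LEFT SEMI JOIN small_t ON (big_t.id = small_t.id);
>
> -- Newer Hive: let the optimizer convert the join automatically.
> SET hive.auto.convert.join=true;
> SELECT big_t.*
> FROM big_t
> LEFT SEMI JOIN small_t ON (big_t.id = small_t.id);
> ```
>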
>
> On Tue, Jul 16, 2013 at 1:07 PM, Devaraj k <devaraj.k@huawei.com> wrote:
>
> Hi,
>
>    In the given image, I see there are some failed/killed map & reduce
> task attempts. Could you check why these are failing? You can investigate
> further based on the fail/kill reason.
>
> Thanks
> Devaraj k
>
> *From:* kira.wang@xiaoi.com [mailto:kira.wang@xiaoi.com]
> *Sent:* 16 July 2013 12:57
> *To:* user@hadoop.apache.org
> *Subject:* hive task fails when left semi join
>
>
> Hello,
>
> I am trying to filter out some records in a table in Hive.
> The number of rows in this table is 4 billion+.
> I make a left semi join between the above table and a small table with
> 1k rows.
>
> However, after the job ran for 3 hours, it ended in a failed status.
>
> My questions are as follows:
>
> 1. How can I diagnose this problem and finally solve it?
> 2. Are there any other good methods to filter out records with given
> conditions?
>
> The following picture is a snapshot of the failed job.
>
> [screenshot]
>
>
> --
> Nitin Pawar
>
>
>
>
> --
> Nitin Pawar
>



-- 
Nitin Pawar
