hadoop-common-dev mailing list archives

From "Ian Athanasakis (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3357) delete on dfs hung
Date Tue, 24 Jun 2008 17:20:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12607687#action_12607687 ]

Ian Athanasakis commented on HADOOP-3357:
-----------------------------------------

This happened to me at random as well:

java.net.SocketTimeoutException: timed out waiting for rpc response
	at org.apache.hadoop.ipc.Client.call(Client.java:559)
	at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:212)
	at org.apache.hadoop.dfs.$Proxy1.delete(Unknown Source)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at org.apache.hadoop.dfs.$Proxy1.delete(Unknown Source)
	at org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:524)
	at org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:162)
	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigOutputFormat$PigRecordWriter.<init>(PigOutputFormat.java:84)
	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigOutputFormat.getRecordWriter(PigOutputFormat.java:70)
	at org.apache.pig.backend.hadoop.executionengine.mapreduceExec.PigOutputFormat.getRecordWriter(PigOutputFormat.java:49)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:366)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2124)

Currently 75% of the DFS capacity is used and there are 53 free nodes, so it seems like this shouldn't be happening.
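The trace above shows the client-side path: `Client.call` gives up with "timed out waiting for rpc response" while the delete RPC is outstanding. One generic way to keep a hung delete from stalling the calling thread is to run the blocking call on a worker thread and bound the wait. This is a hedged, illustrative sketch only, not Hadoop code or recommended Hadoop API usage; `deleteWithTimeout` and the simulated slow call are hypothetical names invented here:

```java
import java.util.concurrent.*;

public class BoundedDelete {
    // Run a blocking call (e.g. a filesystem delete) on a worker thread and
    // bound the wait with Future.get(timeout). On timeout, interrupt the
    // worker and propagate TimeoutException to the caller.
    public static boolean deleteWithTimeout(Callable<Boolean> deleteCall,
                                            long timeout, TimeUnit unit)
            throws Exception {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            Future<Boolean> f = exec.submit(deleteCall);
            try {
                return f.get(timeout, unit);
            } catch (TimeoutException e) {
                f.cancel(true); // interrupt the hung call
                throw e;
            }
        } finally {
            exec.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulated hung delete: sleeps far past the timeout.
        try {
            deleteWithTimeout(() -> { Thread.sleep(10_000); return true; },
                              200, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("delete timed out");
        }
    }
}
```

The caller regains control after the timeout, but note the delete may still complete on the server side later; the wrapper only bounds the client's wait.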

> delete on dfs hung
> ------------------
>
>                 Key: HADOOP-3357
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3357
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Devaraj Das
>
> I had a case where the JobTracker was trying to delete some files in a dfs directory, as part of garbage collection for a job. The thread hung and this is the trace:
> Thread 19 (IPC Server handler 5 on 57344):
>   State: WAITING
>   Blocked count: 137022
>   Waited count: 336004
>   Waiting on org.apache.hadoop.ipc.Client$Call@eb6238
>   Stack:
>     java.lang.Object.wait(Native Method)
>     java.lang.Object.wait(Object.java:485)
>     org.apache.hadoop.ipc.Client.call(Client.java:683)
>     org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>     org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source)
>     sun.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
>     sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     java.lang.reflect.Method.invoke(Method.java:597)
>     org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>     org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>     org.apache.hadoop.dfs.$Proxy4.delete(Unknown Source)
>     org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:515)
>     org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:170)
>     org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:118)
>     org.apache.hadoop.fs.FileUtil.fullyDelete(FileUtil.java:114)
>     org.apache.hadoop.mapred.JobInProgress.garbageCollect(JobInProgress.java:1635)
>     org.apache.hadoop.mapred.JobInProgress.isJobComplete(JobInProgress.java:1387)
>     org.apache.hadoop.mapred.JobInProgress.completedTask(JobInProgress.java:1348)
>     org.apache.hadoop.mapred.JobInProgress.updateTaskStatus(JobInProgress.java:565)
>     org.apache.hadoop.mapred.JobTracker.updateTaskStatuses(JobTracker.java:2032)
> and it hung for an enormously long time, ~1 hour.
> I'm not sure whether the following will help:
> I saw this message in the NameNode log around the time the delete was issued by the JobTracker
> 2008-05-07 09:55:57,375 WARN org.apache.hadoop.dfs.StateChange: DIR* FSDirectory.unprotectedDelete: failed to remove /mapredsystem/ddas/mapredsystem/10091.gs301249.inktomisearch.com/job_200805070458_0004 because it does not exist
> I also checked that the directory in question was actually there (and the job couldn't have run without this directory being there).
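The quoted thread dump shows the JobTracker handler parked in `Object.wait(Native Method)` on a `Client$Call` object, in state WAITING, which is what an untimed `wait()` looks like when the notifying event (the RPC response) never arrives. The sketch below is a hypothetical stand-in, not the actual `org.apache.hadoop.ipc.Client` code; it just contrasts an untimed wait (blocks indefinitely) with a timed wait that surfaces a `SocketTimeoutException` instead. The class and method names are invented for illustration:

```java
import java.net.SocketTimeoutException;

public class CallWaitSketch {
    // Hypothetical stand-in for an RPC call object awaiting its response.
    static class Call {
        private boolean done = false;

        // Untimed wait(): if complete() never runs, the caller blocks
        // forever -- the State: WAITING seen in the thread dump.
        synchronized void waitForever() throws InterruptedException {
            while (!done) {
                wait();
            }
        }

        // Timed wait(): bounds the blocking and reports a timeout instead.
        synchronized void waitAtMost(long timeoutMs)
                throws InterruptedException, SocketTimeoutException {
            long deadline = System.currentTimeMillis() + timeoutMs;
            while (!done) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    throw new SocketTimeoutException(
                        "timed out waiting for rpc response");
                }
                wait(remaining); // loop guards against spurious wakeups
            }
        }

        // Called when the response arrives; wakes any waiting caller.
        synchronized void complete() {
            done = true;
            notifyAll();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Call call = new Call(); // no response will ever arrive
        try {
            call.waitAtMost(200);
        } catch (SocketTimeoutException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a timed wait the handler thread would fail the delete after a bounded interval rather than tying up an IPC handler slot for ~1 hour.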

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

