hadoop-common-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hua xu (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HADOOP-9684) The initialization may be missed for org.apache.ipc.Client$Connection
Date Tue, 02 Jul 2013 13:29:26 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-9684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Hua xu updated HADOOP-9684:
---------------------------

         Description: 
Today, we see that a TaskTracer has keeped throwing the same exception in our production environment.it
is that:

2013-07-01 18:41:40,023 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free
slots : 7
2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask):
attempt_201208241212_27521_m_000002_3 task's state:UNASSIGNED
2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201208241212_27521_m_000002_3
which needs 1 slots
2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current
free slots : 7 and trying to launch attempt_201208241212_27521_m_000002_3 which needs 1 slots
2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapreduce.server.tasktracker.Localizer: User-directories
for the user sds are already initialized on this TT. Not doing anything.
2013-07-01 18:41:43,029 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201208241212_27521_m_000002_3:
java.lang.NullPointerException

2013-07-01 18:41:43,029 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to set finish time
for task attempt_201208241212_27521_m_000002_3 when no start time is set, stackTrace is :
java.lang.Exception
	at org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:195)
	at org.apache.hadoop.mapred.MapTaskStatus.setFinishTime(MapTaskStatus.java:51)
	at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2937)
	at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2255)
	at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)

  Then, we view the log files of the TaskTracker,and find that the TaskTracker throwed Several
OutOfMemoryError: Java heap space about ten days ago. after that, the TaskTracker has still
throws the exception:


2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask):
attempt_201208241212_26088_m_000043_1 task's state:UNASSIGNED
2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201208241212_26088_m_000043_1
which needs 1 slots
2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current
free slots : 7 and trying to launch attempt_201208241212_26088_m_000043_1 which needs 1 slots
2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapreduce.server.tasktracker.Localizer: Initializing
user sds on this TT.
2013-06-22 12:39:42,300 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201208241212_26088_m_000043_1:
java.lang.NullPointerException
	at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:630)
	at org.apache.hadoop.ipc.Client.call(Client.java:886)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
	at $Proxy5.getFileInfo(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at $Proxy5.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:850)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:620)
	at org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:3984)
	at org.apache.hadoop.mapred.TaskTracker.localizeJobFiles(TaskTracker.java:1036)
	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:977)
	at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2247)
	at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)

2013-06-22 12:39:42,300 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to set finish time
for task attempt_201208241212_26088_m_000043_1 when no start time is set, stackTrace is :
java.lang.Exception
	at org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:195)
	at org.apache.hadoop.mapred.MapTaskStatus.setFinishTime(MapTaskStatus.java:51)
	at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2937)
	at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2255)
	at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
  
  Since then, the TaskTracker has not completed any task.we can find the property of Connection
-out is null from the above exeption,and it is caused by the failure of method setupIOstream()
in the class org.apache.ipc.Client$Connection.anyway, the instance of Connection is not null
and cached by the org.apache.ipc.Client.We guess that it throwed a OutOfMemoryError when a
thread called the setupIOstream() because of RPC, so that some properties of the Connection
are null, and throw NullPointerException when the Connection are accessed by other threads,which
is fron the cache.It must be correct to make sure that the instance of Connection could only
be cached after initialized successfully. We fixed the Method getConnection() to guarantee
the semantic:

private Connection getConnection(InetSocketAddress addr, Class<?> protocol, UserGroupInformation
ticket, Call call) throws IOException, InterruptedException {

  if (!running.get()) {
      // the client is stopped
      throw new IOException("The client is stopped");
    }

    Connection connection;
    ConnectionId remoteId = new ConnectionId(addr, protocol, ticket);
    do {
      synchronized (connections) {
        connection = connections.get(remoteId);
        if (connection == null) {

          connection = new Connection(remoteId);
          
           connection.setupIOstreams();
          
          connections.put(remoteId, connection);
        }
      }
    } while (!connection.addCall(call));
    

-   //connection.setupIOstreams();
    
    return connection;
  }

  On the other hand, we also simulate this scenario.Firstly, one thread create the Connection
instance and throw OutOfMemoryError when it call connection.setupIOstreams().after that, the
other thread start call RPC through by the instance and it keeps throwing the same exceptions


  was:
Today, we see that a TaskTracer has keeped throwing the same exception in our production environment.it
is that:

2013-07-01 18:41:40,023 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current free
slots : 7
2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask):
attempt_201208241212_27521_m_000002_3 task's state:UNASSIGNED
2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201208241212_27521_m_000002_3
which needs 1 slots
2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current
free slots : 7 and trying to launch attempt_201208241212_27521_m_000002_3 which needs 1 slots
2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapreduce.server.tasktracker.Localizer: User-directories
for the user sds are already initialized on this TT. Not doing anything.
2013-07-01 18:41:43,029 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201208241212_27521_m_000002_3:
java.lang.NullPointerException

2013-07-01 18:41:43,029 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to set finish time
for task attempt_201208241212_27521_m_000002_3 when no start time is set, stackTrace is :
java.lang.Exception
	at org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:195)
	at org.apache.hadoop.mapred.MapTaskStatus.setFinishTime(MapTaskStatus.java:51)
	at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2937)
	at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2255)
	at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)

  Then, we view the log files of the TaskTracker,and find that the TaskTracker throwed Several
OutOfMemoryError: Java heap space about ten days ago. after that, the TaskTracker has still
throws the exception:


2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask):
attempt_201208241212_26088_m_000043_1 task's state:UNASSIGNED
2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch : attempt_201208241212_26088_m_000043_1
which needs 1 slots
2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current
free slots : 7 and trying to launch attempt_201208241212_26088_m_000043_1 which needs 1 slots
2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapreduce.server.tasktracker.Localizer: Initializing
user sds on this TT.
2013-06-22 12:39:42,300 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing attempt_201208241212_26088_m_000043_1:
java.lang.NullPointerException
	at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:630)
	at org.apache.hadoop.ipc.Client.call(Client.java:886)
	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
	at $Proxy5.getFileInfo(Unknown Source)
	at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
	at $Proxy5.getFileInfo(Unknown Source)
	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:850)
	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:620)
	at org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:3984)
	at org.apache.hadoop.mapred.TaskTracker.localizeJobFiles(TaskTracker.java:1036)
	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:977)
	at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2247)
	at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)

2013-06-22 12:39:42,300 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to set finish time
for task attempt_201208241212_26088_m_000043_1 when no start time is set, stackTrace is :
java.lang.Exception
	at org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:195)
	at org.apache.hadoop.mapred.MapTaskStatus.setFinishTime(MapTaskStatus.java:51)
	at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2937)
	at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2255)
	at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
  
  Since then, the TaskTracker has not completed any task.we can find the property of Connection
-out is null from the above exeption,and it is caused by the failure of method setupIOstream()
in the class org.apache.ipc.Client$Connection.anyway, the instance of Connection is not null
and cached by the org.apache.ipc.Client.We guess that it throwed a OutOfMemoryError when a
thread called the setupIOstream() because of RPC, so that some properties of the Connection
are null, and throw NullPointerException when the Connection are accessed by other threads,which
is fron the cache.It must be correct to make sure that the instance of Connection could only
be cached after initialized successfully. We fixed the Method getConnection() to guarantee
the semantic:

private Connection getConnection(InetSocketAddress addr, Class<?> protocol, UserGroupInformation
ticket, Call call) throws IOException, InterruptedException {

  if (!running.get()) {
      // the client is stopped
      throw new IOException("The client is stopped");
    }

    Connection connection;
    ConnectionId remoteId = new ConnectionId(addr, protocol, ticket);
    do {
      synchronized (connections) {
        connection = connections.get(remoteId);
        if (connection == null) {
          connection = new Connection(remoteId);
          
         connection.setupIOstreams();
          
          connections.put(remoteId, connection);
        }
      }
    } while (!connection.addCall(call));
    

-   //connection.setupIOstreams();
    
    return connection;
  }

  On the other hand, we also simulate this scenario.Firstly, one thread create the Connection
instance and throw OutOfMemoryError when it call connection.setupIOstreams().after that, the
other thread start call RPC through by the instance and it keeps throwing the same exceptions


    Target Version/s: 0.21.0, 1.0.3  (was: 1.0.3, 0.21.0)
    
> The initialization may be missed for org.apache.ipc.Client$Connection
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-9684
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9684
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: ipc
>    Affects Versions: 1.0.3, 0.21.0
>            Reporter: Hua xu
>
> Today, we see that a TaskTracer has keeped throwing the same exception in our production
environment.it is that:
> 2013-07-01 18:41:40,023 INFO org.apache.hadoop.mapred.TaskTracker: addFreeSlot : current
free slots : 7
> 2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask):
attempt_201208241212_27521_m_000002_3 task's state:UNASSIGNED
> 2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch :
attempt_201208241212_27521_m_000002_3 which needs 1 slots
> 2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current
free slots : 7 and trying to launch attempt_201208241212_27521_m_000002_3 which needs 1 slots
> 2013-07-01 18:41:43,026 INFO org.apache.hadoop.mapreduce.server.tasktracker.Localizer:
User-directories for the user sds are already initialized on this TT. Not doing anything.
> 2013-07-01 18:41:43,029 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing
attempt_201208241212_27521_m_000002_3:
> java.lang.NullPointerException
> 2013-07-01 18:41:43,029 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to set finish
time for task attempt_201208241212_27521_m_000002_3 when no start time is set, stackTrace
is : java.lang.Exception
> 	at org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:195)
> 	at org.apache.hadoop.mapred.MapTaskStatus.setFinishTime(MapTaskStatus.java:51)
> 	at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2937)
> 	at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2255)
> 	at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
>   Then, we view the log files of the TaskTracker,and find that the TaskTracker throwed
Several OutOfMemoryError: Java heap space about ten days ago. after that, the TaskTracker
has still throws the exception:
> 2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: LaunchTaskAction (registerTask):
attempt_201208241212_26088_m_000043_1 task's state:UNASSIGNED
> 2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: Trying to launch :
attempt_201208241212_26088_m_000043_1 which needs 1 slots
> 2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapred.TaskTracker: In TaskLauncher, current
free slots : 7 and trying to launch attempt_201208241212_26088_m_000043_1 which needs 1 slots
> 2013-06-22 12:39:42,296 INFO org.apache.hadoop.mapreduce.server.tasktracker.Localizer:
Initializing user sds on this TT.
> 2013-06-22 12:39:42,300 WARN org.apache.hadoop.mapred.TaskTracker: Error initializing
attempt_201208241212_26088_m_000043_1:
> java.lang.NullPointerException
> 	at org.apache.hadoop.ipc.Client$Connection.sendParam(Client.java:630)
> 	at org.apache.hadoop.ipc.Client.call(Client.java:886)
> 	at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:198)
> 	at $Proxy5.getFileInfo(Unknown Source)
> 	at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> 	at java.lang.reflect.Method.invoke(Method.java:597)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
> 	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
> 	at $Proxy5.getFileInfo(Unknown Source)
> 	at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:850)
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:620)
> 	at org.apache.hadoop.mapred.TaskTracker.localizeJobTokenFile(TaskTracker.java:3984)
> 	at org.apache.hadoop.mapred.TaskTracker.localizeJobFiles(TaskTracker.java:1036)
> 	at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:977)
> 	at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2247)
> 	at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
> 2013-06-22 12:39:42,300 ERROR org.apache.hadoop.mapred.TaskStatus: Trying to set finish
time for task attempt_201208241212_26088_m_000043_1 when no start time is set, stackTrace
is : java.lang.Exception
> 	at org.apache.hadoop.mapred.TaskStatus.setFinishTime(TaskStatus.java:195)
> 	at org.apache.hadoop.mapred.MapTaskStatus.setFinishTime(MapTaskStatus.java:51)
> 	at org.apache.hadoop.mapred.TaskTracker$TaskInProgress.kill(TaskTracker.java:2937)
> 	at org.apache.hadoop.mapred.TaskTracker.startNewTask(TaskTracker.java:2255)
> 	at org.apache.hadoop.mapred.TaskTracker$TaskLauncher.run(TaskTracker.java:2212)
>   
>   Since then, the TaskTracker has not completed any task.we can find the property of
Connection -out is null from the above exeption,and it is caused by the failure of method
setupIOstream() in the class org.apache.ipc.Client$Connection.anyway, the instance of Connection
is not null and cached by the org.apache.ipc.Client.We guess that it throwed a OutOfMemoryError
when a thread called the setupIOstream() because of RPC, so that some properties of the Connection
are null, and throw NullPointerException when the Connection are accessed by other threads,which
is fron the cache.It must be correct to make sure that the instance of Connection could only
be cached after initialized successfully. We fixed the Method getConnection() to guarantee
the semantic:
> private Connection getConnection(InetSocketAddress addr, Class<?> protocol, UserGroupInformation
ticket, Call call) throws IOException, InterruptedException {
>   if (!running.get()) {
>       // the client is stopped
>       throw new IOException("The client is stopped");
>     }
>     Connection connection;
>     ConnectionId remoteId = new ConnectionId(addr, protocol, ticket);
>     do {
>       synchronized (connections) {
>         connection = connections.get(remoteId);
>         if (connection == null) {
>           connection = new Connection(remoteId);
>           
>            connection.setupIOstreams();
>           
>           connections.put(remoteId, connection);
>         }
>       }
>     } while (!connection.addCall(call));
>     
> -   //connection.setupIOstreams();
>     
>     return connection;
>   }
>   On the other hand, we also simulate this scenario.Firstly, one thread create the Connection
instance and throw OutOfMemoryError when it call connection.setupIOstreams().after that, the
other thread start call RPC through by the instance and it keeps throwing the same exceptions

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message