hadoop-common-dev mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-3649) ArrayIndexOutOfBounds in FSNamesystem.getBlockLocationsInternal
Date Fri, 27 Jun 2008 02:00:45 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-3649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12608638#action_12608638 ]

Konstantin Shvachko commented on HADOOP-3649:
---------------------------------------------

1. Looks like there is a bug in removing corrupted blocks from the corrupted block map.
We do not remove corrupted replicas until the valid replicas are fully re-replicated on other
nodes.
Once they are, the corrupted replicas can and should be removed from the data-nodes.
So FSNamesystem.addStoredBlock() checks whether there are enough healthy replicas
and, if so, invalidates the corrupted replicas by:
- removing corrupted locations from the block's list of locations, and
- calling CorruptReplicasMap.removeFromCorruptReplicasMap(), which is supposed to remove the
block from the set of corrupted blocks.

But removeFromCorruptReplicasMap() removes the block from the corruptReplicasMap
only if the block does not belong to the main blocksMap.
In particular, this means that once a block is in the corruptReplicasMap, it stays there until
the file is removed.
The ArrayIndexOutOfBoundsException comes from getBlockLocations(), which assumes that the set
of corrupted replicas is always a subset of all block replicas. Because of the bug in
removeFromCorruptReplicasMap() this is not the case: the corrupt replicas are no longer in the
block's location list, but are still in the corruptReplicasMap.
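
To illustrate how a broken subset assumption can produce this kind of exception, here is a
minimal sketch. It is not the FSNamesystem code: the class StaleCorruptMapSketch, the method
healthyLocations() and the node names are hypothetical, and it assumes the result array is
sized as (number of locations) minus (number of corrupt replicas).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model, not Hadoop code: data-nodes are plain strings.
class StaleCorruptMapSketch {

    // Sizes the result as locations minus corrupt, which is only correct
    // when every corrupt replica is also present in the location list.
    static String[] healthyLocations(List<String> locations, Set<String> corrupt) {
        String[] result = new String[locations.size() - corrupt.size()];
        int i = 0;
        for (String node : locations) {
            if (!corrupt.contains(node)) {
                // Overflows when corrupt holds nodes that are absent from locations.
                result[i++] = node;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> locations = Arrays.asList("dn1", "dn2", "dn3");

        // Consistent state: the corrupt replica is still listed as a location.
        Set<String> consistent = new HashSet<>(Arrays.asList("dn3"));
        System.out.println(Arrays.toString(healthyLocations(locations, consistent)));

        // Stale state: dn4 was dropped from the locations but not from the corrupt set.
        Set<String> stale = new HashSet<>(Arrays.asList("dn4"));
        try {
            healthyLocations(locations, stale);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("subset assumption violated: " + e.getMessage());
        }
    }
}
```

In the stale case the array has room for two entries while three healthy nodes pass the
filter, so the third store fails. Keeping the two structures consistent removes the
precondition for the failure.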

2. In CorruptReplicasMap.invalidateCorruptReplicas() I see a boolean variable "gotException",
which is set to false and never changes. I think the intention was to set it to true in the
catch{} section.
But maybe the right thing to do is just to remove the variable and the call to
removeFromCorruptReplicasMap() from this method, because removeFromCorruptReplicasMap() will be
called within fsNamesystem.invalidateBlock() if it is successful.
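
The flag problem can be sketched as follows. This is a simplified model, not the
CorruptReplicasMap source: the class GotExceptionSketch, the Invalidator interface and the
setFlagInCatch switch are hypothetical, and it assumes the flag is what guards the removal
call.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical model, not Hadoop code.
class GotExceptionSketch {

    interface Invalidator {
        void invalidate(String node) throws IOException;
    }

    // Returns true when the caller would go on to drop the corrupt-map entry.
    static boolean shouldRemoveEntry(List<String> nodes, Invalidator inv, boolean setFlagInCatch) {
        boolean gotException = false;
        for (String node : nodes) {
            try {
                inv.invalidate(node);
            } catch (IOException e) {
                if (setFlagInCatch) {
                    // The assignment that appears to have been intended.
                    gotException = true;
                }
                // Without that assignment the failure is silently ignored.
            }
        }
        return !gotException;
    }

    public static void main(String[] args) {
        Invalidator failing = node -> {
            throw new IOException("cannot invalidate " + node);
        };
        List<String> nodes = Arrays.asList("dn1");
        System.out.println("flag never set: remove = " + shouldRemoveEntry(nodes, failing, false));
        System.out.println("flag set in catch: remove = " + shouldRemoveEntry(nodes, failing, true));
    }
}
```

With the flag never set, a failed invalidation is indistinguishable from a successful one.
Dropping the flag and leaving the removal to the invalidation path, as suggested above,
avoids keeping the bookkeeping in two places.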

Promoting this to a blocker for 0.18.

> ArrayIndexOutOfBounds in FSNamesystem.getBlockLocationsInternal
> ---------------------------------------------------------------
>
>                 Key: HADOOP-3649
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3649
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.18.0
>            Reporter: Arun C Murthy
>             Fix For: 0.18.0
>
>
> A job-submission failed with:
> {noformat}
> org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 2
>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocationsInternal(FSNamesystem.java:772)
>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:709)
>   at org.apache.hadoop.dfs.FSNamesystem.getBlockLocations(FSNamesystem.java:685)
>   at org.apache.hadoop.dfs.NameNode.getBlockLocations(NameNode.java:257)
>   at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:452)
>   at org.apache.hadoop.ipc.Server$Handler.run(Server.java:888)
>   at org.apache.hadoop.ipc.Client.call(Client.java:707)
>   at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:216)
>   at org.apache.hadoop.dfs.$Proxy0.getBlockLocations(Unknown Source)
>   at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
>   at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
>   at org.apache.hadoop.dfs.$Proxy0.getBlockLocations(Unknown Source)
>   at org.apache.hadoop.dfs.DFSClient.callGetBlockLocations(DFSClient.java:299)
>   at org.apache.hadoop.dfs.DFSClient.getBlockLocations(DFSClient.java:320)
>   at org.apache.hadoop.dfs.DistributedFileSystem.getFileBlockLocations(DistributedFileSystem.java:122)
>   at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:241)
>   at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:686)
>   at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:966)
>   at org.apache.hadoop.mapred.SortValidator$RecordStatsChecker.checkRecords(SortValidator.java:360)
>   at org.apache.hadoop.mapred.SortValidator.run(SortValidator.java:559)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>   at org.apache.hadoop.mapred.SortValidator.main(SortValidator.java:574)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>   at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>   at org.apache.hadoop.test.AllTestDriver.main(AllTestDriver.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>   at java.lang.reflect.Method.invoke(Method.java:597)
>   at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

