hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5939) WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
Date Wed, 19 Feb 2014 16:06:20 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13905606#comment-13905606 ]

Yongjun Zhang commented on HDFS-5939:
-------------------------------------

Thanks Haohui.

Indeed, the contract of Random.nextInt(int n) requires n (here numOfDatanodes) to be greater
than 0; otherwise it throws
   IllegalArgumentException("n must be positive");
That's what I listed in the original bug report, and we hadn't seen this exception thrown
from
  NetworkTopology.chooseRandom(String scope, String excludedScope)
until HDFS-5939.
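
To illustrate the contract described above, here is a minimal standalone sketch. The class and helper names (NextIntBoundDemo, rejectsBound) are made up for this example and are not part of HDFS. It only checks the exception type, since the exact message text may differ between JDK versions.

```java
import java.util.Random;

class NextIntBoundDemo {
    /**
     * Returns true if Random.nextInt(bound) rejects the bound with
     * IllegalArgumentException, false if it returns a value normally.
     */
    static boolean rejectsBound(int bound) {
        try {
            new Random().nextInt(bound);
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // A bound of 0 models numOfDatanodes == 0 (no DataNode running).
        System.out.println("bound 0 rejected: " + rejectsBound(0));
        // A positive bound is accepted.
        System.out.println("bound 3 rejected: " + rejectsBound(3));
    }
}
```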

Investigation of this bug shows that numOfDatanodes is 0 because no DataNode is running in
this case.

Prior to my fix, there were three ways the method
  NetworkTopology.chooseRandom(String scope, String excludedScope)
could finish:
1. return a valid Node
2. return null (at the beginning of the method)
3. throw the above exception when calling Random.nextInt() (at the end of the method).

It seems none of the callers of this method checks for case 2. As a result, if it happens,
the caller would hit a NullPointerException (again, there is no report saying this has ever
happened).

HDFS-5939 is case 3, where the caller is NamenodeWebHdfs.redirectURI(..). My submitted fix
makes the chooseRandom method return null before calling Random.nextInt() when numOfDatanodes
is 0, and throws NoDatanodeException from the caller side. Basically, my fix replaces the
IllegalArgumentException with NoDatanodeException for this case, with an explicit message to
help the user.
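
The shape of the fix described above can be sketched roughly as follows. This is not the actual patch: SimpleTopology, chooseDatanodeForRedirect, the error message, and the definition of NoDatanodeException used here are hypothetical stand-ins, and node names are plain strings for brevity.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class NoDatanodeSketch {
    /** Hypothetical stand-in for the exception named in this comment. */
    static class NoDatanodeException extends IOException {
        NoDatanodeException(String message) {
            super(message);
        }
    }

    /** Hypothetical, much-simplified stand-in for NetworkTopology. */
    static class SimpleTopology {
        private final List<String> datanodes = new ArrayList<>();
        private final Random random = new Random();

        void add(String node) {
            datanodes.add(node);
        }

        /** Returns a random node, or null when there is nothing to choose from. */
        String chooseRandom() {
            int numOfDatanodes = datanodes.size();
            if (numOfDatanodes == 0) {
                return null; // guard: never reach nextInt(0)
            }
            return datanodes.get(random.nextInt(numOfDatanodes));
        }
    }

    /** Caller side: turn the null into an explicit, user-facing error. */
    static String chooseDatanodeForRedirect(SimpleTopology topology)
            throws NoDatanodeException {
        String node = topology.chooseRandom();
        if (node == null) {
            throw new NoDatanodeException("No datanode is running in the cluster");
        }
        return node;
    }

    public static void main(String[] args) throws NoDatanodeException {
        SimpleTopology topology = new SimpleTopology();
        try {
            chooseDatanodeForRedirect(topology);
        } catch (NoDatanodeException e) {
            System.out.println("empty cluster: " + e.getMessage());
        }
        topology.add("dn1");
        System.out.println("chosen: " + chooseDatanodeForRedirect(topology));
    }
}
```

Note that in this sketch a caller that skips the null check would still fail, only with a NullPointerException rather than an IllegalArgumentException.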

With my submitted fix, if numOfDatanodes == 0 happens for other callers of the chooseRandom
method in a real deployment, the fix won't hide the problem: it will result in a
NullPointerException instead of the IllegalArgumentException. This is now covered by HDFS-5970.
I hope there is a field report of HDFS-5970 before we fix it, so we can understand why
it happened.

An alternative to my fix is to change the exception spec of the NetworkTopology.chooseRandom
interface and let it throw NoDatanodeException instead of IllegalArgumentException.
I didn't do this in my submitted fix for two reasons:
- the caller has a better chance of providing a more helpful message.
- the impact of changing the interface is wider.
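
For comparison, the alternative could look roughly like the sketch below. Again, this is illustrative only: the class name, the static chooseRandom signature taking a list of node names, the message text, and the NoDatanodeException definition are assumptions made for this example, not the real NetworkTopology API.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ThrowingChooseRandomSketch {
    /** Hypothetical stand-in for the exception named in this comment. */
    static class NoDatanodeException extends IOException {
        NoDatanodeException(String message) {
            super(message);
        }
    }

    /**
     * Variant where the method itself reports the empty case. Being a checked
     * exception here, it must be declared and handled by every caller.
     */
    static String chooseRandom(List<String> datanodes, Random random)
            throws NoDatanodeException {
        if (datanodes.isEmpty()) {
            // Generic message: this method does not know what the caller was doing.
            throw new NoDatanodeException("No datanode available to choose from");
        }
        return datanodes.get(random.nextInt(datanodes.size()));
    }

    public static void main(String[] args) {
        Random random = new Random();
        try {
            System.out.println(chooseRandom(Arrays.asList("dn1", "dn2"), random));
            chooseRandom(Collections.<String>emptyList(), random);
        } catch (NoDatanodeException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```

The sketch makes both reasons visible: the message is necessarily generic because the method has no knowledge of the caller's context, and every call site has to change to handle or declare the new exception.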

Would you please let me know what you think? Thanks.


> WebHdfs returns misleading error code and logs nothing if trying to create a file with no DNs in cluster
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-5939
>                 URL: https://issues.apache.org/jira/browse/HDFS-5939
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs-client
>    Affects Versions: 2.3.0
>            Reporter: Yongjun Zhang
>            Assignee: Yongjun Zhang
>         Attachments: HDFS-5939.001.patch
>
>
> When trying to access hdfs via webhdfs, and when datanode is dead, user will see an exception below without any clue that it's caused by dead datanode:
> $ curl -i -X PUT ".../webhdfs/v1/t1?op=CREATE&user.name=<userName>&overwrite=false"
> ...
> {"RemoteException":{"exception":"IllegalArgumentException","javaClassName":"java.lang.IllegalArgumentException","message":"n must be positive"}}
> Need to fix the report to give user hint about dead datanode.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
