hadoop-hdfs-issues mailing list archives

From "Sailesh Mukil (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11529) libHDFS still does not return appropriate error information in many cases
Date Tue, 28 Mar 2017 23:27:41 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15946190#comment-15946190 ]

Sailesh Mukil commented on HDFS-11529:

Nice improvement.
Thanks for the review [~cmccabe]!

printExceptionAndFreeV: This function is intended to print exceptions and then free them.
If you are overloading it to set thread-local data, you should change the name to reflect
that. Something like handleExceptionAndFree would work. You also need to document this information
in the function doxygen, found in exception.h.
Done. I've renamed all the printException*() functions to handleException*().

It seems to me that the thread-local exception should be set regardless of whether noPrint
is true or not. noPrint was intended to avoid spammy logging for things we expected to happen,
but not to skip setting the error return. The thread-local storage is essentially an out-of-band
way of returning more error data, so I don't see why it should be affected by noPrint.
Yes, you're right. Done.

You need to document what a NULL return means here.

getJNIEnv should free and zero out these thread-local pointers. Otherwise the exception text
from one call may bleed into another, since there are still some code paths that don't set
the thread-local error status.
Yes, I've added code to do that now.

It is not related to your patch, but I just noticed that hdfsGetHosts doesn't set errno on
failure. Do you mind fixing that?

> libHDFS still does not return appropriate error information in many cases
> -------------------------------------------------------------------------
>                 Key: HDFS-11529
>                 URL: https://issues.apache.org/jira/browse/HDFS-11529
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: libhdfs
>    Affects Versions: 2.6.0
>            Reporter: Sailesh Mukil
>            Assignee: Sailesh Mukil
>            Priority: Critical
>              Labels: errorhandling, libhdfs
>         Attachments: HDFS-11529.000.patch, HDFS-11529.001.patch, HDFS-11529.002.patch
> libHDFS uses a table to compare exceptions against and returns a corresponding error
code to the application in case of an error.
> However, this table is manually populated and is often not updated when new exceptions
are added.
> This causes libHDFS to return EINTERNAL (or Unknown Error(255)) whenever these exceptions
are hit. These are some examples of exceptions that have been observed on an Error(255):
> org.apache.hadoop.ipc.StandbyException (Operation category WRITE is not supported in
state standby)
> java.io.EOFException: Cannot seek after EOF
> javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid
credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
> It is of course not possible to have an error code for each and every type of exception,
so one suggestion of how this can be addressed is by having a call such as hdfsGetLastException()
that would return the last exception that a libHDFS thread encountered. This way, an application
may choose to call hdfsGetLastException() if it receives EINTERNAL.
> We can make use of thread-local storage to hold this information. This also ensures
that the current functionality is preserved.
> This is a follow up from HDFS-4997.

This message was sent by Atlassian JIRA
