hadoop-hdfs-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HDFS-7018) Implement C interface for libhdfs3
Date Thu, 18 Dec 2014 23:32:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252541#comment-14252541
] 

Colin Patrick McCabe edited comment on HDFS-7018 at 12/18/14 11:31 PM:
-----------------------------------------------------------------------

Please don't make another copy of {{hdfs.h}} in the source tree.  This will lead to the various
copies getting out of sync over time, which would be very bad.  Instead, just reference the
existing copy via a relative path.

You can add your {{hdfsGetLastError}} function to this file, and just have a dummy implementation
that returns "unknown error" in all cases for {{libwebhdfs}} and {{libhdfs}}.  We can improve
this in a follow-on JIRA.

{code}
39	#ifdef __cplusplus
40	extern "C" {
41	#endif
{code}
While this is needed in {{hdfs.h}}, it is not needed in your {{Hdfs.cc}} code.  The C\+\+
linker is smart enough to figure out that the prototypes it is seeing correspond to the prototypes
in the {{hdfs.h}} file you included.

{code}
45	static THREAD_LOCAL const char *ErrorMessage = NULL;
46	static THREAD_LOCAL std::string *ErrorMessageBuffer = NULL;
47	static THREAD_LOCAL hdfs::internal::once_flag once;
48	
49	static void CreateMessageBuffer() {
50	    ErrorMessageBuffer = new std::string;
51	}
{code}
I don't think we need all this.  Making the thread-local buffer a pointer to a {{std::string}}
means that we have to check {{once_flag}} before we access it, which is inefficient.  It also
means that if the thread exits, this memory will be leaked (unless you set up a POSIX "thread
destructor," which is complicated and platform-specific).

Instead, let's just have a char\[128\] buffer for each thread.  As an added bonus, because
this utilitizes pre-allocation, it handles the case where you can't allocate memory for the
error string itself, which you have said in the past that you care about.

{code}
158	private:
159	    bool input;
160	    void *stream;
161	};
{code}
Please don't use {{void*}} here.  It is not typesafe.  

You can clearly see that FS objects and file objects have a concrete type, spelled out in
{{hdfs.h}}:
{code}
    struct hdfs_internal;
    typedef struct hdfs_internal* hdfsFS;

    struct hdfsFile_internal;
    typedef struct hdfsFile_internal* hdfsFile;
{code}

All of these functions need to have a {{catch (...)}} which sets the error message to "unknown"
and returns {{EINTERNAL}}.  The reason is that if you attempt to throw a C\+\+ exception through
a C API, the program will abort (technically, {{std::terminate}} will be called).  I realize
you probably think you have caught all possible exceptions, but since this is C\+\+, we can
never really be sure without the {{catch (...)}}

P.S. thanks for working on this!


was (Author: cmccabe):
Please don't make another copy of {{hdfs.h}} in the source tree.  This will lead to the various
copies getting out of sync over time, which would be very bad.  Instead, just reference the
existing copy via a relative path.

You can add your {{hdfsGetLastError}} function to this file, and just have a dummy implementation
that returns "unknown error" in all cases for {{libwebhdfs}} and {{libhdfs}}.  We can improve
this in a follow-on JIRA.

{code}
39	#ifdef __cplusplus
40	extern "C" {
41	#endif
{code}
While this is needed in {{hdfs.h}}, it is not needed in your {{Hdfs.cc}} code.  The C\+\+
linker is smart enough to figure out that the prototypes it is seeing correspond to the prototypes
in the {{hdfs.h}} file you included.

{code}
45	static THREAD_LOCAL const char *ErrorMessage = NULL;
46	static THREAD_LOCAL std::string *ErrorMessageBuffer = NULL;
47	static THREAD_LOCAL hdfs::internal::once_flag once;
48	
49	static void CreateMessageBuffer() {
50	    ErrorMessageBuffer = new std::string;
51	}
{code}
I don't think we need all this.  Making the thread-local buffer a pointer to a {{std::string}}
means that we have to check {{once_flag}} before we access it, which is inefficient.  It also
means that if the thread exits, this memory will be leaked (unless you set up a POSIX "thread
destructor," which is complicated and platform-specific).

Instead, let's just have a char\[128\] buffer for each thread.  As an added bonus, because
this utilitizes pre-allocation, it handles the case where you can't allocate memory for the
error string itself, which you have said in the past that you care about.

{code}
158	private:
159	    bool input;
160	    void *stream;
161	};
{code}
Please don't use {{void*}} here.  It is not typesafe.  

You can clearly see that FS objects and file objects have a concrete type, spelled out in
{{hdfs.h}}:
{code}
    struct hdfs_internal;
    typedef struct hdfs_internal* hdfsFS;

    struct hdfsFile_internal;
    typedef struct hdfsFile_internal* hdfsFile;
{code}

All of these functions need to have a {{catch (...)}} which sets the error message to "unknown"
and returns {{EINTERNAL}}.  The reason is that if you attempt to throw a C\+\+ exception through
a C API, the program will abort (technically, {{std::terminate}} will be called).  I realize
you probably think you have caught all possible exceptions, but since this is C\+\+, we can
never really be sure without the {{catch (...)}}

> Implement C interface for libhdfs3
> ----------------------------------
>
>                 Key: HDFS-7018
>                 URL: https://issues.apache.org/jira/browse/HDFS-7018
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: hdfs-client
>            Reporter: Zhanwei Wang
>            Assignee: Zhanwei Wang
>         Attachments: HDFS-7018-pnative.002.patch, HDFS-7018.patch
>
>
> Implement C interface for libhdfs3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message