hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From malcolm <malcolm.kaval...@oracle.com>
Subject Re: Solaris Port
Date Sat, 13 Dec 2014 09:29:27 GMT
Colin,

I am not sure what you mean by a thread-local buffer (in native code). 
In Java this is pretty standard, but I couldn't find any implementation 
for C code.

Here is the terror function:

     const char* terror(int errnum)
     {
       if ((errnum < 0) || (errnum >= sys_nerr)) {
         return "unknown error.";
       }
       return sys_errlist[errnum];
     }


The interface is identical to strerror, but the implementation is 
actually re-entrant since it returns a pointer to a static string.

If I understand your suggestion, the new function would look like this:

    const char* terror(int errnum)
    {
       static char result[65];

       strerror_r(errnum, result, 64);

       return result;
    }

No need for snprintf, strerror_r  has the 'n' bounding built-in.

Of course, this is still non-re-entrant, so unless the caller copies the 
returned buffer, before the function is called again, there is a problem.

After considerable thought, I have come up with this version of terror, 
tested OK on Windows, Linux and Solaris:

    #if defined(_WIN32)
    #define strerror_r(errno,buf,len) strerror_s(buf,len,errno)
    #endif

    #define MAX_ERRORS 256
    #define MAX_ERROR_LEN 80

    char *terror(int errnum)
    {

       static char errlist[MAX_ERRORS][MAX_ERROR_LEN+1]; // cache of
    error messages

       if ( errnum >= 0 && errnum < MAX_ERRORS )
         {
           if ( errlist[errnum][0] == 0 )
             strerror_r( errnum, errlist[errnum], MAX_ERROR_LEN);

           return errlist[errnum];
         }
       else
         {
           return "Unknown error";
         }
    }

This version is portable and re-entrant.

On windows, the largest errnum is 43, on Ubuntu 14.04 we have 133, and 
on Solaris 11.1 we get 151.

If this is OK with you, I will open a jira for this.


Thanks,
Malcolm


On 12/12/2014 11:10 PM, Colin McCabe wrote:
> Just use snprintf to copy the error message from strerror_r into a
> thread-local buffer of 64 bytes or so.  Then preserve the existing
> terror() interface.
>
> Can you open a jira for this?
>
> best,
> Colin
>
> On Thu, Dec 11, 2014 at 8:35 PM, malcolm<malcolm.kavalsky@oracle.com>  wrote:
>> So, turns out that if I had naively changed all calls to terror or
>> references to sys_errlist, to using strerror_r, then I would have broken
>> code for Windows and HPUX (and possibly other OSes).
>>
>> If we are to assume that current code runs fine on all platforms (maybe even
>> AIX an MacOS, for example), then any change/additions made to the code and
>> not ifdeffed appropriately can break on other OSes. On the other hand,  too
>> many ifdefs can pollute the code source and render it less readable (though
>> possibly less important).
>>
>> In the general case what are code contributors responsibilities to adding
>> code regarding OSes besides Linux ?
>> What OSes does jenkins test on ?
>> I guess maintainers of code on non-tested platforms are responsible for
>> their own testing ?
>>
>> How do we avoid the ping-pong effect, i.e. I make a generic change to code
>> which breaks on Windows, then the Windows maintainer reverts changes to
>> break on Solaris for example ? Or does this not happen in actuality ?
>>
>>
>> On 12/11/2014 11:25 PM, Asokan, M wrote:
>>> Hi Malcom,
>>>     The Windows versions of strerror() and strerror_s() functions are
>>> probably meant for ANSI C library functions that set errno.  For core
>>> Windows API calls (like UNIX system calls), one gets the error number by
>>> calling GetLastError() function.  In the code snippet I sent earlier, the
>>> "code" argument is the value returned by GetLastError().  Neither strerror()
>>> nor strerror_s() will give the correct error message for this error code.
>>>
>>> You could probably look at libwinutils.c in Hadoop source.  It uses
>>> FormatMessageW (which returns messages in Unicode.)  My requirement was to
>>> return messages in current system locale.
>>>
>>> -- Asokan
>>> ________________________________________
>>> From: malcolm [malcolm.kavalsky@oracle.com]
>>> Sent: Thursday, December 11, 2014 4:04 PM
>>> To:common-dev@hadoop.apache.org
>>> Subject: Re: Solaris Port
>>>
>>> Hi Asok,
>>>
>>> I googled and found that windows has strerror, and strerror_s (which is
>>> the strerror_r equivalent).
>>> Is there a reason why you didn't use this call ?
>>>
>>> On 12/11/2014 06:27 PM, Asokan, M wrote:
>>>> Hi Malcom,
>>>>       Recently, I had to work on a function to get system error message on
>>>> various systems.  Here is the piece of code I came up with.  Hope it helps.
>>>>
>>>> static void get_system_error_message(char * buf, int buf_len, int code)
>>>> {
>>>> #if defined(_WIN32)
>>>>        LPVOID lpMsgBuf;
>>>>        DWORD status = FormatMessage(FORMAT_MESSAGE_ALLOCATE_BUFFER |
>>>>                                     FORMAT_MESSAGE_FROM_SYSTEM |
>>>>                                     FORMAT_MESSAGE_IGNORE_INSERTS,
>>>>                                     NULL, code,
>>>>                                     MAKELANGID(LANG_NEUTRAL,
>>>> SUBLANG_DEFAULT),
>>>>                                                             /* Default
>>>> language */
>>>>                                     (LPTSTR) &lpMsgBuf, 0, NULL);
>>>>        if (status > 0)
>>>>        {
>>>>            strncpy(buf, (char *)lpMsgBuf, buf_len-1);
>>>>            buf[buf_len-1] = '\0';
>>>>            /* Free the buffer returned by system */
>>>>            LocalFree(lpMsgBuf);
>>>>        }
>>>>        else
>>>>        {
>>>>            _snprintf(buf, buf_len-1 , "%s %d",
>>>>                "Can't get system error message for code", code);
>>>>            buf[buf_len-1] = '\0';
>>>>        }
>>>> #else
>>>> #if defined(_HPUX_SOURCE)
>>>>        {
>>>>            char * msg;
>>>>            errno = 0;
>>>>            msg = strerror(code);
>>>>            if (errno == 0)
>>>>            {
>>>>                strncpy(buf, msg, buf_len-1);
>>>>                buf[buf_len-1] = '\0';
>>>>            }
>>>>            else
>>>>            {
>>>>                snprintf(buf, buf_len, "%s %d",
>>>>                    "Can't get system error message for code", code);
>>>>            }
>>>>        }
>>>> #else
>>>>        if (strerror_r(code, buf, buf_len) != 0)
>>>>        {
>>>>            snprintf(buf, buf_len, "%s %d",
>>>>                "Can't get system error message for code", code);
>>>>        }
>>>> #endif
>>>> #endif
>>>> }
>>>>
>>>> Note that HPUX does not have strerror_r() since strerror() itself is
>>>> thread-safe.  Also Windows does not have snprintf().  The equivalent
>>>> function _snprintf() has a subtle difference in its interface.
>>>>
>>>> -- Asokan
>>>> ________________________________________
>>>> From: malcolm [malcolm.kavalsky@oracle.com]
>>>> Sent: Thursday, December 11, 2014 11:02 AM
>>>> To:common-dev@hadoop.apache.org
>>>> Subject: Re: Solaris Port
>>>>
>>>> Fine with me, I volunteer to do this, if accepted.
>>>>
>>>> On 12/11/2014 05:48 PM, Allen Wittenauer wrote:
>>>>> sys_errlist was removed for a reason.  Creating a fake sys_errlist on
>>>>> Solaris will mean the libhadoop.so will need to be tied a specific build
>>>>> (kernel/include pairing) and therefore limits upward mobility/compatibility.
>>>>> That doesn’t seem like a very good idea.
>>>>>
>>>>> IMO, switching to strerror_r is much preferred, since other than the
>>>>> brain-dead GNU libc version, is highly portable and should work regardless
>>>>> of the kernel or OS in place.
>>>>>
>>>>> On Dec 11, 2014, at 5:20 AM, malcolm<malcolm.kavalsky@oracle.com>
>>>>> wrote:
>>>>>
>>>>>> FYI, there are a couple more files that reference sys_errlist directly
>>>>>> (not just terror within exception.c) , but also hdfs_http_client.c
and
>>>>>> NativeiO.c
>>>>>>
>>>>>> On 12/11/2014 07:38 AM, malcolm wrote:
>>>>>>> Hi Colin,
>>>>>>>
>>>>>>> Exactly, as you noticed, the problem is the thread-local buffer
needed
>>>>>>> to return from terror.
>>>>>>> Currently, terror just returns a static string from an array,
this is
>>>>>>> fast, simple and error-proof.
>>>>>>>
>>>>>>> In order to use strerror_r inside terror,  would require allocating
a
>>>>>>> buffer inside terror  and depend on the caller to free the buffer
after
>>>>>>> using it, or to pass a buffer to terrror (which is basically
the same as
>>>>>>> strerror_r, rendering terror redundant).
>>>>>>> Both cases require modification outside terror itself, as far
as I can
>>>>>>> tell, no simple fix. Unless you have an alternative which I haven't
thought
>>>>>>> of ?
>>>>>>>
>>>>>>> As far as I can tell, we have two choices:
>>>>>>>
>>>>>>> 1. Remove terror and replace calls with strerror_r, passing a
buffer
>>>>>>> from the callee.
>>>>>>>        Advantage: a more modern portable interface.
>>>>>>>        Disadvantage: All calls to terror need to be modified,
though
>>>>>>> all seem to be in a few files as far as I can tell.
>>>>>>>
>>>>>>> 2. Adding a sys_errlist array (ifdeffed for Solaris)
>>>>>>>        Advantage: no change to any calls to terror
>>>>>>>        Disadvantage: 2 additional files added to source tree
(.c and
>>>>>>> .h) and some minor ifdefs only used for Solaris.
>>>>>>>
>>>>>>> I think it is more a question of style than anything else, so
I leave
>>>>>>> you to make the call.
>>>>>>>
>>>>>>> Thanks for your patience,
>>>>>>> Malcolm
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On 12/10/2014 09:54 PM, Colin McCabe wrote:
>>>>>>>> On Wed, Dec 10, 2014 at 2:31 AM, malcolm
>>>>>>>> <malcolm.kavalsky@oracle.com>  wrote:
>>>>>>>>> Hi Colin,
>>>>>>>>>
>>>>>>>>> Thanks for the hints around JIRAs.
>>>>>>>>>
>>>>>>>>> You are correct errno still exists, however sys_errlist
does not.
>>>>>>>>>
>>>>>>>>> Hadoop uses a function terror (defined in exception.c)
which indexes
>>>>>>>>> sys_errlist by errno to return the error message from
the array.
>>>>>>>>> This
>>>>>>>>> function is called 26 times in various places (in 2.2)
>>>>>>>>>
>>>>>>>>> Originally, I thought to replace all calls to terror
with strerror,
>>>>>>>>> but
>>>>>>>>> there can be issues with multi-threading (it returns
a buffer which
>>>>>>>>> can be
>>>>>>>>> overwritten), so it seemed simpler just to recreate the
sys_errlist
>>>>>>>>> message
>>>>>>>>> array.
>>>>>>>>>
>>>>>>>>> There is also a multi-threaded version strerror_r where
you pass the
>>>>>>>>> buffer
>>>>>>>>> as a parameter, but this would necessitate changing every
call to
>>>>>>>>> terror
>>>>>>>>> with mutiple lines of code.
>>>>>>>> Why don't you just use strerror_r inside terror()?
>>>>>>>>
>>>>>>>> I wrote that code originally.  The reason I didn't want to
use
>>>>>>>> strerror_r there is because GNU libc provides a non-POSIX
definition
>>>>>>>> of strerror_r, and forcing it to use the POSIX one is a pain.
But you
>>>>>>>> can do it.  You also will require a thread-local buffer to
hold the
>>>>>>>> return from strerror_r, since it is not guaranteed to be
static
>>>>>>>> (although in practice it is 99% of the time-- another annoyance
with
>>>>>>>> the API).
>>>>>>>>
>>>>>>>>
>>>> ________________________________
>>>>
>>>>
>>>>
>>>> ATTENTION: -----
>>>>
>>>> The information contained in this message (including any files
>>>> transmitted with this message) may contain proprietary, trade secret or
>>>> other confidential and/or legally privileged information. Any pricing
>>>> information contained in this message or in any files transmitted with this
>>>> message is always confidential and cannot be shared with any third parties
>>>> without prior written approval from Syncsort. This message is intended to
be
>>>> read only by the individual or entity to whom it is addressed or by their
>>>> designee. If the reader of this message is not the intended recipient, you
>>>> are on notice that any use, disclosure, copying or distribution of this
>>>> message, in any form, is strictly prohibited. If you have received this
>>>> message in error, please immediately notify the sender and/or Syncsort and
>>>> destroy all copies of this message in your possession, custody or control.


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message