subversion-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Troy Curtis <t...@troycurtisjr.com>
Subject Re: [swig-py3][patch] interfacing bytes object instead of str
Date Thu, 27 Dec 2018 04:01:43 GMT
On 12/26/18 11:51 AM, futatuki@poem.co.jp wrote:
> On 12/25/18 12:20, Troy Curtis Jr wrote:
>> On Mon, Dec 24, 2018 at 1:11 PM Yasuhito FUTATSUKI <futatuki@poem.co.jp>
>> wrote:
> 
>> Awesome! I'm really glad you are helping work on this, helps to keep me
>> motivated as well ;)
> 
> I'm glad if I could some help, too :)

I've committed your patch as is at r1849784, so that any of these 
enhancements we've discussed can be smaller, more focused patches.  I 
did tweak some formatting for the log message to better match the 
typical guidelines. Mostly spacing and converting to complete sentences. 
I also converted the symbol references in python code from 
<Class::method> to more the more Pythonic <Class.method> format for 
better grepping.

The only real functionality changes I'm concerned with on the changes 
are dropping str() conversion on a couple of the API inputs since 
previously providing any object with a suitable __str__ method would 
work, but would now be broken. But a simple way of supporting Py2 and 
Py3 isn't a "gimme", so didn't want to block this patch on that.

> 
>> So I wonder, as a guiding principle if we could support either Str or 
>> Bytes
>> as an input? That would be akin to most
>> of the standard module interfaces.  I think that would be fairly easy to
>> support, and would help ensure that the
>> behavior is much closer to the typical expectation. it would also reduce
>> the number of b"" that are necessary :P Of
>> course, in many cases for the standard library interfaces, the input type
>> is used as a hint to indicate what type to
>> return, which is much more difficult/impossible to fully support in these
>> bindings.  That seems like it would provide
>> a way to make using the bindings more familiar, though we'd still have to
>> have the policy of just always returning Bytes
>> objects as output. That input/output disconnect might undo any ease of 
>> use
>> advantage than just always using Bytes
>> ...I'm not sure exactly what my opinion on that is...
> 
> It is just I wanted to confirm, how deal with inputs :)
> 
> I also want that both of Bytes and Str is acceptable input values,
> especially for paths on file system (not Subversion's internal paths).
> So, I'll tweak typemaps for it.
> 
>> The patch inline for comments:
>>
>> diff --git a/build/ac-macros/swig.m4 b/build/ac-macros/swig.m4
>>> index d865689de5..c797ae69d8 100644
>>> --- a/build/ac-macros/swig.m4
>>> +++ b/build/ac-macros/swig.m4
>>> @@ -154,7 +154,7 @@ AC_DEFUN(SVN_FIND_SWIG,
>>>             ])
>>>
>>>             if test "$ac_cv_python_is_py3" = "yes"; then
>>> -             SWIG_PY_OPTS="-python -py3"
>>> +             SWIG_PY_OPTS="-python -py3 -DPY3"
>>>             else
>>>                SWIG_PY_OPTS="-python -classic"
>>>             fi
>>>
>>
>> We shouldn't define a new preprocessor value here, but rather make use of
>> the existing IS_PY3 value defined by py3c.
> 
> It was really needed because IS_PY3 macro symbol is not recognized to
> SWIG, because it is C preprocessers macro, not SWIG macro. I've looked
> for distinguish py2 and py3 on SWIG context, but I couldn't find suitable
> macro. (however this is no more needed)
> 
>> diff --git a/subversion/bindings/swig/core.i
>>> b/subversion/bindings/swig/core.i
>>> index e5a234c5e2..c309d48070 100644
>>> --- a/subversion/bindings/swig/core.i
>>> +++ b/subversion/bindings/swig/core.i
>>>
>>
>> [snip]
>>
>> @@ -442,16 +442,18 @@
>>>   */
>>>   #ifdef SWIGPYTHON
>>>   %typemap(in) (const char *data, apr_size_t *len) ($*2_type temp) {
>>> -    if (!PyStr_Check($input)) {
>>> +    char *tmpdata;
>>> +    Py_ssize_t length;
>>> +    if (!PyBytes_Check($input)) {
>>>           PyErr_SetString(PyExc_TypeError,
>>> -                        "expecting a string for the buffer");
>>> +                        "expecting a bytes object for
the buffer");
>>>           SWIG_fail;
>>>       }
>>> -    /* Explicitly cast away const from PyStr_AsUTF8AndSize (in 
>>> Python 3).
>>> The
>>> -       swig generated argument ($1) here will be "char *", but since 
>>> its
>>> only
>>> -       use is to pass directly into the "const char *" argument of the
>>> wrapped
>>> -       function, this is safe to do. */
>>> -    $1 = (char *)PyStr_AsUTF8AndSize($input, &temp);
>>>
>>
>>
>>> +    if (PyBytes_AsStringAndSize($input, &tmpdata, &length) == -1)
{
>>>
>>
>> Why not just use "&$1" here instead of introducing a new temp variable?
> 
> It's to avoid C compiler warning to discard 'const' quarifier, but this
> can be also safely accomplished by cast.
> 
>> +        SWIG_fail;
>>> +    }
>>> +    temp = ($*2_type)length;
>>> +    $1 = tmpdata;
>>>       $2 = ($2_ltype)&temp;
>>>   }
>>>   #endif
>>>
>> @@ -504,8 +506,8 @@
>>>          SWIG_fail;
>>>       }
>>>
>>> -    if (PyStr_Check($input)) {
>>> -        const char *value = PyStr_AsString($input);
>>> +    if (PyBytes_Check($input)) {
>>> +        const char *value = PyBytes_AsString($input);
>>>           $1 = apr_pstrdup(_global_pool, value);
>>>       }
>>>       else if (PyLong_Check($input)) {
>>>
>>
>> Since this function does a whole list of checks and conversions, 
>> perhaps it
>> makes sense to support Str here in
>> addition to bytes? Of course in Py2 the "PyStr_Check()" would effectively
>> be performed twice if we did that, but
>> I don't expect that to be a tremendous impact, and with the "cost" of an
>> addition "IS_PY3" preprocessor check,
>> that could be eliminated from the Py2 build.
> 
> I Agreed.
> 
>>> diff --git a/subversion/bindings/swig/include/svn_global.swg
>>> b/subversion/bindings/swig/include/svn_global.swg
>>> index 191bbdd531..f7364be28f 100644
>>> --- a/subversion/bindings/swig/include/svn_global.swg
>>> +++ b/subversion/bindings/swig/include/svn_global.swg
>>> @@ -31,6 +31,12 @@
>>>   #define SVN_DEPRECATED
>>>   #endif
>>>
>>> +#ifdef SWIGPYTHON
>>> +%begin %{
>>> +#define SWIG_PYTHON_STRICT_BYTE_CHAR
>>> +%}
>>> +#endif
>>> +
>>>   %include typemaps.i
>>>   %include constraints.i
>>>   %include exception.i
>>> @@ -136,9 +142,15 @@ static PyObject * _global_py_pool = NULL;
>>>   /* Python format specifiers. Use Python instead of SWIG to parse
>>>      these basic types, because Python reports better error messages
>>>      (with correct argument numbers). */
>>> +#if defined(PY3)
>>>
>>
>> This is where we'd want to use py3c's "IS_PY3" value.
> 
> including header file py3c/compath.h is done by indirectly as
> 
> #ifdef SWIGPYTHON
> %{
> #include "swigutil_py.h"
> #include "swigutil_py3c.h"
> %}
> #endif
> 
> however, it is enclosed by %{ %} so it is not defined as SWIG macro
> symbol. and following
> 
>>> +%typemap (in, parse="y")
>>> +  char *, char const *, char * const, char const * const "";
>>> +#else
>>>   %typemap (in, parse="s")
>>>     char *, char const *, char * const, char const * const "";
>>> +#endif
> 
> is a SWIG context switch. this is why -DPY3 was needed.
> However, if we want to allow both of Bytes and Str for input, `parse'
> directive cannot be used here and must write code here to type check
> and to convert ourself, it isn't needed.

Ah, I see your point!

> 
> Rest of all your suggestions convinced me, so I'll start to modify. >
> Cheers,

Thanks!
Troy

Mime
View raw message