thrift-dev mailing list archives

From "Jarry Shaw (JIRA)" <>
Subject [jira] [Commented] (THRIFT-4677) UnicodeDecodeError in Python3
Date Sat, 10 Aug 2019 09:06:00 GMT


Jarry Shaw commented on THRIFT-4677:

Sorry for the late reply. I reported this quite a long time ago and only recently tried to
reproduce the bug.

Here is the exception traceback:

Traceback (most recent call last):
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\multiprocessing\",
line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\multiprocessing\",
line 44, in mapstar
    return list(map(*args))
  File "C:\Users\fakepath\Desktop\", line 54, in query
    query = instance.client.query(f'SELECT * FROM {table};')
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\osquery\extensions\",
line 182, in query
    return self.recv_query()
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\osquery\extensions\",
line 201, in recv_query
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\osquery\extensions\",
line 981, in read
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\osquery\extensions\",
line 339, in read
    _val12 = iprot.readString().decode('utf-8') if sys.version_info[0] == 2 else iprot.readString()
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\thrift\protocol\",
line 184, in readString
    return binary_to_str(self.readBinary())
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\thrift\",
line 37, in binary_to_str
    return bin_val.decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte
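
The failure in the last frame can probably be reproduced without Thrift or osquery. The sketch below is only an illustration: the two sample bytes are an assumption (chosen to start with 0xc8 and to be valid GBK), not data captured from the actual osquery response.

```python
# Assumed sample: two bytes that form one valid GBK character and start
# with 0xc8. These are NOT the bytes osquery actually returned.
raw = b"\xc8\xcb"

# Decoding as UTF-8 fails: 0xc8 announces a two-byte UTF-8 sequence, but
# the following byte 0xcb is not a continuation byte (0x80-0xbf).
try:
    raw.decode("utf8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Decoding with the encoding the bytes were written in succeeds.
text = raw.decode("gbk")

print(error)
print(len(text))
```

If the diagnosis in the issue is right, any string column that osquery returns in the system code page rather than UTF-8 would fail in the same way.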


* Windows 10 Pro (Simplified Chinese)
* osquery v3.3.0
* osquery-python v3.0.6 (Python binding)
* thrift v0.11.0

And the Python system locale information:

>>> locale.getpreferredencoding()

Sorry, I'm not familiar with Thrift's implementation, so I don't really know how this bug
should be fixed. However, you may find the source code I'm using in the attachment.


> UnicodeDecodeError in Python3
> -----------------------------
>                 Key: THRIFT-4677
>                 URL:
>             Project: Thrift
>          Issue Type: Bug
>          Components: Python - Library
>         Environment: Operating System: Windows 10 Pro (Simplified Chinese)
> Python Interpreter: Python 3.6.6
> {{osquery}} Version: 3.3.0
> {{osquery-python}} Version: 3.0.5
>            Reporter: Jarry Shaw
>            Priority: Major
>         Attachments:,
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
> This issue occurred when using osquery-python (the Python binding of osquery by Facebook).
> When querying, {{UnicodeDecodeError}} is raised with the error message "{{'utf-8' codec can't
> decode byte 0xc3 in position 0: invalid continuation byte}}" from {{thrift.compat.binary_to_str}},
> because the {{bin_val}} parameter is actually encoded as "{{gbk}}".
> Possible approaches are:
>  * add a parameter that lets the user specify the encoding
>  * get the system encoding through {{locale.getpreferredencoding()}}
>  * call {{bin_val.decode}} with {{errors='replace'}} or {{errors='ignore'}} parameter
>  * introduce {{chardet}} to try and resolve encoding problems
> The attachment is my hack solution to this issue (though not perfect).
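
As a rough illustration of how the first three approaches listed in the issue description might fit together, here is a minimal sketch. The helper `decode_with_fallback` and its parameters are hypothetical; this is neither Thrift's API nor the attached hack, and it assumes the caller is willing to accept a lossy result as a last resort.

```python
import locale


def decode_with_fallback(bin_val, fallbacks=None, errors="replace"):
    """Hypothetical helper (not part of Thrift): decode bytes as UTF-8,
    then try each fallback encoding, then decode lossily."""
    if fallbacks is None:
        # Default fallback: the system's preferred encoding.
        fallbacks = (locale.getpreferredencoding(False),)

    try:
        return bin_val.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # User-supplied (or locale-derived) encodings, tried in order.
    for encoding in fallbacks:
        try:
            return bin_val.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue

    # Last resort: never raise, but the result may contain U+FFFD.
    return bin_val.decode("utf-8", errors=errors)
```

For example, `decode_with_fallback(data, fallbacks=("gbk",))` would try UTF-8 first and GBK second. Trying UTF-8 first keeps the behaviour unchanged for well-formed input; the trade-off is that bytes which happen to be valid in both encodings are always read as UTF-8.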

This message was sent by Atlassian JIRA