thrift-dev mailing list archives

From "Jarry Shaw (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (THRIFT-4677) UnicodeDecodeError in Python3
Date Sat, 10 Aug 2019 09:06:00 GMT

    [ https://issues.apache.org/jira/browse/THRIFT-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16904382#comment-16904382 ]

Jarry Shaw commented on THRIFT-4677:
------------------------------------

Sorry for the late reply. I reported this quite a long time ago and only recently tried to
reproduce the bug.

Here is the exception traceback:

{code:python}
Traceback (most recent call last):
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 121, in worker
    result = (True, func(*args, **kwds))
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\multiprocessing\pool.py", line 44, in mapstar
    return list(map(*args))
  File "C:\Users\fakepath\Desktop\osquery_all_mp.py", line 54, in query
    query = instance.client.query(f'SELECT * FROM {table};')
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\osquery\extensions\ExtensionManager.py", line 182, in query
    return self.recv_query()
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\osquery\extensions\ExtensionManager.py", line 201, in recv_query
    result.read(iprot)
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\osquery\extensions\ExtensionManager.py", line 981, in read
    self.success.read(iprot)
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\osquery\extensions\ttypes.py", line 339, in read
    _val12 = iprot.readString().decode('utf-8') if sys.version_info[0] == 2 else iprot.readString()
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\thrift\protocol\TProtocol.py", line 184, in readString
    return binary_to_str(self.readBinary())
  File "C:\Users\fakepath\AppData\Local\Programs\Python\Python37\lib\site-packages\thrift\compat.py", line 37, in binary_to_str
    return bin_val.decode('utf8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc8 in position 0: invalid continuation byte
{code}

Environment:

* Windows 10 Pro (Simplified Chinese)
* osquery v3.3.0
* osquery-python v3.0.6 (Python binding)
* thrift v0.11.0

And the preferred encoding reported by Python's {{locale}} module:

{code:python}
>>> import locale
>>> locale.getpreferredencoding()
'cp936'
{code}
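
To illustrate why the decode fails, here is a minimal sketch. It assumes the value returned by the query is text encoded in the system code page (cp936, i.e. GBK) rather than UTF-8. The sample character and the helper name {{try_decode}} are hypothetical and chosen only for illustration; they are not data from the failing query.

```python
# Hypothetical sample: one Chinese character encoded with the system code page.
sample = "日".encode("cp936")


def try_decode(data, encoding):
    """Return the decoded text, or a short failure message if decoding fails."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return f"failed: {exc.reason}"


# Decoding with the code page the bytes were written in succeeds.
print(try_decode(sample, "cp936"))
# Decoding the same bytes as UTF-8 fails, as in the traceback above.
print(try_decode(sample, "utf-8"))
```

The first byte of such a GBK sequence looks like the start of a multi-byte UTF-8 sequence, but the byte that follows is not a valid UTF-8 continuation byte, so the UTF-8 decoder gives up at position 0.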

Sorry, I'm not familiar with Thrift's implementation, so I don't really know how this bug
should be fixed.
However, you can find the source code I'm using in the attachment.

 [^osquery_all_mp.py] 

> UnicodeDecodeError in Python3
> -----------------------------
>
>                 Key: THRIFT-4677
>                 URL: https://issues.apache.org/jira/browse/THRIFT-4677
>             Project: Thrift
>          Issue Type: Bug
>          Components: Python - Library
>         Environment: Operating System: Windows 10 Pro (Simplified Chinese)
> Python Interpreter: Python 3.6.6
> {{osquery}} Version: 3.3.0
> {{osquery-python}} Version: 3.0.5
>  
>            Reporter: Jarry Shaw
>            Priority: Major
>         Attachments: compat.py, osquery_all_mp.py
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> This issue occurred when using [osquery-python|https://github.com/osquery/osquery-python] (the Python binding of [osquery|https://osquery.io/] by Facebook).
> When querying, a {{UnicodeDecodeError}} is raised from {{thrift.compat.binary_to_str}} with the error message "{{'utf-8' codec can't decode byte 0xc3 in position 0: invalid continuation byte}}", because the {{bin_val}} parameter is actually encoded in "{{gbk}}" rather than UTF-8.
> Possible approaches are:
>  * add a parameter for user to determine encodings
>  * get the system encoding through {{locale.getpreferredencoding()}}
>  * call {{bin_val.decode}} with {{errors='replace'}} or {{errors='ignore'}} parameter
>  * introduce {{chardet}} to try and resolve encoding problems
> The attachment is my hack solution to this issue (though not perfect).
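
Several of the approaches listed in the description can be combined in one helper. The following is only a sketch under stated assumptions, not Thrift's actual fix and not the attached {{compat.py}}: the function name {{decode_with_fallback}} and its {{encodings}} parameter are hypothetical. It tries UTF-8 first, then the encoding from {{locale.getpreferredencoding()}}, and finally falls back to {{errors='replace'}}.

```python
import locale


def decode_with_fallback(bin_val, encodings=None):
    """Decode bytes by trying each candidate encoding in order.

    Hypothetical helper: neither the name nor the 'encodings' parameter
    exists in Thrift.
    """
    if encodings is None:
        # Default candidates: UTF-8 first, then the system's preferred encoding.
        encodings = ["utf-8", locale.getpreferredencoding(False)]
    for enc in encodings:
        try:
            return bin_val.decode(enc)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: try the next candidate.
            continue
    # Last resort: keep going, marking undecodable bytes with U+FFFD.
    return bin_val.decode("utf-8", errors="replace")


# Usage: bytes written as GBK are recovered when GBK is among the candidates.
print(decode_with_fallback("日".encode("gbk"), ["utf-8", "gbk"]))
```

One caveat with this kind of fallback: bytes that happen to be valid in more than one candidate encoding are decoded with whichever comes first, so a wrong guess can go unnoticed. An explicit, user-supplied encoding avoids that ambiguity.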



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
